Table of Contents

Search

  1. Preface
  2. Introduction
  3. Installation
  4. Design
  5. Operation

Flattening IDTs

Flattening IDTs

Flattening is the process of packing denormalized data created by joining tables in a "one to many" (1:M) relationship, into repeating groups in the IDT. It can significantly increase search performance.

The problem

A typical database design consists of multiple tables that have been normalized to some degree. While full normalization is logically elegant and efficient to update, it performs poorly for read access when the tables need to be joined to satisfy a query.
DCE provides very fast search performance by storing all search, matching and display data together in the same table, thereby avoiding the need for costly joins at search time. The DCE Loader denormalized the data while creating the IDT.
A disadvantage of denormalization is the explosion in the number of rows that occurs when joining tables that have a 1:M relationship, as the SQL engine produces one row for every possible combination.
As a result, storage and retrieval costs increase.
To overcome this problem, DCE can collapse (flatten) related denormalized rows into one IDT row that contains repeating groups. This means:
  • significantly faster search performance
  • less database storage for the IDT and IDXs
The most significant performance benefits occur when the Key-Field for the IDX is sourced from the 1 table in the 1:M relationship. This is due to the fact that the use of repeating groups for the M data reduces the number of rows containing the same Key-Field value, which in turn produces less keys to index (at load time) and less candidates to retrieve and match at search time.

Logical Design

Consider the following logical design where the tables have been fully normalized. Each employee of a contracting business can work for many clients. Each client’s business premises may have many phone lines.
Suppose there is only one employee currently:
Emp_Id
Name
E001
Joe Programmer
Joe contracts his services to three companies located around the state:
Company_Id
Address
FK_Emp_Id
C001
3 Lonsdale
St E001
C002
19 Torrens
St E001
C003
4 Rudd
St E001
Each company has the following phone numbers:
Company_Id
Phone
C001
62575400
C001
62575401
C002
98940000
C003
52985500
C003
52985501
C003
52985502

Denormalized Data

A simple SQL query to denormalize this information will generate six rows of data because Joe Programmer works at three offices, each having multiple phone numbers.
The denormalized data from a join operation.
Emp_Id
Name
Company_Id
Address
Phone
E001
Joe Programmer
C001
3 Lonsdale St
62575400
E001
Joe Programmer
C001
3 Lonsdale St
62575401
E001
Joe Programmer
C002
19 Torrens St
98940000
E001
Joe Programmer
C003
4 Rudd St
52985500
E001
Joe Programmer
C003
4 Rudd St
52985501
E001
Joe Programmer
C003
4 Rudd St
52985502
If the search application needs to search on Name, DCE will create fuzzy keys on the Name column. As there are many rows with the same value, duplicate keys will be created and maintained. At search time, multiple rows will be retrieved and matched.

Flattened Data

To reduce the number of rows, DCE can flatten the six rows into one. This is achieved by declaring some columns in the IDT as repeating fields . If data sourced from the M tables (
Company_Id
,
Address
and
Phone
) were defined to repeat up to six times, all of the data could be flattened into one row.
The table below has been turned on its side due to space limitations. It represents a single row of data with column names on the left and data values on the right. Note how the data from the "1" Table no longer repeats.
A single flattened row
Column
Value
Emp_Id
E001
Name
Joe Programmer
Company_Id [1]
C001
Company_Id [2]
C002
Company_Id [3]
C003
Address [1]
3 Lonsdale St
Address [2]
3 Lonsdale St
Address [3]
19 Torrens St
Address [4]
4 Rudd St
Address [5]
4 Rudd St
Address [6]
4 Rudd St
Phone [1]
62575400
Phone [2]
62575401
Phone [3]
98940000
Phone [4]
52985500
Phone [5]
52985501
Phone [6]
52985502
An IDT with this structure would have only one row and the IDX would contain six times fewer
Name
keys. The number of candidates selected during a search on the
Name
IDX would also decrease by a factor of six.
This structure does not improve the performance of an IDX created on the
Address
fields, as they contain duplicate values. However, Flattening-Options (discussed later) can be defined to remove duplicate entries from the repeating fields in order to provide some benefit as well.

0 COMMENTS

We’d like to hear from you!