Table of Contents

  1. Preface
  2. Introduction
  3. Defining a System
  4. Flattening IDTs
  5. Link Tables
  6. Loading a System
  7. Static Clustering
  8. Simple Search
  9. Search Performance
  10. Miscellaneous Issues
  11. Limitations
  12. Error Messages

Reducing Database I/O

This section describes IDX size and Compressed Key Data.

IDX Size

The physical size of the IDX determines how efficiently the database cache operates, so reducing the size of the IDXs improves performance. This is achieved by selecting the most appropriate Compressed-Key-Data value (as described in the Compressed Key Data section) and by using flattening (as described in the Flattening IDTs section) to reduce the number of rows.

Compressed Key Data

The IDX stores fuzzy keys and identity data for matching. The identity data is compressed and stored using an algorithm selected with the Identity-Table-Definition's Compress-Method parameter. All methods compress the Identity data and store it in the IDX together with its fuzzy key.
If the length of the IDX record exceeds the DBMS's limit for an indexable column, IIR can either:
  • Method 0
    split the IDX record into multiple adjacent segments (which are all shorter than the length limit).
  • Method 1
    truncate the IDX record at the length limit and only store one segment. This forces additional I/O at run time if the IDX record is selected for matching, as the matching data must be read from the IDT record.
IDX segments are fixed in length and have the following layout:
  • Partition
  • Fuzzy SSA-NAME3 Key
  • Compressed Identity Data
The segment length is the sum of:
  • the partition length (optional, user defined)
  • the SSA-NAME3 key length (5 or 8 bytes)
  • n, from the Compress-Key-Data(n) parameter
  • 4 bytes of additional overhead
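The segment-length arithmetic above can be sketched as follows. The specific lengths used here (a 4-byte partition and an 8-byte SSA-NAME3 key) are illustrative values, not taken from a real system:

```python
# Fixed IDX segment length = partition + fuzzy key + Compress-Key-Data(n) + overhead.
# All lengths below are hypothetical examples.
PARTITION_LEN = 4       # optional, user defined
SSA_KEY_LEN = 8         # SSA-NAME3 key length (5 or 8 bytes)
COMPRESS_KEY_DATA = 35  # the n in Compress-Key-Data(n)
OVERHEAD = 4            # fixed per-segment overhead

segment_len = PARTITION_LEN + SSA_KEY_LEN + COMPRESS_KEY_DATA + OVERHEAD
print(segment_len)  # 51, comfortably under Oracle's 255-byte limit
```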
The maximum segment length depends on the host DBMS:
  • Oracle 255 bytes
  • UDB 250 bytes
Since the segment length is fixed, choosing an appropriate value for n is important because it affects the total amount of space used and the I/O performance of the index. Determining an optimal value for n requires knowledge of the characteristics of the source data and how well it can be compressed.

If n is set too high, all segments will use more space than necessary. If n is too low, records will be split into multiple segments, incurring extra overhead for the duplication of the Partition and Fuzzy Key in each segment.
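This trade-off can be illustrated with a small sketch. It assumes ceil(compressed length / n) segments under Method 0, with the Partition and Fuzzy Key prefix repeated in every segment; the prefix and overhead sizes are hypothetical:

```python
import math

def segments_needed(compressed_len: int, n: int) -> int:
    """Segments required under Method 0, which splits rather than truncates."""
    return math.ceil(compressed_len / n)

def stored_bytes(compressed_len: int, n: int,
                 key_offset: int = 12, overhead: int = 4) -> int:
    """Total bytes: each fixed-length segment repeats the partition/fuzzy-key
    prefix (key_offset) and the per-segment overhead."""
    seg_len = key_offset + n + overhead
    return segments_needed(compressed_len, n) * seg_len

# A record whose Identity Data compresses to 60 bytes:
print(stored_bytes(60, 80))  # n too high: 1 segment of 96 bytes, 36 wasted
print(stored_bytes(60, 20))  # n too low: 3 segments, 108 bytes, prefix stored 3 times
```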

Measuring Compression

The IIR Table Loader can be used to sample the source data and produce a histogram of the Compressed Identity Data lengths. A sample table appears below. In the histogram:

KeyLen    the length of the Identity Data after compression
Count     the number of records with that length
Percent   the cumulative percentage of records having lengths less than or
          equal to the current KeyLen

Segment Lengths

The histogram can be converted into more useful data by running

    %SSABIN%\histkg ReportFile

where ReportFile is the name of the log file containing the IIR Table Loader output. It will produce a report similar to the one below.
The first section of the report summarizes the histogram.

Value       Description
IndexName   the name of the IDX
KeyData     the sum of the lengths of the IDT columns (the Identity Data,
            uncompressed)
CompLen     the Compress-Key-Data(n) value
KeyLen      the length of the Identity Data after compression
Count       the number of records with this KeyLen
Bytes       KeyLen * Count
Comp-1      total bytes to store Count records with this KeyLen using
            Method 1 (1 segment only)
Comp-2      total bytes to store Count records with this KeyLen using
            Method 0 (multiple segments)
Segs        the number of segments required to store Count records of this
            KeyLen
The second part of the report gives space estimates for various values of Compress-Key-Data(n).

Value          Description
KeyDataOffset  the length of the Fuzzy Key including any partition
KeyOverhead    the overhead associated with storing the segment on the host
               DBMS (assumed)
Blocksize      the DBMS block size (assumed)
BlockOverhead  the DBMS overhead when storing records within a block,
               including control structures and padding (assumed)
compLen        n from Compress-Key-Data(n)
Bytes          the number of bytes required to store segments of this size
Segs           the number of segments used
Segs/Key       the average number of segments per IDX record
DB-Bytes       the number of bytes for segments of this size (scaled up by
               KeyOverhead)
DB-Blocks      the number of blocks for segments of this size (based on the
               Blocksize and BlockOverhead)
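The DB-Bytes and DB-Blocks estimates can be approximated in the same spirit. How histkg actually applies KeyOverhead and BlockOverhead is not documented here, so per-segment overhead scaling and division by the usable block space are assumptions:

```python
import math

def db_estimates(total_bytes, segs, key_overhead=8,
                 blocksize=8192, block_overhead=100):
    """Hypothetical scaling: add per-segment DBMS overhead, then divide by
    the space left in each block after control structures and padding."""
    db_bytes = total_bytes + segs * key_overhead  # scale up by KeyOverhead
    usable = blocksize - block_overhead           # usable space per block
    db_blocks = math.ceil(db_bytes / usable)
    return db_bytes, db_blocks

# 200,000 segments totalling ~10.4 MB of segment data (invented figures):
print(db_estimates(total_bytes=10_400_000, segs=200_000))
```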
To optimize performance, select the largest compLen value that minimizes DB-Blocks and set Compress-Key-Data to this value (35 in the example above).
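Selecting the value is then a simple scan over the report's candidate rows: among the compLen values tied at the minimum DB-Blocks, take the largest. The candidate figures below are invented for illustration:

```python
# (compLen, DB-Blocks) pairs as they might appear in a histkg report;
# the numbers are hypothetical, not from a real run.
candidates = [(20, 1900), (25, 1700), (30, 1500), (35, 1500), (40, 1600)]

min_blocks = min(blocks for _, blocks in candidates)
best = max(c for c, blocks in candidates if blocks == min_blocks)
print(best)  # 35 -> set Compress-Key-Data(35)
```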
You may need to customize the histkg script (%SSABIN%\histkg.awk) if the block size and block overhead values are not correct for your DBMS.
