Table of Contents

  1. Preface
  2. Introduction
  3. Defining a System
  4. Flattening IDTs
  5. Link Tables
  6. Loading a System
  7. Static Clustering
  8. Simple Search
  9. Search Performance
  10. Miscellaneous Issues
  11. Limitations
  12. Error Messages

Reducing Database I/O

This section describes IDX size and Compressed Key Data.

IDX Size

The physical size of the IDX determines how efficiently the database cache operates, so reducing the size of the IDXs improves performance. This is achieved by selecting the most appropriate Compressed-Key-Data value (as described in the Compressed Key Data section) and by using flattening (as described in the Flattening IDTs section) to reduce the number of rows.

Compressed Key Data

The IDX stores fuzzy keys and identity data for matching. The identity data is compressed and stored using an algorithm selected with the Identity-Table-Definition's Compress-Method parameter. All methods compress the Identity data and store it in the IDX together with its fuzzy key.
If the length of the IDX record exceeds the DBMS's limit for an indexable column, IIR can either:
  • Method 0
    split the IDX record into multiple adjacent segments (which are all shorter than the length limit).
  • Method 1
    truncate the IDX record at the length limit and only store one segment. This forces additional I/O at run time if the IDX record is selected for matching, as the matching data must be read from the IDT record.
IDX segments are fixed in length and have the following layout:
  • Partition
  • Fuzzy SSA-NAME3 Key
  • Compressed Identity Data
The segment length is the sum of:
  • the partition length (optional, user defined)
  • the SSA-NAME3 key length (5 or 8 bytes)
  • n, from the Compress-Key-Data(n) parameter
  • 4 bytes of additional overhead
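The segment-length arithmetic above can be sketched as follows. The specific lengths used here (a 4-byte partition and an 8-byte SSA-NAME3 key) are illustrative values, not taken from a real system:

```python
# Fixed IDX segment length = partition + fuzzy key + Compress-Key-Data(n) + overhead.
# All lengths below are hypothetical examples.
PARTITION_LEN = 4       # optional, user defined
SSA_KEY_LEN = 8         # SSA-NAME3 key length (5 or 8 bytes)
COMPRESS_KEY_DATA = 35  # the n in Compress-Key-Data(n)
OVERHEAD = 4            # fixed per-segment overhead

segment_len = PARTITION_LEN + SSA_KEY_LEN + COMPRESS_KEY_DATA + OVERHEAD
print(segment_len)  # 51, comfortably under Oracle's 255-byte limit
```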
The maximum segment length depends on the host DBMS:
  • Oracle 255 bytes
  • UDB 250 bytes
Since the segment length is fixed, choosing an appropriate value for n is important because it affects the total amount of space used and the I/O performance of the index. Determining an optimal value for n requires knowledge of the characteristics of the source data and how well it can be compressed.

If n is set too high, all segments will use more space than necessary. If n is too low, records will be split into multiple segments, incurring extra overhead for the duplication of the Partition and Fuzzy Key in each segment.
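This trade-off can be illustrated with a small sketch. It assumes ceil(compressed length / n) segments under Method 0, with the Partition and Fuzzy Key prefix repeated in every segment; the prefix and overhead sizes are hypothetical:

```python
import math

def segments_needed(compressed_len: int, n: int) -> int:
    """Segments required under Method 0, which splits rather than truncates."""
    return math.ceil(compressed_len / n)

def stored_bytes(compressed_len: int, n: int,
                 key_offset: int = 12, overhead: int = 4) -> int:
    """Total bytes: each fixed-length segment repeats the partition/fuzzy-key
    prefix (key_offset) and the per-segment overhead."""
    seg_len = key_offset + n + overhead
    return segments_needed(compressed_len, n) * seg_len

# A record whose Identity Data compresses to 60 bytes:
print(stored_bytes(60, 80))  # n too high: 1 segment of 96 bytes, 36 wasted
print(stored_bytes(60, 20))  # n too low: 3 segments, 108 bytes, prefix stored 3 times
```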

Measuring Compression

The IIR Table Loader can be used to sample the source data and produce a histogram of the Compressed Identity Data lengths. A sample table appears below. In the histogram:

KeyLen    the length of the Identity Data after compression
Count     the number of records with that length
Percent   the cumulative percentage of records having lengths less than or
          equal to the current KeyLen

Segment Lengths

The histogram can be converted into more useful data by running

    %SSABIN%\histkg ReportFile

where ReportFile is the name of the log file containing the IIR Table Loader output. It will produce a report similar to the one below.
The first section of the report summarizes the histogram.

Value       Description
IndexName   the name of the IDX
KeyData     the sum of the lengths of the IDT columns (the Identity Data,
            uncompressed)
CompLen     the Compress-Key-Data(n) value
KeyLen      the length of the Identity Data after compression
Count       the number of records with this KeyLen
Bytes       KeyLen * Count
Comp-1      total bytes to store Count records with this KeyLen using
            Method 1 (1 segment only)
Comp-2      total bytes to store Count records with this KeyLen using
            Method 0 (multiple segments)
Segs        the number of segments required to store Count records of this
            KeyLen
The second part of the report gives space estimates for various values of Compress-Key-Data(n).

Value          Description
KeyDataOffset  the length of the Fuzzy Key including any partition
KeyOverhead    the overhead associated with storing the segment on the host
               DBMS (assumed)
Blocksize      the DBMS block size (assumed)
BlockOverhead  the DBMS overhead when storing records within a block,
               including control structures and padding (assumed)
compLen        n from Compress-Key-Data(n)
Bytes          the number of bytes required to store segments of this size
Segs           the number of segments used
Segs/Key       the average number of segments per IDX record
DB-Bytes       the number of bytes for segments of this size (scaled up by
               KeyOverhead)
DB-Blocks      the number of blocks for segments of this size (based on the
               Blocksize and BlockOverhead)
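The DB-Bytes and DB-Blocks estimates can be approximated in the same spirit. How histkg actually applies KeyOverhead and BlockOverhead is not documented here, so per-segment overhead scaling and division by the usable block space are assumptions:

```python
import math

def db_estimates(total_bytes, segs, key_overhead=8,
                 blocksize=8192, block_overhead=100):
    """Hypothetical scaling: add per-segment DBMS overhead, then divide by
    the space left in each block after control structures and padding."""
    db_bytes = total_bytes + segs * key_overhead  # scale up by KeyOverhead
    usable = blocksize - block_overhead           # usable space per block
    db_blocks = math.ceil(db_bytes / usable)
    return db_bytes, db_blocks

# 200,000 segments totalling ~10.4 MB of segment data (invented figures):
print(db_estimates(total_bytes=10_400_000, segs=200_000))
```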
To optimize performance, select the largest compLen value that minimizes DB-Blocks and set Compress-Key-Data to this value (35 in the example above).
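Selecting the value is then a simple scan over the report's candidate rows: among the compLen values tied at the minimum DB-Blocks, take the largest. The candidate figures below are invented for illustration:

```python
# (compLen, DB-Blocks) pairs as they might appear in a histkg report;
# the numbers are hypothetical, not from a real run.
candidates = [(20, 1900), (25, 1700), (30, 1500), (35, 1500), (40, 1600)]

min_blocks = min(blocks for _, blocks in candidates)
best = max(c for c, blocks in candidates if blocks == min_blocks)
print(best)  # 35 -> set Compress-Key-Data(35)
```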
You may need to customize the histkg script (%SSABIN%\histkg.awk) if the block size and block overhead values are not correct for your DBMS.
