Configuring the Indexes
You can define one or more indexes in the matching rules file. Define each index within an MDMBDRMMatchRuleSet section.
To define multiple indexes, create multiple MDMBDRMMatchRuleSet sections.
To define an index, add the following parameters to the
section within the
- Names of the columns on which you want to index the records. If you specify multiple columns, use commas to separate them. Ensure that you specify the column names in the PZMAP section of the configuration file.
- Type of index that you want to create. Use one of the following values:
  - FUZZY. A heavy index that contains fuzzy keys.
  - USER. A lightweight index that contains exact values from the field.
- Name of the SSA-NAME3 field based on which you want to build keys.
- Optional. Type of key level to build. Default is Standard. Use one of the following values:
  - Standard. Builds more variations than the limited key level but uses less disk space than the extended key level.
  - Extended. Builds more variations than the standard key level and uses more disk space than the standard and limited key levels.
  - Limited. Builds fewer variations and uses less disk space than the standard and extended key levels.
- Optional. Additional attributes to configure. You can specify the following attributes:
  - NAMEFORMAT=L|R. Indicates whether the major word in a name or address is on the left end or the right end. For example, in Western names, the family name is on the right end of the name.
  - UNICODE_ENCODING. Specifies the Unicode format of the data that you use.
- Optional. Name of the column from which to create a partition identifier that is added to the key. Use the following attributes to define the partition identifier:
  - length. Defines the length of the column value that is added as the prefix to the keys. The maximum length is 8 bytes. Default is 2 bytes.
  - part_of_rowkey. Indicates whether the key includes the partition identifier. Set to true to prefix the key with the partition identifier, or set to false to leave the key without the prefix.
If you configure the PARTITION_COLUMN_NAME parameter for the initial linking job or initial clustering job, you must configure the PARTITION_COLUMN_NAME parameter when you run other jobs to update or increment the initial data.
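The effect of the length and part_of_rowkey attributes can be pictured with a small Python sketch. It is an illustration only, not the product's implementation: the function name build_row_key is hypothetical, and the sketch assumes that the partition identifier is the leading bytes of the column value and that the smallest usable length is 1 byte.

```python
MAX_PARTITION_LENGTH = 8      # maximum prefix length in bytes, as stated above
DEFAULT_PARTITION_LENGTH = 2  # default prefix length in bytes, as stated above


def build_row_key(key: bytes, partition_value: bytes,
                  length: int = DEFAULT_PARTITION_LENGTH,
                  part_of_rowkey: bool = True) -> bytes:
    """Hypothetical helper that mimics the documented prefix behavior."""
    # Assumption: a length below 1 is rejected; the text only states the maximum.
    if not 1 <= length <= MAX_PARTITION_LENGTH:
        raise ValueError("length must be between 1 and 8 bytes")
    if not part_of_rowkey:
        # The key is left without the partition identifier.
        return key
    # Assumption: the identifier is the first `length` bytes of the column value.
    return partition_value[:length] + key


print(build_row_key(b"K123", b"CALIFORNIA", length=4))  # b'CALIK123'
```

Under these assumptions, a job that later updates the initial data has to use the same column and length. Otherwise the keys it builds would not line up with the keys that the initial job wrote.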
The following sample shows an index definition for the PersonFullName column:
<PARTITION_COLUMN_NAME length="4" part_of_rowkey="Yes">ColumnName1</PARTITION_COLUMN_NAME>
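The allowed values described in this section can be checked before you run a job. The following Python sketch shows one way to do that. The IndexDefinition class, its field names, and the validate function are hypothetical and are not part of the product. Only the value lists (FUZZY, USER, Standard, Extended, Limited) and the PZMAP requirement come from the text above.

```python
from dataclasses import dataclass
from typing import List, Set

INDEX_TYPES = {"FUZZY", "USER"}
KEY_LEVELS = {"Standard", "Extended", "Limited"}


@dataclass
class IndexDefinition:
    # Hypothetical container; the field names are illustrative only.
    columns: str                 # comma-separated column names
    index_type: str              # FUZZY or USER
    key_level: str = "Standard"  # optional, defaults to Standard


def validate(definition: IndexDefinition, pzmap_columns: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    # Split the comma-separated column list and check each name against PZMAP.
    for name in (part.strip() for part in definition.columns.split(",")):
        if name not in pzmap_columns:
            problems.append(f"column {name!r} is not in the PZMAP section")
    if definition.index_type not in INDEX_TYPES:
        problems.append(f"unknown index type {definition.index_type!r}")
    if definition.key_level not in KEY_LEVELS:
        problems.append(f"unknown key level {definition.key_level!r}")
    return problems


pzmap = {"PersonFullName", "City"}
print(validate(IndexDefinition("PersonFullName", "FUZZY"), pzmap))  # []
```

The check is case-sensitive, which is an assumption of the sketch. Adjust it if your configuration accepts other spellings of the values.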