Installation and Configuration Guide

10.1 HotFix 1


Configuring Metadata

You must configure the metadata information, such as the name of the column that you want to set as the primary key, in the configuration file.

To configure the metadata information, add the following parameters to the MetaData section of the configuration file:

PK: Name of the column that you want to set as the primary key.
SOURCE_COLUMN_NAME: Name of the column that stores the source information of the data. You can use only LMT_SOURCE_NAME as the column name.
You can also use the part_of_layout attribute to specify whether the source information is part of the input data. Set the attribute to YES if the source information is part of the input data, or to NO if it is not. If you set the attribute to NO, ensure that you specify the SOURCE_NAME parameter.
For example:
<SOURCE_COLUMN_NAME part_of_layout="YES">LMT_SOURCE_NAME</SOURCE_COLUMN_NAME>
SOURCE_NAME: Name of the source for the input data. If the input data does not contain the source name, use the SOURCE_NAME parameter to specify the source name. The source name cannot exceed 32 bytes.
For example:
<SOURCE_NAME is_reference="YES">PRIZM</SOURCE_NAME>
CLUSTER_COLUMN_NAME: Name of the column in which you want to store the link identifiers.
CLUSTER_COLUMN_SIZE: Size of the column that stores the link identifiers. Use 40 bytes as the column size.
CLUSTER_OPTION: Indicates whether to delete the tables that the load job created in the repository when you run the load job again.
Set to true if you want to delete the tables, or set to false if you want to append the data to the existing tables. Default is false. The sample configuration at the end of this section sets this option through the deleteLinkTable attribute.
MATCHSOURCES: Specifies the sources of the data that you want the MapReduce jobs to process.
For example:
<MATCHSOURCES>
   <MATCHSOURCE>ORACLE</MATCHSOURCE>
   <MATCHSOURCE>MYSQL</MATCHSOURCE>
   <MATCHSOURCE>SAP</MATCHSOURCE>
</MATCHSOURCES>
The preceding example specifies that the MapReduce jobs process only the data from Oracle, MySQL, and SAP.
ALTERNATETABLEFORGROUPINFO: Optional. Indicates whether to store the link information in a separate table.
Set to true if you want a separate table, or set to false if you do not. If the value is false, ensure that you specify the ADDGROUPNUMBERTOROWKEY parameter. Default is false.
ADDGROUPNUMBERTOROWKEY: Indicates whether to add the link number to the record key.
If you set ALTERNATETABLEFORGROUPINFO=false, set AddGroupNumberToRowKey=true. Default is false.
DELETEBATCHSIZE: Optional. Maximum number of records that you can delete at once. Default is 1000.
PARTITION_COLUMN_NAME: Optional. Name of the column from which you can create a partition identifier and add it to the key. Use the following attributes to define the partition identifier:
length: Length, in bytes, of the column that you can add as a prefix to the keys. The maximum length is 8 bytes. Default is 2 bytes.
part_of_rowkey: Indicates whether the key includes the partition identifier. Set to true if you want to prefix the key with the partition identifier, or set to false if you do not.
For example:
<PARTITION_COLUMN_NAME length="2" part_of_rowkey="YES">STATE</PARTITION_COLUMN_NAME>
If you configure the PARTITION_COLUMN_NAME parameter for the initial linking job, you must also configure the PARTITION_COLUMN_NAME parameter when you run other jobs to update or increment the initial data.
LinkTableName: Base name for the tables that the initial loading job creates in the repository. The initial loading job names the tables MDMBDRM<OrganizationID>_<LINKTABLENAME>, followed by _PK for the primary key table or _GROUP for the link table.
For example, if you specify LINKTABLENAME=LMT_MATCHED, the initial loading job creates an index table named MDMBDRM<OrganizationID>_LMT_MATCHED, a primary key table named MDMBDRM<OrganizationID>_LMT_MATCHED_PK, and a link table named MDMBDRM<OrganizationID>_LMT_MATCHED_GROUP.
StoreAllFields: Optional. Indicates whether to persist in the repository all the columns that you define in the PZMAP section.
Set to true to persist all the columns in the repository, or set to false to persist only the columns that you use to index data in the repository. Default is false.
If you plan to run the initial linking, initial loading, incremental linking, update linking, or repository data deletion job with the matching rules file, you must set StoreAllFields=true.
ColumnFamilyName: Name of the column family that groups all the columns in the repository table.
MaxConcurrentSessions: Maximum number of REST requests that you can run concurrently. A higher number improves search performance but uses more memory. You can configure the value based on the amount of available memory. Default is 200.
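Some of the parameters above depend on each other. The following fragment is a sketch, not a complete MetaData section, that places the dependent settings side by side under the assumption that the input data does not carry the source name: part_of_layout="NO" requires SOURCE_NAME, and ALTERNATETABLEFORGROUPINFO=false requires AddGroupNumberToRowKey=true. The values PRIZM, is_reference="YES", and LMT_MATCHED are copied from the examples above; the other parameters, such as PK, are omitted for brevity.

```xml
<MetaData>
   <!-- Source name is not part of the input data, so SOURCE_NAME is required (32 bytes maximum) -->
   <SOURCE_COLUMN_NAME part_of_layout="NO">LMT_SOURCE_NAME</SOURCE_COLUMN_NAME>
   <SOURCE_NAME is_reference="YES">PRIZM</SOURCE_NAME>
   <!-- No separate table for link information, so add the link number to the record key -->
   <ALTERNATETABLEFORGROUPINFO>false</ALTERNATETABLEFORGROUPINFO>
   <AddGroupNumberToRowKey>true</AddGroupNumberToRowKey>
   <!-- Base name for the tables that the initial loading job creates -->
   <LinkTableName>LMT_MATCHED</LinkTableName>
   <!-- Required when the jobs run with the matching rules file -->
   <StoreAllFields>true</StoreAllFields>
</MetaData>
```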

The following sample code shows the metadata configuration:

<MetaData>
   <PK>ROWID</PK>
   <SOURCE_NAME is_reference="YES">PRIZM</SOURCE_NAME>
   <SOURCE_COLUMN_NAME part_of_layout="YES">LMT_SOURCE_NAME</SOURCE_COLUMN_NAME>
   <CLUSTER_COLUMN_NAME>GROUPNO</CLUSTER_COLUMN_NAME>
   <CLUSTER_COLUMN_SIZE>40</CLUSTER_COLUMN_SIZE>
   <CLUSTER_OPTION deleteLinkTable="TRUE" />
   <MATCHSOURCES>
      <MATCHSOURCE>ORACLE</MATCHSOURCE>
      <MATCHSOURCE>MYSQL</MATCHSOURCE>
      <MATCHSOURCE>SAP</MATCHSOURCE>      
   </MATCHSOURCES>
   <ALTERNATETABLEFORGROUPINFO>false</ALTERNATETABLEFORGROUPINFO>
   <DELETEBATCHSIZE>1000</DELETEBATCHSIZE>
   <PARTITION_COLUMN_NAME length="2" part_of_rowkey="YES">STATE</PARTITION_COLUMN_NAME>
   <LinkTableName>MDM_INDIVIDUAL_LMT_GA</LinkTableName>
   <ColumnFamilyName>MDMBDE_link_columns</ColumnFamilyName>
   <StoreAllFields>true</StoreAllFields>                        
   <AddGroupNumberToRowKey>true</AddGroupNumberToRowKey>                        
   <MaxConcurrentSessions>100</MaxConcurrentSessions>
</MetaData>
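If you configure PARTITION_COLUMN_NAME for the initial linking job, the jobs that later update or increment the data must also configure it. The following fragment is a sketch of the relevant part of the MetaData section for such a job. It assumes that the job reuses the partition settings from the sample above (STATE, length 2, part_of_rowkey YES) and runs with the matching rules file, which requires StoreAllFields=true; all other parameters are omitted.

```xml
<MetaData>
   <!-- Assumed: same partition identifier as the initial linking job -->
   <PARTITION_COLUMN_NAME length="2" part_of_rowkey="YES">STATE</PARTITION_COLUMN_NAME>
   <!-- Required because the job runs with the matching rules file -->
   <StoreAllFields>true</StoreAllFields>
</MetaData>
```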
