Merge Definition

This section describes how the merge engine applies the Merge Definition rules to build a preferred record, and the fields that can be used in a Merge Definition.

Merge Definition rule

The merge engine creates a preferred record (also called a “master” or “golden” record) from a cluster of records.
The preferred record is created in two steps:
  1. Identify a default record
  2. Construct the output record

Step 1: Identify a default record

The default record is selected from the records in the cluster by the master selection rules.
The following rules are applied to find a default record:
  • The candidate list starts with all records in the cluster.
  • Each master selection rule is applied in turn, to eliminate candidates.
  • If a single candidate record is left after all the rules are processed, then this record is considered the default record.
  • If a rule would remove all remaining candidates, that rule has no effect and the candidate list is left unchanged.
  • If multiple candidates remain after all the rules are processed, the first candidate that remains is chosen as the default record.
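
The selection rules above can be sketched in Python. This is a minimal illustration, not the merge engine's implementation: the function `select_default_record`, the record layout, and the two example rules are hypothetical and exist only to show the elimination logic.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
# Assumption: a master selection rule takes the current candidates
# and returns the candidates it keeps.
Rule = Callable[[List[Record]], List[Record]]


def select_default_record(cluster: List[Record], rules: List[Rule]) -> Record:
    """Pick the default record from a non-empty cluster."""
    if not cluster:
        raise ValueError("cluster must contain at least one record")
    candidates = list(cluster)      # start with every record in the cluster
    for rule in rules:
        if len(candidates) == 1:
            break                   # a unique record has been found
        kept = rule(candidates)
        if kept:                    # a rule that removes everything has no effect
            candidates = kept
    return candidates[0]            # single survivor, or first of the remainder


# Hypothetical rules, for illustration only.
def keep_source_crm(cands: List[Record]) -> List[Record]:
    return [c for c in cands if c["source"] == "CRM"]


def keep_most_recent(cands: List[Record]) -> List[Record]:
    newest = max(c["updated"] for c in cands)
    return [c for c in cands if c["updated"] == newest]


cluster = [
    {"id": 1, "source": "WEB", "updated": 2021},
    {"id": 2, "source": "ERP", "updated": 2023},
    {"id": 3, "source": "WEB", "updated": 2023},
]
default = select_default_record(cluster, [keep_source_crm, keep_most_recent])
print(default["id"])  # prints 2
```

In this example the first rule matches no record, so it is ignored. The second rule leaves records 2 and 3, and the first of them is chosen as the default record.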

Step 2: Construct the output record

The output record is the preferred record that the merge engine builds after the default record has been found in Step 1.
To build it, the merge engine goes through each column of the output record and processes the rules defined for that column. If no rules are specified for a column, the value of that column in the default record is used.
Values for each column in the output record are chosen by applying rules to the candidate set. Some rules select a single pre-existing value (such as most-data), while others aggregate information (such as sum).
The following rules are applied when constructing the output record:
  • If a rule returns a single candidate, processing stops and the value is used for the output.
  • If a rule returns multiple candidates, processing continues to the next rule (for example, when multiple records tie for most-data).
  • If multiple candidates remain at the end of processing all the rules, the first candidate in the list is chosen for the output.
  • If no rules are specified, the column value is taken from the default (master) record.
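
The column rules above can be sketched in the same way. This is an illustration under stated assumptions, not the product's rule engine: `choose_column_value`, `longest_value`, and `total` are hypothetical names. `longest_value` stands in for a selecting rule such as most-data (assumed here to mean the longest value), and `total` stands in for an aggregating rule such as sum. The sketch also assumes that a narrowed candidate list is passed on to the next rule, and that a rule returning nothing leaves the list unchanged.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
# Assumption: a column rule takes candidate values and returns the values it keeps
# (a selecting rule) or a single computed value (an aggregating rule).
ColumnRule = Callable[[List[object]], List[object]]


def choose_column_value(column: str, candidates: List[Record],
                        default_record: Record, rules: List[ColumnRule]) -> object:
    """Choose the output value for one column from a non-empty candidate set."""
    if not rules:
        return default_record[column]   # no rules: take the default record's value
    values = [rec[column] for rec in candidates]
    for rule in rules:
        result = rule(values)
        if len(result) == 1:
            return result[0]            # single candidate: processing stops
        if result:
            values = result             # multiple candidates: continue with next rule
    return values[0]                    # still ambiguous: first candidate in the list


def longest_value(values: List[object]) -> List[object]:
    longest = max(len(str(v)) for v in values)
    return [v for v in values if len(str(v)) == longest]


def total(values: List[object]) -> List[object]:
    return [sum(values)]


records = [
    {"name": "J Smith", "balance": 10},
    {"name": "John Smith", "balance": 5},
    {"name": "Jane Smith", "balance": 7},
]
default_record = records[0]

name = choose_column_value("name", records, default_record, [longest_value])
balance = choose_column_value("balance", records, default_record, [total])
fallback = choose_column_value("name", records, default_record, [])
print(name, balance, fallback)  # prints: John Smith 22 J Smith
```

Two names tie for the longest value, so the first of them is used. The aggregating rule returns a single value, which ends processing immediately.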

Merge Definition

The Merge Definition begins with the MERGE-DEFINITION keyword. The fields are as follows:

NAME=
  A character string that defines the name of the Merge Definition. This parameter is mandatory. The name is limited to a maximum of 31 bytes.

COMMENT=
  A text field that describes the purpose of the Merge Definition.

OPTIONS=
  A comma-separated list of keywords that define options for the behavior of this Merge Definition:
  • Member-is-Preferred changes the behavior of preferred record generation so that preferred records are not stored as new records in the IDT. Instead, this option flags an existing cluster member record as the preferred record for the related cluster. This reduces the size of the IDT and related IDXs. Only master rules determine the preferred record. Column selection from rules and overrides in the GUI are not valid with this option.
  • Member-Quick-Select can be used in conjunction with the Member-is-Preferred option to select the first member record in the cluster as the preferred record. This option does not call the rule engine at any time. It provides the least flexibility in preferred record generation and selection, but has performance advantages.
  • Audit-Preferred enables auditing of the preferred records.

MASTER-SELECTION=
  A list of the rules used to select the master record. Each rule must have a corresponding MERGE-MASTER-DEFINITION with a matching name.
  The rules are processed in the order listed, removing candidates at each step unless all candidates would be removed.
  Processing stops when a unique record is found or when the end of the rules is reached. If the end of the rules is reached and no unique record is found, the first of the remaining candidate records is selected.

COLUMN-SELECTION=
  A list of the columns whose default behavior is to be overridden. Each entry must have a corresponding MERGE-COLUMN-DEFINITION with a matching name. This name is not the name of the column, but the name of the list of rules.
  If a column exists in the IDT but is not specified here, it is treated as if it had a format equivalent to its format in the IDT (Character, Wide, Numeric), with a single rule of type from-master.
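
The constraints stated for these fields can be checked with a small script before a definition is loaded. The sketch below is hypothetical and is not part of the product: it assumes the Merge Definition has already been parsed into a Python dictionary keyed by field name, that the names of the existing MERGE-MASTER-DEFINITION and MERGE-COLUMN-DEFINITION entries are available as sets, and that the 31-byte limit is measured in UTF-8.

```python
from typing import Dict, List, Set


def validate_merge_definition(merge_def: Dict[str, object],
                              master_defs: Set[str],
                              column_defs: Set[str]) -> List[str]:
    """Return a list of problems found; an empty list means no problem was detected."""
    errors: List[str] = []

    name = str(merge_def.get("NAME", ""))
    if not name:
        errors.append("NAME is mandatory")
    elif len(name.encode("utf-8")) > 31:    # assumption: bytes counted as UTF-8
        errors.append("NAME exceeds 31 bytes")

    # Every MASTER-SELECTION rule needs a MERGE-MASTER-DEFINITION of the same name.
    for rule in merge_def.get("MASTER-SELECTION", []):
        if rule not in master_defs:
            errors.append(f"no MERGE-MASTER-DEFINITION named {rule}")

    # Every COLUMN-SELECTION entry needs a MERGE-COLUMN-DEFINITION of the same name.
    column_selection = merge_def.get("COLUMN-SELECTION", [])
    for rule_list in column_selection:
        if rule_list not in column_defs:
            errors.append(f"no MERGE-COLUMN-DEFINITION named {rule_list}")

    # Column selection is not valid together with Member-is-Preferred.
    options = {str(o).lower() for o in merge_def.get("OPTIONS", [])}
    if "member-is-preferred" in options and column_selection:
        errors.append("COLUMN-SELECTION is not valid with Member-is-Preferred")

    return errors


# Hypothetical definition, for illustration only.
sample = {
    "NAME": "customer-merge",
    "OPTIONS": ["Member-is-Preferred"],
    "MASTER-SELECTION": ["most-recent", "most-complete"],
    "COLUMN-SELECTION": ["name-rules"],
}
problems = validate_merge_definition(sample, {"most-recent"}, {"name-rules"})
for problem in problems:
    print(problem)
```

For this sample the script reports the missing MERGE-MASTER-DEFINITION for most-complete and the conflict between COLUMN-SELECTION and Member-is-Preferred.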
