Duplicate Record Exception Transformation Overview
The Duplicate Record Exception transformation is an active transformation that reads the output of a data quality process and identifies duplicate records that require manual review. The Duplicate Record Exception transformation is a multiple-group transformation.
The Duplicate Record Exception transformation receives input from another transformation or from a data object in another mapping. The input to the Exception transformation must contain a numeric match score that the transformation can use to determine whether the record is a duplicate. Set upper and lower match score thresholds in the Duplicate Record Exception transformation.
The Duplicate Record Exception transformation performs one of the following actions:
If the match score is greater than or equal to the upper threshold, the transformation treats the record as a duplicate and writes it to a database target.
If the match score is less than the upper threshold and greater than the lower threshold, the transformation treats the record as a possible duplicate and writes the record to a different target for manual review. If the record belongs to a cluster, the transformation writes all records in the cluster to the target.
When a cluster has any match score less than the lower threshold, all records in the cluster go to the unique records output group. Clusters of size 1 are routed to the unique group, regardless of match score. By default, the Exception transformation does not write unique records to a target. You can configure the transformation to return the unique records.
If any match score in a cluster is not in the range 0 - 100, the Exception transformation ignores all rows in the cluster. The Data Integration Service logs a message that includes the clusterID.
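The routing rules above can be sketched in a few lines of Python. This is an illustrative model only, not the transformation's implementation. The function name route_cluster and the returned labels are hypothetical. The sketch also makes three assumptions that the text does not state: the range check is applied before the size-1 rule, a cluster counts as a duplicate only when every score reaches the upper threshold, and a score exactly equal to the lower threshold is treated as a possible duplicate.

```python
from typing import Sequence


def route_cluster(scores: Sequence[float], lower: float, upper: float) -> str:
    """Return a hypothetical label for where a cluster's records would go.

    Labels: "ignored", "unique", "duplicate", "review".
    """
    if not scores:
        raise ValueError("a cluster must contain at least one match score")
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")

    # Any score outside 0 - 100 causes the whole cluster to be ignored.
    # Assumption: this check takes precedence over the size-1 rule.
    if any(not 0 <= s <= 100 for s in scores):
        return "ignored"

    # Clusters of size 1 go to the unique group regardless of match score.
    if len(scores) == 1:
        return "unique"

    # Any score below the lower threshold sends the whole cluster to unique.
    if any(s < lower for s in scores):
        return "unique"

    # Assumption: the cluster is a duplicate only if all scores meet the
    # upper threshold; otherwise the whole cluster goes to manual review.
    if all(s >= upper for s in scores):
        return "duplicate"
    return "review"


if __name__ == "__main__":
    for cluster in ([95, 97], [95, 70], [95, 40], [80], [95, 120]):
        print(cluster, route_cluster(cluster, lower=50, upper=90))
```

With a lower threshold of 50 and an upper threshold of 90, the sample clusters in the sketch cover each outcome: automatic duplicate, manual review, unique because of a low score, unique because of cluster size, and ignored because of an out-of-range score.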