Table of Contents

  1. Preface
  2. Introduction to Transformations
  3. Transformation Ports
  4. Transformation Caches
  5. Address Validator Transformation
  6. Aggregator Transformation
  7. Association Transformation
  8. Bad Record Exception Transformation
  9. Case Converter Transformation
  10. Classifier Transformation
  11. Comparison Transformation
  12. Consolidation Transformation
  13. Data Masking Transformation
  14. Data Processor Transformation
  15. Decision Transformation
  16. Duplicate Record Exception Transformation
  17. Expression Transformation
  18. Filter Transformation
  19. Hierarchical to Relational Transformation
  20. Java Transformation
  21. Java Transformation API Reference
  22. Java Expressions
  23. Joiner Transformation
  24. Key Generator Transformation
  25. Labeler Transformation
  26. Lookup Transformation
  27. Lookup Caches
  28. Dynamic Lookup Cache
  29. Match Transformation
  30. Match Transformations in Field Analysis
  31. Match Transformations in Identity Analysis
  32. Normalizer Transformation
  33. Merge Transformation
  34. Parser Transformation
  35. Python Transformation
  36. Rank Transformation
  37. Read Transformation
  38. Relational to Hierarchical Transformation
  39. REST Web Service Consumer Transformation
  40. Router Transformation
  41. Sequence Generator Transformation
  42. Sorter Transformation
  43. SQL Transformation
  44. Standardizer Transformation
  45. Union Transformation
  46. Update Strategy Transformation
  47. Web Service Consumer Transformation
  48. Parsing Web Service SOAP Messages
  49. Generating Web Service SOAP Messages
  50. Weighted Average Transformation
  51. Window Transformation
  52. Write Transformation
  53. Appendix A: Transformation Delimiters

Developer Transformation Guide

Duplicate Record Exceptions

You can use a Duplicate Record Exception transformation to identify clusters of duplicate data that need manual review. The match scores of the records in a cluster determine whether the cluster contains potential duplicates. You can configure upper and lower match score thresholds in the transformation. Together, the upper and lower thresholds define the range of similarity that marks records as potential duplicates.
A cluster contains related records that a matching operation groups together. The Match transformation creates clusters with the duplicate analysis operation and the identity resolution operation, and it adds a cluster ID column to all the records. Each record in a cluster has the same cluster ID, so duplicate records receive the same cluster ID. When the lowest match score in a cluster is between the upper and lower thresholds, the Duplicate Record Exception transformation identifies the cluster as a duplicate record exception cluster.
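The following Python sketch illustrates how a shared cluster ID groups records and how the lowest match score of each cluster can be found. It is not Informatica code: the (cluster ID, match score) tuple layout and the function names group_by_cluster and lowest_scores are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict


def group_by_cluster(records):
    """Collect the match scores of records that share a cluster ID.

    Hypothetical input layout: an iterable of (cluster_id, match_score) pairs.
    """
    clusters = defaultdict(list)
    for cluster_id, match_score in records:
        clusters[cluster_id].append(match_score)
    return dict(clusters)


def lowest_scores(records):
    """Return the lowest match score in each cluster, keyed by cluster ID."""
    return {cid: min(scores) for cid, scores in group_by_cluster(records).items()}


# Hypothetical Match transformation output: (cluster ID, match score) pairs.
records = [(1, 0.95), (1, 0.85), (2, 0.97), (2, 0.99), (3, 0.70)]
print(lowest_scores(records))  # {1: 0.85, 2: 0.97, 3: 0.7}
```

The lowest score per cluster is the value that the threshold comparison uses.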
The lowest match score in a cluster determines the cluster type. For example, a cluster might contain 11 records with a match score of 0.95 and one record with a match score of 0.79. If the upper threshold is 0.9 and the lower threshold is 0.8, the lowest score of 0.79 falls below the lower threshold, so the Duplicate Record Exception transformation writes the records to the unique records table.
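A minimal Python sketch of the threshold rule, assuming each cluster is reduced to a list of match scores. The function name classify_cluster and its return labels are hypothetical. Treating a score equal to the lower threshold as "between" the thresholds is an assumption, and the "above_upper" label stands for clusters whose handling this section does not describe.

```python
def classify_cluster(scores, lower, upper):
    """Classify a cluster by its lowest match score.

    Assumption: lower <= lowest < upper counts as "between" the thresholds.
    The exact boundary handling is not documented in this section.
    """
    lowest = min(scores)
    if lowest < lower:
        return "unique"       # records go to the unique records table
    if lowest < upper:
        return "exception"    # duplicate record exception cluster for manual review
    return "above_upper"      # hypothetical label: handling not described here


# The example from the text: 11 records at 0.95 and one record at 0.79.
print(classify_cluster([0.95] * 11 + [0.79], lower=0.8, upper=0.9))  # unique
# Raising the lowest score to 0.85 places it between the thresholds.
print(classify_cluster([0.95] * 11 + [0.85], lower=0.8, upper=0.9))  # exception
```

A single low-scoring record is enough to change the type of the whole cluster, because only the minimum score is compared with the thresholds.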
