Developer Transformation Guide

Cluster Output Options

Select a cluster output option when you want to organize similar or identical records in the output data.
When you select a cluster output option, the transformation adds a cluster ID value to each output record. You can sort the records by the cluster ID values. The transformation output includes a row for every record. If a record does not match another record with a score that meets the match threshold, the transformation assigns a unique cluster ID to the record. Use the Match Output view to select or update the cluster output options.
You can select the following cluster output options:
Clusters
Select the option to assign cluster ID values to the output records.
Clusters - Best Match
Select the option to add the record pair with the highest match score to a cluster. Because a record might represent the best match with more than one other record, more than one record pair can share a cluster ID value.
Clusters - Match All
The Clusters - Match All option works in the same way as the Clusters option.
The transformation uses Clusters - Match All and Clusters - Best Match as option names in identity match analysis.
If a Data Integration Service runs multiple Match transformations concurrently, the Data Integration Service generates unique cluster ID values for the output from each transformation. Therefore, the cluster ID values for the records that each transformation generates can be non-consecutive.

The Clusters Option and the Clusters - Match All Option

Select the Clusters option in field match analysis. Select the Clusters - Match All option in identity match analysis.
The Match transformation uses the following rules to create the clusters:
  • When two records have a match score that meets the match threshold, the Match transformation adds the records to a cluster.
  • When a record in the data set matches any record in the cluster, the transformation adds the record to the cluster.
  • If a record in one cluster matches a record in another cluster, the transformation merges the two clusters.
  • The transformation performs a continual sweep of the match results until all the records belong to a cluster.
  • If a record does not match any other record in the data set, the transformation assigns a unique cluster ID value to the record.
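The rules above amount to grouping records that are linked, directly or through other records, by match scores that meet the threshold. The following is a minimal sketch of that behavior in Python, not the Match transformation's actual implementation. The function name `cluster_match_all`, the `(record, record, score)` pair format, the sample scores, and the reading of "meets the threshold" as greater than or equal to are all assumptions made for illustration.

```python
def cluster_match_all(record_ids, scored_pairs, threshold):
    """Return {record_id: cluster_id}, linking matching records transitively.

    Hypothetical helper: scored_pairs is a list of (id_a, id_b, score) tuples.
    """
    # Every record starts as the root of its own single-record cluster.
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Walk up to the cluster root, shortening the path along the way.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for a, b, score in scored_pairs:
        # Assumption: a score "meets" the threshold when score >= threshold.
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    # Number clusters in order of first appearance. Records with no match
    # keep their own root and therefore receive a unique cluster ID.
    cluster_numbers = {}
    result = {}
    for rid in record_ids:
        root = find(rid)
        if root not in cluster_numbers:
            cluster_numbers[root] = len(cluster_numbers) + 1
        result[rid] = cluster_numbers[root]
    return result


records = ["A", "B", "C", "D", "E"]
pairs = [("A", "B", 0.92), ("B", "C", 0.88), ("C", "D", 0.60), ("A", "C", 0.70)]
clusters = cluster_match_all(records, pairs, threshold=0.85)
print(clusters)  # {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 3}
```

In this sample, record C joins the cluster of A and B because it matches B, even though its score with A is below the threshold. Records D and E match nothing and each receive their own cluster ID.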

The Clusters - Best Match Option

Select the Clusters - Best Match option in identity match analysis.
The transformation uses the following rules to create the clusters:
  • The transformation identifies the record that has the highest match score with the current record. If the match score meets the threshold, the transformation adds the pair of records to a cluster.
  • If one of the matching records is already in a cluster, the transformation adds the other record to that cluster.
  • The transformation performs a continual sweep of the match score results until all the records belong to a cluster.
  • A cluster can contain a single record if the record does not match any other record in the data.
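To illustrate how these rules differ from the Clusters - Match All behavior, the following Python sketch links each record only to its single highest-scoring partner. It is one interpretation of the rules listed above, not the transformation's actual algorithm. The function name `cluster_best_match`, the pair format, the sample scores, the tie-breaking rule (first pair seen wins), and the reading of "meets the threshold" as greater than or equal to are assumptions.

```python
def cluster_best_match(record_ids, scored_pairs, threshold):
    """Return {record_id: cluster_id}, linking each record to its best match only.

    Hypothetical helper: scored_pairs is a list of (id_a, id_b, score) tuples.
    """
    # Find the highest-scoring partner for each record.
    # Assumption: on a tie, the first pair encountered is kept.
    best = {}
    for a, b, score in scored_pairs:
        for rec, other in ((a, b), (b, a)):
            if rec not in best or score > best[rec][1]:
                best[rec] = (other, score)

    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Walk up to the cluster root, shortening the path along the way.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for rec in record_ids:
        if rec in best:
            other, score = best[rec]
            # Only the best-match pair is clustered, and only if it meets the threshold.
            if score >= threshold:
                root_a, root_b = find(rec), find(other)
                if root_a != root_b:
                    parent[root_b] = root_a

    # Number clusters in order of first appearance; unmatched records stay alone.
    cluster_numbers = {}
    result = {}
    for rid in record_ids:
        root = find(rid)
        if root not in cluster_numbers:
            cluster_numbers[root] = len(cluster_numbers) + 1
        result[rid] = cluster_numbers[root]
    return result


records = ["A", "B", "C", "D", "E"]
pairs = [("A", "B", 0.95), ("B", "C", 0.90), ("C", "D", 0.91)]
best_clusters = cluster_best_match(records, pairs, threshold=0.85)
print(best_clusters)  # {'A': 1, 'B': 1, 'C': 2, 'D': 2, 'E': 3}
```

In this sample, B and C score 0.90, which meets the threshold, but neither is the other's best match, so they stay in separate clusters. Linking every pair that meets the threshold would instead place A, B, C, and D in a single cluster.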
You can use the Match property on the Match Output view to specify how the transformation compares a single data source to a persistent data store. The Match property determines whether the transformation looks for duplicates within the source data or the persistent data store.
