Developer Transformation Guide

Driver Scores and Link Scores in Cluster Analysis

When you select a cluster output option in the Match transformation, you can add link score and driver score data to the output.
The link score is the score between two records that identifies the records as members of the same cluster. The links between records determine the composition of the cluster. Any record can link to any other record in the same cluster.
The driver score is the score between the record with the highest sequence ID value in a cluster and another record in the same cluster. Driver scores provide a means to assess all records in a cluster against a single record. When you add driver scores to the match output, the mapping runs more slowly, as the Match transformation cannot calculate the driver scores until all the clusters are complete.
Match analysis generates a single set of scores for each strategy that you define. The driver score and the link score indicate the match scores for different pairs of records in each cluster. The driver scores and link scores can depend on the order in which the records enter the transformation. The driver score might be lower than the match threshold.
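The following Python sketch illustrates how a driver score can fall below the match threshold. It is an illustration only, not the Match transformation's implementation: the pairwise scores are hypothetical values, and the helper names (`score`, `build_clusters`, `driver_scores`) are assumptions introduced for this example. Records 1 and 2 link, and records 2 and 3 link, so all three records form one cluster even though records 1 and 3 score below the threshold against each other.

```python
# Hypothetical pairwise match scores, keyed by (lower seq ID, higher seq ID).
# Illustrative values only; they are not output from the Match transformation.
THRESHOLD = 0.825
pair_scores = {
    (1, 2): 0.90,  # meets the threshold, so records 1 and 2 link
    (2, 3): 0.90,  # meets the threshold, so records 2 and 3 link
    (1, 3): 0.70,  # below the threshold, so records 1 and 3 do not link directly
}


def score(a, b):
    """Return the score for a pair of records; a record scores 1 against itself."""
    if a == b:
        return 1.0
    return pair_scores[(min(a, b), max(a, b))]


def build_clusters(record_ids, threshold):
    """Group records connected by any chain of links at or above the threshold."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (a, b), s in pair_scores.items():
        if s >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for r in record_ids:
        clusters.setdefault(find(r), []).append(r)
    return list(clusters.values())


def driver_scores(cluster):
    """Score every record against the driver: the record with the highest sequence ID."""
    driver = max(cluster)
    return {r: score(r, driver) for r in cluster}


clusters = build_clusters([1, 2, 3], THRESHOLD)
result = driver_scores(clusters[0])
for record, s in sorted(result.items()):
    note = "below threshold" if s < THRESHOLD else "meets threshold"
    print(record, s, note)
```

Record 1 stays in the cluster because of its link to record 2, but its driver score is measured against record 3, the record with the highest sequence ID.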

Cluster Analysis Example

You configure a field match strategy to analyze a column of surname data. You set a match threshold of 0.825 in the strategy. You select a clustered output format, and you run the Data Viewer on the transformation.
The following table shows the data that the Data Viewer displays:
Surname  Sequence ID  Cluster ID  Cluster Size  Driver ID  Driver Score  Link ID  Link Score
SMITH    1            1           2             1 - 6      1             1 - 1    1
SMYTH    2            2           2             1 - 3      0.83333       1 - 2    1
SMYTHE   3            2           2             1 - 3      1             1 - 2    0.83333
SMITT    4            3           1             1 - 4      1             1 - 4    1
SMITS    5            4           1             1 - 5      1             1 - 5    1
SMITH    6            1           2             1 - 6      1             1 - 1    1
The Data Viewer contains the following information about the surname data:
  • SMITT and SMITS do not match any record with a score that meets the match threshold. The Match transformation determines that the records are unique in the data set.
    SMITT and SMITS have a cluster size of 1. To find unique records in cluster output, search for clusters that contain a single record.
  • The two SMITH records, with sequence IDs 1 and 6, have a link score of 1. The Match transformation determines that the records are identical. The transformation adds the records to a single cluster.
  • SMYTH and SMYTHE have a link score of 0.83333. The score exceeds the match threshold. Therefore, the transformation adds the records to a single cluster.
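
The example output can be approximated with a short Python sketch. The source does not state which match algorithm the strategy uses. The sketch assumes a normalized edit-distance score (1 minus the edit distance divided by the longer string length), because that formula yields 0.83333 for SMYTH and SMYTHE. It also assumes that each record links to the first earlier record that meets the threshold, and that a record with no such match links to itself with a score of 1. All function names are hypothetical, and the sketch uses bare sequence IDs in place of the "1 - 6" style IDs that the Data Viewer shows.

```python
THRESHOLD = 0.825
# (sequence ID, surname) pairs from the example, in sequence ID order.
SURNAMES = [(1, "SMITH"), (2, "SMYTH"), (3, "SMYTHE"),
            (4, "SMITT"), (5, "SMITS"), (6, "SMITH")]


def edit_distance(a, b):
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def match_score(a, b):
    """Assumed scoring rule: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cluster_rows(records, threshold):
    """Build one output row per record; records must be sorted by sequence ID."""
    names = dict(records)
    link = {}        # sequence ID -> (linked sequence ID, link score)
    cluster_id = {}  # sequence ID -> cluster ID
    for seq, name in records:
        link[seq] = (seq, 1.0)  # default: the record starts a new cluster
        for prev_seq, prev_name in records:
            if prev_seq >= seq:
                break
            s = match_score(name, prev_name)
            if s >= threshold:
                link[seq] = (prev_seq, s)  # link to the first earlier match
                break
        linked_seq = link[seq][0]
        if linked_seq == seq:
            cluster_id[seq] = len(set(cluster_id.values())) + 1
        else:
            cluster_id[seq] = cluster_id[linked_seq]

    rows = []
    for seq, name in records:
        members = [s for s in cluster_id if cluster_id[s] == cluster_id[seq]]
        driver = max(members)  # the driver is the highest sequence ID in the cluster
        rows.append({
            "surname": name,
            "sequence_id": seq,
            "cluster_id": cluster_id[seq],
            "cluster_size": len(members),
            "driver_id": driver,
            "driver_score": round(match_score(name, names[driver]), 5),
            "link_id": link[seq][0],
            "link_score": round(link[seq][1], 5),
        })
    return rows


rows = cluster_rows(SURNAMES, THRESHOLD)
# Unique records are the ones that sit alone in their cluster.
unique = [r["surname"] for r in rows if r["cluster_size"] == 1]
for r in rows:
    print(r)
print("Unique records:", unique)
```

Under these assumptions, SMITT and SMITS each score 0.8 against SMITH, which is below the 0.825 threshold, so each record forms a cluster of its own. The sketch never merges two existing clusters, so treat it as a reading aid for the table, not as a model of the transformation.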
