Transformation Guide

Groups in Match Analysis

A match analysis mapping can take a long time to run because of the number of data comparisons that the Match transformation must perform. The number of comparisons depends on the number of data values on the ports that you select, and it grows much faster than the number of values.
The following table shows the approximate number of comparisons that a mapping performs for different numbers of data values on a single port:
Number of data values    Number of comparisons
10,000                   50 million
100,000                  5,000 million
1 million                500,000 million
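The figures in the table are consistent with comparing every value against every other value once, which gives n(n - 1)/2 comparisons for n values. That formula is inferred from the table and is not stated by the guide. A minimal sketch under that assumption, using a hypothetical helper named pairwise_comparisons:

```python
def pairwise_comparisons(n: int) -> int:
    """Count unordered pairs among n values (assumed model: n choose 2)."""
    return n * (n - 1) // 2


# Reproduce the approximate counts for the three row sizes in the table.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} values -> {pairwise_comparisons(n):,} comparisons")
```

Under this model, a tenfold increase in data values produces roughly a hundredfold increase in comparisons, which matches the progression in the table.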
To reduce the time that the mapping takes to run, assign the input data records to groups. A group is a set of records that contain identical values on a port that you specify. When you perform match analysis on grouped data, the Match transformation analyzes the records within each group. The transformation does not compare the records in one group with the records in another group. The groups reduce the overall number of comparisons that the transformation must perform without any loss of accuracy in the mapping analysis.
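The effect of grouping on the comparison count can be sketched as follows. The example assumes the same pairwise model, n(n - 1)/2 comparisons for n records, which is inferred from the table and not stated by the guide. The function names and the group sizes are hypothetical and chosen for illustration only.

```python
from typing import Iterable


def pairwise_comparisons(n: int) -> int:
    """Count unordered pairs among n records (assumed model: n choose 2)."""
    return n * (n - 1) // 2


def grouped_comparisons(group_sizes: Iterable[int]) -> int:
    """Sum the pairs inside each group; records in different groups are never compared."""
    return sum(pairwise_comparisons(size) for size in group_sizes)


# Illustrative data set: 1 million records, either ungrouped
# or split into 100 groups of 10,000 records each.
ungrouped = pairwise_comparisons(1_000_000)
grouped = grouped_comparisons([10_000] * 100)

print(f"ungrouped: {ungrouped:,}")
print(f"grouped:   {grouped:,}")
print(f"reduction: about {ungrouped // grouped}x")
```

In this illustration, splitting the records into 100 equal groups cuts the comparison count by a factor of about 100, because the cost of each group depends only on its own size.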
Consider the following rules and guidelines when you organize data into groups:
  • The port on which you group the data is the group key port. A group key port must contain a range of duplicate values, such as a city name or a state name in an address data set. If the mapping data does not contain a usable group key port, use the Key Generator transformation to create the port from the current mapping data. Connect the group key output port from the Key Generator transformation to the Match transformation.
    You can also use the Key Generator transformation to add sequence identifiers to the mapping data.
  • When you configure the Match transformation for field match analysis, you must specify a group key port. When you configure the Match transformation for identity analysis, do not select a group key port. The identity analysis generates group keys for the identity index data.
  • Do not use a port as the group key port if you also plan to select that port for comparison in the match analysis.
  • When you create groups, verify that the groups are an appropriate size. If the groups are too small, the match analysis might not find all the duplicate data in the data set. If the groups are too large, the match analysis might return false duplicates. Select group keys that create an average group size of 10,000 records.
  • Groups do not reorder the position of the records in the mapping data set.
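One way to check a candidate group key against the size guideline above is to profile its values outside the mapping, for example on an exported column. The following is a minimal sketch and not an Informatica API: the function group_size_report and the sample city values are hypothetical.

```python
from collections import Counter


def group_size_report(keys):
    """Summarize the groups that a candidate group key column would create."""
    sizes = Counter(keys)  # group key value -> number of records in that group
    if not sizes:
        raise ValueError("no key values supplied")
    total = sum(sizes.values())
    return {
        "groups": len(sizes),
        "average_size": total / len(sizes),
        "largest": max(sizes.values()),
        # A single-record group has nothing to be compared with.
        "singletons": sum(1 for count in sizes.values() if count == 1),
    }


# Illustrative key values only; real data would come from the group key port.
cities = ["Boston", "Boston", "Austin", "Austin", "Austin", "Reno"]
report = group_size_report(cities)
print(report)
```

A large share of single-record groups suggests that the key is too specific to bring duplicate records together, while an average far above the recommended 10,000 records suggests that the key is too coarse.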
