Single-Source Field Matching
Single-source field match analysis compares data from every record in a data set with every other record, and generates a numerical score for every pair of records that it compares. To reduce processing time, the transformation uses one or more key fields that you select to organize the input records into groups before match analysis, and then compares records only within each group. The number of record pairs that the transformation creates for a group depends on the number of records in that group.
The number of pairs in a group can be calculated with the following formula:

$$\text{pairs} = \frac{n(n-1)}{2}$$

where n is the number of records in the group.
Group size has a significant impact on performance, because the number of pairs grows roughly with the square of the group size. For example, applying the formula to a group of 2,000 records produces 1,999,000 record pairs. Applying the formula to a group of 5,000 records produces 12,497,500 pairs, more than six times as many from a group only two and a half times larger.
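To check the arithmetic for your own group sizes, you can evaluate the formula directly. The following Python sketch is illustrative only; `pair_count` is a hypothetical helper and is not part of the Match transformation.

```python
def pair_count(n: int) -> int:
    """Return the number of record pairs in a group of n records: n(n - 1) / 2."""
    if n < 0:
        raise ValueError("group size cannot be negative")
    # Integer division is exact because n(n - 1) is always even.
    return n * (n - 1) // 2


for size in (2_000, 5_000, 10_000):
    print(f"{size:>6,} records -> {pair_count(size):>10,} pairs")
```

For groups of 2,000, 5,000, and 10,000 records, the sketch reports 1,999,000, 12,497,500, and 49,995,000 pairs.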
For optimal performance, avoid groups of more than 10,000 records. Choose key fields that produce meaningful groups: groups should be large enough that you do not miss possible matches, but no larger than necessary.
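Before you run a match mapping, it can help to estimate how a candidate set of key fields splits the data. The following Python sketch groups records on hypothetical key fields, totals the pairs that match analysis would compare, and flags groups above the 10,000-record guideline. The record layout and the names `group_by_key`, `summarize_groups`, and `MAX_GROUP_SIZE` are assumptions for illustration; the transformation performs its own grouping internally.

```python
from collections import defaultdict

# Guideline from the text: groups of more than 10,000 records are not recommended.
MAX_GROUP_SIZE = 10_000


def group_by_key(records, key_fields):
    """Organize records into groups that share the same values in key_fields."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        groups[key].append(record)
    return dict(groups)


def summarize_groups(groups, max_size=MAX_GROUP_SIZE):
    """Return (total pairs compared within groups, keys of groups larger than max_size)."""
    total_pairs = 0
    oversized = []
    for key, members in groups.items():
        n = len(members)
        total_pairs += n * (n - 1) // 2  # pairs are formed only inside a group
        if n > max_size:
            oversized.append(key)
    return total_pairs, oversized


# Hypothetical input: grouping on "zip" leaves 3 pairs instead of the 6
# that an ungrouped comparison of 4 records would need.
records = [
    {"id": 1, "surname": "Smith", "zip": "10001"},
    {"id": 2, "surname": "Smyth", "zip": "10001"},
    {"id": 3, "surname": "Jones", "zip": "10001"},
    {"id": 4, "surname": "Smith", "zip": "94105"},
]
groups = group_by_key(records, ["zip"])
print(summarize_groups(groups))
```

Note that records with different key values are never compared, so a key field that is often misspelled or empty can hide genuine matches.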
If you perform matching on a large data set, the Match transformation may not be able to store all comparison pairs in memory, and some pairs will be written to disk. The Cache Size property on the transformation determines the amount of memory available.
The following image shows the property:
A Cache Size value below 65536 is interpreted as megabytes. A value of 65536 or higher is interpreted as bytes.
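The unit rule is easy to misread, so the following Python sketch converts a Cache Size value to bytes according to the rule above. It assumes that a megabyte here means 1,048,576 bytes; the helper name `cache_size_in_bytes` is hypothetical.

```python
# Values below this threshold are read as megabytes; values at or above it as bytes.
MB_THRESHOLD = 65_536
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def cache_size_in_bytes(value: int) -> int:
    """Convert a Cache Size property value to a number of bytes."""
    if value < 0:
        raise ValueError("cache size cannot be negative")
    if value < MB_THRESHOLD:
        return value * BYTES_PER_MB
    return value


print(cache_size_in_bytes(400))          # 400 MB expressed in bytes
print(cache_size_in_bytes(500_000_000))  # already in bytes
```

Under this rule the scale jumps at the threshold: a value of 65535 requests about 64 GB, while 65536 requests only 64 KB. Check which unit you intend before you set a value near 65536.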
The Cache Directory property identifies the storage location for the temporary files that match analysis creates. To improve performance, configure the cache directory on the smallest, fastest disk available.
Where possible, do not use pass-through ports on the Match transformation, especially with large data sets, because pass-through ports consume valuable memory or disk space. To reunite the pass-through data with the matched records, use a Joiner transformation that joins the two data streams on the sequence ID values.
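In a mapping you configure this join in a Joiner transformation. The following Python sketch only mimics the logic of an inner join on the sequence ID, to show why the match output does not need to carry the pass-through columns. The field names `sequence_id`, `name`, `notes`, and `cluster_id` and the helper `rejoin_pass_through` are hypothetical.

```python
def rejoin_pass_through(match_output, source_rows, seq_field="sequence_id"):
    """Attach pass-through columns to matched rows by joining on the sequence ID.

    match_output: rows produced by match analysis, without pass-through data.
    source_rows:  original rows that still carry the pass-through columns.
    """
    by_seq = {row[seq_field]: row for row in source_rows}
    joined = []
    for match_row in match_output:
        source = by_seq.get(match_row[seq_field])
        if source is None:
            continue  # no source row for this ID: drop it, as an inner join would
        # Source columns first, then match columns, so match results win on name clashes.
        joined.append({**source, **match_row})
    return joined


source_rows = [
    {"sequence_id": 1, "name": "Ann Lee", "notes": "VIP"},
    {"sequence_id": 2, "name": "Anne Lee", "notes": ""},
]
match_output = [
    {"sequence_id": 1, "cluster_id": 7},
    {"sequence_id": 2, "cluster_id": 7},
]
joined = rejoin_pass_through(match_output, source_rows)
print(joined[0])
```

The match stream carries only the sequence ID and the match results, and the wide columns rejoin the data after match analysis is complete.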
The Match transformation can generate Link Score and Driver Score values that represent the degree of similarity between pairs of records in a cluster of matching records.
For optimal performance, choose Link Scores rather than Driver Scores. Driver Scores write more information to disk, so choosing them can greatly decrease the performance of your match mapping.
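To illustrate how clusters can be built from pair scores, the following Python sketch applies generic single-link clustering with a union-find structure. It is not the Match transformation's own algorithm. The threshold, the tuple layout, and the helper `cluster_with_link_scores` are hypothetical, and the sketch takes a record's link score to be the score of the pair that first linked it into a cluster. It does not model Driver Scores.

```python
def cluster_with_link_scores(scored_pairs, threshold):
    """Build clusters from pairwise match scores with single-link (union-find) logic.

    scored_pairs: iterable of (seq_id_a, seq_id_b, score) tuples.
    Returns (clusters, link_scores):
      clusters    maps a representative sequence ID to the IDs in its cluster.
      link_scores maps each clustered ID to the score of the pair that first
                  linked it into a cluster (an assumption made for this sketch).
    Records that take part in no pair at or above the threshold are omitted.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    link_scores = {}
    # Strongest pairs first, so each record is credited with its best link.
    for a, b, score in sorted(scored_pairs, key=lambda p: p[2], reverse=True):
        if score < threshold:
            break  # remaining pairs are weaker still
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # already in the same cluster
        parent[root_b] = root_a
        link_scores.setdefault(a, score)
        link_scores.setdefault(b, score)

    clusters = {}
    for seq_id in list(parent):
        clusters.setdefault(find(seq_id), []).append(seq_id)
    return clusters, link_scores


pairs = [(1, 2, 0.95), (2, 3, 0.90), (3, 4, 0.40), (5, 6, 0.85)]
clusters, link_scores = cluster_with_link_scores(pairs, threshold=0.8)
print(clusters)
print(link_scores)
```

In this sketch each link score comes from a pair that match analysis has already scored, so recording it adds little extra data.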
Selecting the Filter Exact Match property significantly improves match performance when the data contains a large number of exactly matching pairs. Otherwise, the option has a negligible effect on performance.
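The following Python sketch suggests why the benefit depends on the data. It assumes, for illustration only, that filtering exact matches means collapsing records with identical match values before pairwise comparison, so that only the distinct values need to be compared. The Match transformation's internal handling may differ, and the helper name and sample values are hypothetical.

```python
from collections import defaultdict


def pairs_with_and_without_exact_filter(values):
    """Compare pairwise-comparison counts with and without collapsing exact duplicates.

    values: the match-field value of each record in one group.
    Returns (pairs_without_filter, pairs_with_filter, exact_groups), where
    exact_groups maps each distinct value to the positions of records that share it.
    """
    exact_groups = defaultdict(list)
    for position, value in enumerate(values):
        exact_groups[value].append(position)  # identical values are matched outright

    n = len(values)
    u = len(exact_groups)  # distinct values left for pairwise comparison
    return n * (n - 1) // 2, u * (u - 1) // 2, dict(exact_groups)


many_dupes = ["ACME CORP"] * 6 + ["ACME CORPORATION", "APEX LTD"]
few_dupes = ["ACME CORP", "ACME CO", "APEX LTD", "APEX LIMITED",
             "ZENITH", "ZENITH INC", "ORBIT", "ORBIT LLC"]

for label, data in (("many exact duplicates", many_dupes), ("no exact duplicates", few_dupes)):
    before, after, _ = pairs_with_and_without_exact_filter(data)
    print(f"{label}: {before} pairs -> {after} pairs")
```

With six identical values out of eight, the comparisons drop from 28 to 3; with no exact duplicates, the count stays at 28.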
The following image shows the Filter Exact Match property: