When you configure the Key Generator transformation, connect the data source ports that you want to analyze. Specify the port that contains the group key data. If the records do not include a unique identifier, use the sequence ID port to add a unique identifier to each record.
When you specify the group key port, consider the following guidelines:
Select a port that contains values that repeat regularly in the port data. Preferably, select a port that creates groups that are similar in size.
Select a port that is not relevant to the duplicate analysis.
In the current example, select the City port as the group key. If an account name appears more than once in the same city, the accounts might contain duplicate data. If an account name appears in more than one city, the accounts are unlikely to be duplicates.
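The following Python sketch illustrates the reasoning behind the City example. It is not how the Key Generator transformation is implemented. The record layout, the field names `account` and `city`, and the function `candidate_duplicates` are assumptions made for illustration, and the exact-match comparison on the account name stands in for whatever comparison you run downstream.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions.
records = [
    {"account": "Acme Corp", "city": "Boston"},
    {"account": "Acme Corp", "city": "Boston"},
    {"account": "Acme Corp", "city": "Denver"},
    {"account": "Globex", "city": "Denver"},
]


def candidate_duplicates(rows, group_key="city", match_field="account"):
    """Return {(group value, match value): [sequence IDs]} for repeated values.

    Records are compared only with other records in the same group.
    """
    # Assign a sequence ID to each record and bucket the records by group key.
    groups = defaultdict(list)
    for seq_id, row in enumerate(rows, start=1):
        groups[row[group_key]].append((seq_id, row[match_field]))

    # Within each group, collect the IDs of records that share a match value.
    result = {}
    for key, members in groups.items():
        ids_by_value = defaultdict(list)
        for seq_id, value in members:
            ids_by_value[value].append(seq_id)
        for value, ids in ids_by_value.items():
            if len(ids) > 1:
                result[(key, value)] = ids
    return result


duplicates = candidate_duplicates(records)
```

In this sample, the two "Acme Corp" records in Boston are flagged as candidates, while the "Acme Corp" record in Denver is not compared with them because it belongs to a different group.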
Tip:
Run a column profile on the data source before you select the group key port. The profile results can indicate the number of times each value appears on a port.
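As a rough stand-in for the value frequencies that a column profile reports, the sketch below counts how often each value appears in a column and summarizes the resulting group sizes. It is an illustration only and does not use any product API. The helper names `value_frequencies` and `group_size_spread` and the sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; the field names are assumptions.
rows = [
    {"account": "Acme Corp", "city": "Boston"},
    {"account": "Acme Corp", "city": "Boston"},
    {"account": "Acme Corp", "city": "Denver"},
    {"account": "Globex", "city": "Denver"},
]


def value_frequencies(data, column):
    """Return (value, count) pairs for a column, most frequent first."""
    return Counter(row[column] for row in data).most_common()


def group_size_spread(data, column):
    """Return (smallest group, largest group, number of groups) for a column."""
    sizes = Counter(row[column] for row in data).values()
    return min(sizes), max(sizes), len(sizes)


city_spread = group_size_spread(rows, "city")
account_spread = group_size_spread(rows, "account")
```

A column whose smallest and largest groups are close in size, such as `city` in this sample, fits the guideline to prefer groups that are similar in size.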