Developer Transformation Guide

10.4.0

Column Analysis

When you configure a Match transformation, you select one or more columns for analysis.

The Match transformation analyzes columns in pairs. When you select a single column for analysis, the transformation creates a temporary copy of the column and compares the source column with the temporary column. When you select two columns for analysis, the transformation compares the values across the two columns that you select. The transformation compares each value in one column with all of the values in the other column. The transformation returns a match score for each pair of values that it analyzes.
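The pairing behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the score comes from `difflib.SequenceMatcher` in the Python standard library, which stands in for the transformation's own algorithms and is not one of them.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Stand-in score between 0.0 and 1.0 (not an algorithm from the Match transformation)."""
    return SequenceMatcher(None, a, b).ratio()


def compare_columns(left, right):
    """Two-column analysis: score each value in `left` against every value in `right`."""
    return [(a, b, similarity(a, b)) for a, b in product(left, right)]


def compare_single_column(column):
    """Single-column analysis: compare the column with a temporary copy of itself."""
    temporary_copy = list(column)
    return compare_columns(column, temporary_copy)


# Two values compared with three values yields 2 x 3 = 6 scored pairs.
pairs = compare_columns(["Smith", "Baker"], ["Smith", "Barker", "Parker"])

# A two-value column compared with its own copy yields 2 x 2 = 4 scored pairs.
self_pairs = compare_single_column(["Smith", "Baker"])
```

The sketch keeps every pair, including a value paired with its own copy and mirrored pairs such as (Smith, Baker) and (Baker, Smith). Whether the transformation reports such pairs is not stated here, so treat that as a simplification of the sketch.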

You select the columns to analyze when you configure a strategy in the Match transformation. The strategy specifies the columns to analyze and the algorithm to apply to the columns. The algorithm calculates the level of similarity between each pair of values. Each algorithm in the transformation uses different criteria to measure similarity. You can define multiple strategies in a transformation, and you can assign different columns to each strategy.
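One way to picture a strategy is as a pairing of a column with a scoring function. The following Python sketch models that idea. The `Strategy` class, the `exact` and `ratio` functions, and the `postcode` column are hypothetical names introduced for illustration; they do not correspond to options in the Match transformation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


def exact(a: str, b: str) -> float:
    """Score 1.0 only when the two values are identical."""
    return 1.0 if a == b else 0.0


def ratio(a: str, b: str) -> float:
    """Score partial similarity with difflib (a stand-in, not a product algorithm)."""
    return SequenceMatcher(None, a, b).ratio()


@dataclass(frozen=True)
class Strategy:
    """Hypothetical model of a strategy: one column plus the algorithm applied to it."""
    column: str
    algorithm: Callable[[str, str], float]


def score_pair(record_a: dict, record_b: dict, strategies: list) -> dict:
    """Apply each strategy to its own column and return one score per column."""
    return {
        s.column: s.algorithm(record_a[s.column], record_b[s.column])
        for s in strategies
    }


# Different columns are assigned to different strategies.
strategies = [Strategy("surname", ratio), Strategy("postcode", exact)]
record_a = {"surname": "Baker", "postcode": "10001"}
record_b = {"surname": "Barker", "postcode": "10001"}
scores = score_pair(record_a, record_b, strategies)
```

In this sketch the surname score is high but below 1.0, while the postcode score is exactly 1.0, which shows how two algorithms can apply different criteria to the same pair of records.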

Column Analysis Example

You want to compare the values in a column of surname data. You create a mapping that includes a data source and a Match transformation. You connect the Surname port to the Match transformation. The transformation creates a temporary copy of the data on the Surname port when the mapping runs.

The following image shows a fragment of the surname data:

The spreadsheet contains two columns of surname data. Column A represents the data on a transformation input port. Column B represents the temporary copy of the data that the transformation generates for match analysis.

The mapping generates a set of match scores that indicate that the following values might be duplicates:

Baker, Barker

Barker, Parker

Smith, Smith

When you review the data, you decide that Baker, Barker, and Parker are not duplicate values. You decide that Smith and Smith are duplicate values.
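The example can be approximated with a short Python sketch. It assumes that the column contains only the five values named above, uses `difflib.SequenceMatcher` as a stand-in for the transformation's algorithm, and applies a hypothetical threshold of 0.8 that was chosen so that the sketch flags the same three pairs. None of these choices comes from the product.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Assumed sample: only the surnames named in the example.
surnames = ["Baker", "Barker", "Parker", "Smith", "Smith"]

# Hypothetical cut-off; pairs that score at or above it are flagged for review.
THRESHOLD = 0.8


def similarity(a: str, b: str) -> float:
    """Stand-in score between 0.0 and 1.0 (not the Match transformation's algorithm)."""
    return SequenceMatcher(None, a, b).ratio()


def possible_duplicates(values, threshold):
    """Score each pair of distinct rows once and keep the pairs that meet the threshold."""
    flagged = []
    for i, j in combinations(range(len(values)), 2):
        if similarity(values[i], values[j]) >= threshold:
            flagged.append((values[i], values[j]))
    return flagged


candidates = possible_duplicates(surnames, THRESHOLD)

# The score only nominates candidates; a reviewer makes the final decision.
# Here the reviewer accepts identical values and rejects the rest.
confirmed = [pair for pair in candidates if pair[0] == pair[1]]
```

A similarity score cannot tell that Baker and Barker are different surnames, which is why the flagged pairs still need a manual review.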
