Application and Database Design Guide


Match Purposes


SSA-NAME3’s Matching services are used by applications such as Informatica IR, MDM Registry Edition & DCE to filter, rank, or match the candidate records returned from a search. The identity data from the search is compared to the identity data from the candidate record, and a score or a ruling is returned. Pre-built Matching algorithms are provided to address today’s common business purposes. These are called "Match Purposes". In combination with the Match Purpose, a selectable Match Level determines the tightness or looseness of the match. The application may also override the Score threshold, which determines the match ruling returned.
SSA-NAME3 Matching is designed to compensate for the error and variation in identity data. The matching logic consists of heuristic algorithms that are optimized for each class of data (for example: name, organization, address, dates, codes). The algorithms include numerous rules and switches to handle initials, aliases, common variations, prefixes, suffixes, transpositions and word order.
Additionally, all Match Purposes use string cleaning routines, Edit-Lists, different matching Methods for different data types, optimized Matching options, field and token level weighting and phonetic/orthographic stabilization.
Each Match Purpose supports a combination of mandatory and optional fields and each field is weighted according to its influence in the match decision. Some fields in some Purposes may be "grouped". Two types of grouping exist:
  • A "Required" group requires at least one of the field members to be non-null;
  • A "Best of" group will contribute only the best score from the fields in the group to the overall match score.
For example, in the "Individual" Match Purpose:
  • Person_Name is a mandatory field.
  • One of either ID Number or Date of Birth is required.
  • Other attributes are optional.
The overall score returned by each Purpose is a weighted average: each participating field score is multiplied by its field weight, the products are added together, and the sum is divided by the total weight of the participating fields. If a field is optional and is not provided, it is not included in the weight calculation.
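As a worked illustration of this rule, the Python sketch below computes the weighted average for one comparison. The field names, weights, and per-field scores are hypothetical (the real weights are internal to the Standard Populations and cannot be set by the application), and the way a "best of" group is handled here, collapsing to its highest member score under a weight of its own, is an assumption made for the example.

```python
from typing import Optional


def overall_score(
    scores: dict[str, Optional[float]],
    weights: dict[str, float],
    best_of: Optional[dict[str, list[str]]] = None,
) -> float:
    """Weighted average of per-field scores (0-100).

    scores: field name -> score, or None when an optional field is absent
    weights: hypothetical weight per field (or per "best of" group name)
    best_of: hypothetical group name -> member fields; only the best
             member score contributes to the overall score
    """
    groups = best_of or {}
    grouped = {m for members in groups.values() for m in members}

    # Ungrouped fields participate individually.
    merged: dict[str, Optional[float]] = {
        name: s for name, s in scores.items() if name not in grouped
    }
    # Each "best of" group contributes only its highest supplied score.
    for group, members in groups.items():
        supplied = [scores[m] for m in members if scores.get(m) is not None]
        merged[group] = max(supplied) if supplied else None

    # Fields that were not provided drop out of the weight total entirely.
    present = {name: s for name, s in merged.items() if s is not None}
    total_weight = sum(weights[name] for name in present)
    if total_weight == 0:
        raise ValueError("no participating fields")
    return sum(s * weights[name] for name, s in present.items()) / total_weight


# Hypothetical weights: Attribute1 is not supplied, so only 3 + 1 is used.
print(overall_score({"Person_Name": 90, "Date": 70, "Attribute1": None},
                    {"Person_Name": 3.0, "Date": 1.0, "Attribute1": 1.0}))
# 85.0
```

Here (90 × 3 + 70 × 1) / (3 + 1) = 85; because the missing Attribute1 is excluded from the denominator, its absence does not pull the score down.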
The weights and matching options used in the Standard Populations are set internally by Informatica’s Population experts based on years of tuning experience. They cannot be overridden by the application. However, if a user has a need not supported by the Standard Populations, Informatica Corporation may offer to build a Custom Population for that client.

Field Types

Below are descriptions of the fields supported by the various Match Purposes, provided in alphabetical order:
Address_Part1
Typically includes the part of the address up to, but not including, the locality "last line". The word order, that is, the position of the address components, should be the normal word order used in your data population.
These components should be passed in one field. Depending on table design, your application may need to concatenate them into one field before calling SSA-NAME3. For example, in the US, a typical string to pass would consist of:
Care-of + Building Name + Street Number + Street Name + Street Type + Apartment Details
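The concatenation described above can be a simple join. A minimal Python sketch, assuming each component arrives as a separate column in the US order shown; the function name is hypothetical and not part of SSA-NAME3. Empty components are skipped so the passed string carries no stray spaces.

```python
def build_address_part1(*components: str) -> str:
    """Concatenate address components into a single Address_Part1 value.

    Components are expected in the population's normal word order, e.g.
    Care-of, Building Name, Street Number, Street Name, Street Type,
    Apartment Details. Empty or missing components are skipped.
    """
    return " ".join(c.strip() for c in components if c and c.strip())


# No Care-of or Building Name for this record.
print(build_address_part1("", "", "100", "Main", "St", "Apt 4"))
# 100 Main St Apt 4
```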
Matching on Address_Part1 uses methods and options designed specifically for addresses. It has its own Edit-List whose rules can be overridden by the Population Override Manager or Edit Rule Wizard. It is also possible to supply the entire address in the Address_Part1 field for matching.
The application may pass multiple addresses (such as a residential address and a postal address) in the one call to SSA-NAME3. See the Key Fields section for more details on Address_Part1.
Address_Part2
Typically includes the "locality" line in an address. For example, in the US, a typical string to pass would consist of:
City + State + Zip (+ Country)
Matching on Address_Part2 uses methods and options designed specifically for addresses. It uses the same Edit-List as Address_Part1. The rules in this Edit-List can be overridden by the Population Override Manager or Edit Rule Wizard.
Attribute1, Attribute2
These are two general-purpose fields. They are matched using a general-purpose string matching algorithm that compensates for transpositions and missing characters or digits.
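SSA-NAME3's own string matching algorithm is not described here, so as a stand-in the Python sketch below uses the optimal string alignment (restricted Damerau-Levenshtein) distance, a generic technique that counts an adjacent transposition or a missing character as a single edit. The 0-100 scaling in `similarity` is an arbitrary choice for illustration. The same idea applies to the ID, Postal_Area and Telephone_Number fields described below.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two strings.

    Counts insertions, deletions (e.g. a missing digit), substitutions and
    transpositions of adjacent characters, each as one edit.
    """
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def similarity(a: str, b: str) -> float:
    """Scale the distance to a 0-100 score (illustrative scaling only)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - osa_distance(a, b)) / longest


print(osa_distance("12345", "12435"))  # 1: adjacent digits transposed
print(similarity("12345", "1245"))     # 80.0: one digit missing
```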
Date
This field is used for matching any type of date (for example: date of birth, expiry date, date of contract, date of change, creation date, etc.).
It expects the date to be passed in Day+Month+Year order. It supports the use or absence of delimiters between the date components.
Matching on dates uses methods and options designed specifically for dates. It overcomes the typical error and variation found in this data type.
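Dates held in other orders, such as ISO year-first values or US month-first strings, need to be re-ordered before they are passed. A minimal Python sketch using only the standard library; the helper name and the choice of zero-padded, four-digit-year output are assumptions for illustration.

```python
from datetime import date


def to_dmy(d: date, delimiter: str = "") -> str:
    """Render a date in Day+Month+Year order for the Date field.

    Day and month are zero-padded and the year has four digits (a
    formatting choice for this sketch). Delimiters are optional, so
    pass "" to omit them or e.g. "/" to include them.
    """
    return f"{d.day:02d}{delimiter}{d.month:02d}{delimiter}{d.year:04d}"


# A date stored in ISO (year-first) order, re-ordered for matching.
print(to_dmy(date.fromisoformat("1990-03-07"), "/"))
# 07/03/1990
```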
ID
The ID field is used for matching any type of ID number (for example: Account number, Customer number, Credit Card number, Driver's License number, Passport number, Policy number, SSN or other identity code, VIN, etc.).
It uses a string matching algorithm that compensates for transpositions and missing characters or digits. It also has its own Edit-List whose rules can be overridden by the Population Override Manager or Edit Rule Wizard.
Model_Number
Used to match the model numbers of products. The Model_Number field is a matching field and can contain alphanumeric characters. Matching on this field compares the two model number strings.
Organization_Name
Used to match the names of organizations. These could be company names, business names, institution names, department names, agency names, trading names, etc.
This field supports matching on a single name, or a compound name such as a legal name and its trading style. It has its own Edit-List whose rules can be overridden by the Population Override Manager or Edit Rule Wizard.
The application may also pass multiple names (for example, a legal name and a trading style) in the one call to SSA-NAME3. See the Key Fields section for more details on Organization_Name.
Person_Name
Used to match the names of people. An application should pass the full person name. The word order, that is, the position of the first name, middle names and family names, should be the normal word order used in your data population. For example, in English-speaking countries, the normal word order would be:
First Name + Middle Name(s) + Family Name(s)
Depending on table design, your application may have to concatenate these separate fields into one field before calling SSA-NAME3.
This field supports matching on a single name, or an account name such as JOHN & MARY SMITH.
The application may also pass multiple names (for example, a married name and a former name) in the one call to SSA-NAME3.
It has its own Edit-List whose rules can be overridden by the Population Override Manager or Edit Rule Wizard. See the Key Fields section for more details on Person_Name.
Postal_Area
The Postal_Area field can be used to place more emphasis on the postal code than if it were included in the Address_Part2 field. It is used for all types of postal codes, including Zip codes.
It uses a string matching algorithm that compensates for transpositions and missing characters or digits. It also has its own Edit-List whose rules can be overridden by the Population Override Manager or Edit Rule Wizard.
Product_Description
Used to match the descriptions of products. The matching rules for this field are not as strict as for a person name or an organization name. The Product_Description field uses word pairs (bigrams) to search, match, and handle additional variations between product names and descriptions.
The default key length of the Product_Description field is 8 bytes.
Product_Name
Used to match the names of products. The matching rules for product names are not as strict as for a person name or an organization name. The Product_Name field uses word pairs (bigrams) to search, match, and handle additional variations between product names and descriptions.
The default key length of the Product_Name field is 8 bytes.
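This section does not detail how SSA-NAME3 uses word pairs, so the following is only a generic illustration of the idea: a Python sketch that compares the sets of adjacent word pairs (word bigrams) in two product strings with a Dice coefficient. Shared pairs survive when whole phrases are reordered, which is one way word pairs can absorb variation between product names and descriptions. Both function names are hypothetical.

```python
def word_pairs(text: str) -> set[tuple[str, str]]:
    """Adjacent word pairs (word bigrams) of an upper-cased string."""
    words = text.upper().split()
    return set(zip(words, words[1:]))


def pair_similarity(a: str, b: str) -> float:
    """Dice coefficient over word pairs, scaled to 0-100."""
    pairs_a, pairs_b = word_pairs(a), word_pairs(b)
    if not pairs_a and not pairs_b:
        # One-word (or empty) strings have no pairs; fall back to equality.
        return 100.0 if a.upper().split() == b.upper().split() else 0.0
    return 200.0 * len(pairs_a & pairs_b) / (len(pairs_a) + len(pairs_b))


# Shares STAINLESS STEEL and WATER BOTTLE: 2 common pairs out of 4 + 3.
print(round(pair_similarity("Stainless Steel Water Bottle 750ml",
                            "Water Bottle Stainless Steel"), 1))
# 57.1
```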
Telephone_Number
The Telephone_Number field is used to match telephone numbers.
It uses a string matching algorithm that compensates for transpositions and missing digits or area codes. It also has its own Edit-List whose rules can be overridden by the Population Override Manager or Edit Rule Wizard.

Purpose Types

Below are descriptions of the Purposes supported by the Standard Populations, provided in alphabetical order.

Address

This Purpose is designed to identify an address match. The address might be postal, residential, delivery, descriptive, formal or informal.
This Match purpose is typically used after a search by Address_Part1.
Field | Required?
Address_Part1 | Yes
Address_Part2 | No
Postal_Area | No
Telephone_Number | No
ID | No
Date | No
Attribute1 | No
Attribute2 | No
The only required field is Address_Part1. The fields Address_Part2, Postal_Area, Telephone_Number, ID, Date, Attribute1 and Attribute2 are available as optional input fields to further differentiate an address. For example, if the name of a City and/or State is provided as Address_Part2, it will help differentiate between occurrences of a common street address (such as 100 Main Street) in different locations.
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.
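Assembling such a repeated field is a string-building step. The Python sketch below reproduces the example layout exactly; the rule it follows (each field introduced as *FieldName*, fields may repeat, and the record closed with ***) is inferred from that example rather than taken from the SSA-NAME3 API documentation, and the function name is hypothetical.

```python
def build_match_record(fields: list[tuple[str, str]]) -> str:
    """Serialize (field name, value) pairs in the *Name*value...*** layout.

    Repeating a field name (here Address_Part2) lets both values be scored,
    with the higher score used for that field.
    """
    return "".join(f"*{name}*{value}" for name, value in fields) + "***"


record = build_match_record([
    ("Address_Part2", "100 Main St"),
    ("Address_Part2", "06870"),  # Postal_Area value passed as a repeat
])
print(record)
# *Address_Part2*100 Main St*Address_Part2*06870***
```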

Contact

This Purpose is designed to identify a contact within an organization at a specific location.
This Match purpose is typically used after a search by Person_Name. However, either Organization_Name or Address_Part1 could be used as the search criteria.
For the best quality, a tiered search using two or all three of these fields could be used. A tiered search is, for example, a Person_Name search followed by an Address_Part1 search.
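A tiered search amounts to running the searches in turn and merging their candidate lists before matching. A minimal Python sketch, assuming a hypothetical `search` callable that returns candidate record ids for one field value; it is not an SSA-NAME3 function. Duplicates are dropped so that a record found by more than one search is matched only once.

```python
from typing import Callable, Iterable

# Hypothetical: search(field_name, value) -> candidate record ids.
SearchFn = Callable[[str, str], Iterable[str]]


def tiered_search(search: SearchFn, query: dict[str, str], tiers: list[str]) -> list[str]:
    """Run one search per field in `tiers` and merge the candidates.

    Candidates keep the order in which they were first found and are
    de-duplicated, so each record is passed to matching only once.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for field in tiers:
        value = query.get(field)
        if not value:
            continue  # no search value supplied for this tier
        for rec_id in search(field, value):
            if rec_id not in seen:
                seen.add(rec_id)
                merged.append(rec_id)
    return merged


# Toy index standing in for real search results.
index = {
    ("Person_Name", "JOHN SMITH"): ["r1", "r2"],
    ("Address_Part1", "100 MAIN ST"): ["r2", "r3"],
}
print(tiered_search(lambda f, v: index.get((f, v), []),
                    {"Person_Name": "JOHN SMITH", "Address_Part1": "100 MAIN ST"},
                    ["Person_Name", "Address_Part1"]))
# ['r1', 'r2', 'r3']
```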
Field | Required?
Person_Name | Yes
Organization_Name | Yes
Address_Part1 | Yes
Address_Part2 | No
Postal_Area | No
Telephone_Number | No
ID | No
Date | No
Attribute1 | No
Attribute2 | No
The required fields are Person_Name, Organization_Name, and Address_Part1. This is designed to successfully match person X at company Y and address Z.
To further qualify a match, the fields Address_Part2, Postal_Area, Telephone_Number, ID, Date, Attribute1 and Attribute2 may be optionally provided.
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.

Corporate Entity

The Corporate Entity Purpose is designed to identify an Organization by its legal corporate name, including the legal endings such as INC, LTD, etc. It is designed for applications that need to honor the differences between such names as ABC TRADING INC and ABC TRADING LTD.
This Match purpose is typically used after a search by Organization_Name.
Field | Required?
Organization_Name | Yes
Address_Part1 | No
Address_Part2 | No
Postal_Area | No
Telephone_Number | No
ID | No
Attribute1 | No
Attribute2 | No
It is in essence the same purpose as Organization, except that tighter matching is performed and legal endings are not treated as noise.
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.

Division

The Division Purpose is designed to identify an Organization at an Address. It is typically used after a search by Organization_Name or by Address_Part1, or both.
Field | Required?
Organization_Name | Yes
Address_Part1 | Yes
Address_Part2 | No
Postal_Area | No
Telephone_Number | No
ID | No
Attribute1 | No
Attribute2 | No
It is in essence the same purpose as Organization, except that Address_Part1 is a required field.
Thus, this Purpose is designed to match company X at an address of Y (or Z, etc., if multiple addresses are supplied).
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.

Family

The Family purpose is designed to identify matches where individuals with the same or similar family names share the same address or the same telephone number.
This purpose is typically used after a tiered search (multi-search) by Address_Part1 and Telephone_Number.
It is not practical to search by Person_Name because ultimately only one word from the Person_Name needs to match, and a one-word search will not perform well in most situations.
Field | Required?
Person_Name | Yes
Address_Part1 | Yes
Telephone_Number | Yes
Address_Part2 | No
Postal_Area | No
Attribute1 | No
Attribute2 | No
The score will be based on the best of the above group.
Emphasis is placed on the Last Name, or "Major Word", of the Person_Name field, so this is one of the few cases where word order is important in the way the records are passed to SSA-NAME3 for matching.
However, a reasonable score will be generated provided that a match occurs between the major word in one name and any other word in the other name.
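The word-order dependence can be illustrated with a simplified Python sketch. It assumes the family name is the last word (First + Middle + Family order) and uses exact word comparison, so it shows only which words are compared, not how SSA-NAME3 actually scores them; both function names are hypothetical.

```python
def major_word(person_name: str) -> str:
    """Return the "Major Word" (family name) of a name.

    Assumes First Name + Middle Name(s) + Family Name order, so the
    family name is the last word.
    """
    words = person_name.upper().split()
    return words[-1] if words else ""


def shares_family_word(name_a: str, name_b: str) -> bool:
    """True when either name's major word appears anywhere in the other.

    Exact word comparison only; real matching would also tolerate
    spelling variation.
    """
    words_a = set(name_a.upper().split())
    words_b = set(name_b.upper().split())
    return major_word(name_a) in words_b or major_word(name_b) in words_a


print(shares_family_word("Mary Smith", "Smith John"))  # True
print(shares_family_word("Mary Smith", "John Jones"))  # False
```

In the first call, "Smith John" is not in the expected order, yet the major word SMITH of "Mary Smith" still matches a word in the other name.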
Required fields are Person_Name, Address_Part1 and Telephone_Number. Optional qualifying fields are Address_Part2, Postal_Area, Attribute1, and Attribute2.
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.

Fields

This Purpose is provided for general non-specific use. It is designed in such a way that there are no required fields. All field types are available as optional input fields.
Field | Required?
Person_Name | No
Organization_Name | No
Address_Part1 | No
Address_Part2 | No
Postal_Area | No
Telephone_Number | No
ID | No
Date | No
Attribute1 | No
Attribute2 | No
One way this Purpose could be used is as a non-exact match filter before applying some other Match Purpose. For exact match filters, use the Filter Purpose. For example, before passing a record to the Division Purpose, use the Fields Purpose to eliminate any company with ID numbers which do not score above 80%. To do this, the application would first pass the ID numbers to SSA-NAME3 for matching using PURPOSE=FIELDS, and then decide based on the score returned whether to pass the full records for matching by the Division Purpose.
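The two-stage flow above can be sketched in Python. The callables `score_fields` and `score_division` are hypothetical stand-ins for SSA-NAME3 match calls made with PURPOSE=FIELDS on the ID values and with the Division Purpose on full records; the threshold of 80 follows the example, and a score of exactly 80 is treated as not above it.

```python
from typing import Callable, Mapping

Record = Mapping[str, str]


def prefilter_then_match(
    search_rec: Record,
    candidates: list[Record],
    score_fields: Callable[[str, str], float],
    score_division: Callable[[Record, Record], float],
    threshold: float = 80.0,
) -> list[tuple[Record, float]]:
    """ID-only pre-filter, then the full Division match on survivors."""
    results: list[tuple[Record, float]] = []
    for cand in candidates:
        # Stage 1: compare only the ID numbers; drop candidates that do
        # not score above the threshold.
        if score_fields(search_rec["ID"], cand["ID"]) <= threshold:
            continue
        # Stage 2: only survivors get the full Division match.
        results.append((cand, score_division(search_rec, cand)))
    return results


def toy_id_score(a: str, b: str) -> float:
    return 100.0 if a == b else 0.0  # toy scorer: exact IDs only


def toy_division_score(a: Record, b: Record) -> float:
    return 92.0  # toy scorer: constant full-match score


print(prefilter_then_match({"ID": "A1"}, [{"ID": "A1"}, {"ID": "B2"}],
                           toy_id_score, toy_division_score))
# [({'ID': 'A1'}, 92.0)]
```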

Filter1-9

The Filter Purpose is provided so that the application can perform exact match filtering based on the setting of one or more flags in the records. One call to ssan3_match can use up to nine Filters (Filter1-9).
Field | Required?
Filter1-9 | Yes
For example, suppose an index supports searching and matching across two types of names: Company names (identified by a Name-Type-Flag of "C") and Person names (identified by a Name-Type-Flag of "P"). A search application may need to support searches across both name types, as well as within each name type. To support the "within each name type" search, the application can use the Filter Purpose to keep only the candidates whose name type flag exactly matches the requested type.
The fields Filter1-9 can be any code or flag.
For non-exact filtering, use the Fields Purpose.
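The Name-Type-Flag example can be sketched as an exact-match filter. The Python below is a hypothetical illustration of the logic only, not the ssan3_match interface: the Name-Type-Flag is assumed to be carried in Filter1, and records are plain dictionaries.

```python
FILTER_FIELDS = {f"Filter{i}" for i in range(1, 10)}  # Filter1 .. Filter9


def apply_filters(
    required: dict[str, str], candidates: list[dict[str, str]]
) -> list[dict[str, str]]:
    """Keep only candidates whose filter flags exactly equal `required`.

    `required` maps filter field names (Filter1..Filter9) to the exact
    code or flag wanted; filters not listed are not checked.
    """
    unknown = set(required) - FILTER_FIELDS
    if unknown:
        raise ValueError(f"unsupported filter fields: {sorted(unknown)}")
    return [
        c for c in candidates
        if all(c.get(name) == value for name, value in required.items())
    ]


names = [
    {"Filter1": "C", "name": "ACME TRADING INC"},
    {"Filter1": "P", "name": "JOHN SMITH"},
]
# "Within each name type" search: Person names only.
print(apply_filters({"Filter1": "P"}, names))
# [{'Filter1': 'P', 'name': 'JOHN SMITH'}]
```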

Household

The Household purpose is designed to identify matches where individuals with the same or similar family names share the same address.
This purpose is typically used after a search by Address_Part1.
It is not practical to search by Person_Name because ultimately only one word from the Person_Name needs to match, and a one-word search will not perform well in most situations.
Field | Required?
Person_Name | Yes
Address_Part1 | Yes
Address_Part2 | No
Postal_Area | No
Telephone_Number | No
Attribute1 | No
Attribute2 | No
Emphasis is placed on the Last Name, or "Major Word", of the Person_Name field, so this is one of the few cases where word order is important in the way the records are passed to SSA-NAME3 for matching.
However, a reasonable score will be generated provided that a match occurs between the major word in one name and any other word in the other name.
Required fields are Person_Name and Address_Part1. Optional qualifying fields are Address_Part2, Postal_Area, Telephone_Number, Attribute1, and Attribute2.
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.

Individual

This Purpose is designed to identify a specific individual by name together with either the same ID number or the same Date of Birth.
It is typically used after a search by Person_Name.
Field | Required?
Person_Name | Yes
ID | At least one of ID or Date
Date | At least one of ID or Date
Attribute1 | No
Attribute2 | No
The required fields are Person_Name and at least one of ID or Date.
The fields Attribute1 and Attribute2 may be optionally provided to further qualify the match.
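An application can check these rules before calling SSA-NAME3. In this hypothetical Python sketch, the "at least one of ID or Date" rule becomes a Required group, and a few other Purposes from this section are included for comparison; the dictionary and function names are illustrative, not part of any Informatica API.

```python
from typing import Mapping, Optional

# Required-field rules transcribed from the tables in this section.
# Each tuple is a group in which at least one member must be non-empty;
# a one-member tuple is simply a mandatory field.
REQUIRED: dict[str, list[tuple[str, ...]]] = {
    "Individual": [("Person_Name",), ("ID", "Date")],
    "Address": [("Address_Part1",)],
    "Division": [("Organization_Name",), ("Address_Part1",)],
    "Contact": [("Person_Name",), ("Organization_Name",), ("Address_Part1",)],
}


def missing_requirements(
    purpose: str, record: Mapping[str, Optional[str]]
) -> list[tuple[str, ...]]:
    """Return the required groups that `record` fails to satisfy."""
    return [
        group for group in REQUIRED[purpose]
        if not any(record.get(field) for field in group)
    ]


print(missing_requirements("Individual", {"Person_Name": "JOHN SMITH"}))
# [('ID', 'Date')]
```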

Organization

The Organization Purpose is designed to match organizations primarily by name. It is targeted at online searches when a name-only lookup is required and a human is available to make the choice. Matching in batch would typically require other attributes in addition to name to make match decisions.
Field | Required?
Organization_Name | Yes
Address_Part1 | No
Address_Part2 | No
Postal_Area | No
Telephone_Number | No
ID | No
Date | No
Attribute1 | No
Attribute2 | No
The only required field is Organization_Name. The fields Address_Part1, Address_Part2, Postal_Area, Telephone_Number, ID, Date, Attribute1 and Attribute2 are also provided as optional input fields to refine the ranking.
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.

Person_Name

This Purpose is designed to identify a Person by name. It is targeted at online searches when a name-only lookup is required and a human is available to make the choice. Matching in batch would typically require other attributes in addition to name to make match decisions.
Field | Required?
Person_Name | Yes
Address_Part1 | No
Address_Part2 | No
Postal_Area | No
Telephone_Number | No
ID | No
Date | No
Attribute1 | No
Attribute2 | No
The only required field is Person_Name. The optional fields available for this purpose are Address_Part1, Address_Part2, Postal_Area, Telephone_Number, ID, Date, Attribute1, and Attribute2.
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.

Product

Use the Product purpose to identify and match products by name, description, and other details, such as model numbers. Matching in batches requires other attributes in addition to the product name to make match decisions.
The following table lists the fields that you can specify for the Product purpose:
Field | Required?
Product_Name | Yes
Product_Description | No
Model_Number | No
Company_Name | No
ID | No
Code | No
Attribute1 | No
Attribute2 | No

Resident

The Resident Purpose is designed to identify a person at an address.
This purpose is typically used after a search by either Person_Name or Address_Part1, or both in a multi-search.
Field | Required?
Person_Name | Yes
Address_Part1 | Yes
Address_Part2 | No
Postal_Area | No
Telephone_Number | No
ID | No
Date | No
Attribute1 | No
Attribute2 | No
The required fields are Person_Name and Address_Part1. The fields Address_Part2, Postal_Area, Telephone_Number, ID, Date, Attribute1 and Attribute2 are optional input fields to help qualify or rank a match if more information is available.
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.

Wide_Contact

This Purpose is designed to loosely identify a contact within an organization, that is, without regard to actual location.
It is typically used after a search by Person_Name; however, a second search by Organization_Name could be used to get better quality.
Field | Required?
Person_Name | Yes
Organization_Name | Yes
ID | No
Attribute1 | No
Attribute2 | No
The fields required for this Purpose are Person_Name and Organization_Name. This is designed to successfully match a person X at company Y.
In addition to the required fields, ID, Attribute1 and Attribute2 may be optionally provided for matching to further qualify a contact.

Wide_Household

The Wide_Household purpose is designed to identify matches where the same address is shared by individuals with the same family name or with the same telephone number.
This purpose is typically used after a search by Address_Part1.
It is not practical to search by Person_Name because ultimately only one word from the Person_Name needs to match, and a one-word search will not perform well in most situations.
Field | Required?
Address_Part1 | Yes
Person_Name | Yes
Telephone_Number | Yes
Address_Part2 | No
Postal_Area | No
Attribute1 | No
Attribute2 | No
The score will be based on the best of the above group.
Emphasis is placed on the Last Name, or "Major Word", of the Person_Name field, so this is one of the few cases where word order is important in the way the records are passed to SSA-NAME3 for matching.
However, a reasonable score will be generated provided that a match occurs between the major word in one name and any other word in the other name.
Required fields are Person_Name, Address_Part1 and Telephone_Number. Optional qualifying fields are Address_Part2, Postal_Area, Attribute1 and Attribute2.
To achieve a "best of" score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. For example:
*Address_Part2*100 Main St*Address_Part2*06870***
In this case, the Address_Part2 score used will be the higher of the two scored fields.
