Introduction to SSA-NAME3 User Guide

Part 1: Glossary of Terms

This section defines the terms used in SSA-NAME3.
Accept Limit
The Accept Limit is the score above which a candidate record is considered an accepted match; the Match Decision returned is then set to "A". It works together with the Reject Limit: records that score between the Reject and Accept limits have a Match Decision of "U" (Undecided). It is pre-defined in a Population rule-set and can be overridden by the search application.
Candidates
The set of records returned from a name search. Each candidate should be compared with the original search record using the ssan3_match function; the accepted records, and optionally the suspect ones, are then displayed or otherwise processed further.
Compound Name
A name field which explicitly refers to more than one simple name, for example:
INFORMATICA CORPORATION dba IDENTITY SYSTEMS
JOHN SMITH and GEORGE BROWN
Controls
The keywords, values and parameters that govern how an API Function will operate and which are passed through the API. The format and contents of the controls will depend on the type of function being used. More information can be found in the Controls section of the API REFERENCE manual.
Custom Population
Standard Population rules modified by Informatica for a user with special search and matching requirements and re-packaged as a Custom Population. These may also have been built from scratch for a totally new type of population.
Delimiter
Special characters contained within the data which separate distinct fields or keywords.
Developer’s Workbench
A Java-based GUI used by the developer for understanding and testing the various API Functions, accessing online documentation, and generating sample programs.
Edit Rule Wizard
A Java GUI tool that helps a business user safely add certain types of Edit Rules to the Standard or Custom Population without requiring specific knowledge of SSA-NAME3 or support from a programmer or data analyst. The types of rules that can be added using this tool are:
  • Discard a word or phrase when searching and matching (e.g. a new "noise" word)
  • Add a new replacement word or phrase when searching and matching (e.g. a new "abbreviation" or "acronym")
  • Add a new compound name marker word
Efield
A field that is used for Matching, and not used for Key or Range building, that supports Edit-list rules. Examples of this are:
  • ID numbers
  • Telephone Numbers
  • Postal Codes
Efields benefit from Edit-lists to overcome problems such as an ID number consisting entirely of 9s, which should be treated as a "null" number. In that case, a rule is required to treat all 9s as a noise word.
Error Message
A description of any error which may have occurred during an API Function call.
Extended Keys
For high-risk and critical search applications, this is the Key Level to be used when generating SSA-NAME3 Keys. In contrast to Standard Keys, Extended Keys include keys built from additional token concatenation.
File Data
The data retrieved from the file as a result of finding a set of candidate records using the key ranges returned from the ssan3_get_ranges function. The File Data is compared with the Search Data by the ssan3_match function to calculate a Score and Match Decision.
Filtering
The process of discarding candidates that fail to meet a certain Score threshold or are deemed "Rejected". This reduces the number of records that need to be passed back over the network, shown to the user or further processed by a program.
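The filtering step described above can be sketched as follows. This is only an illustration: the `Candidate` class, the `filter_candidates` helper and the sample scores are hypothetical and are not part of the SSA-NAME3 API. It assumes each candidate already carries a Score and Match Decision from an earlier matching step.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    record_id: int
    score: int      # 0..100, as produced by an earlier matching step
    decision: str   # "A" (Accept), "U" (Undecided) or "R" (Reject)


def filter_candidates(candidates, min_score=0, drop_rejected=True):
    """Keep candidates scoring at least min_score; optionally drop rejected ones."""
    kept = []
    for c in candidates:
        if drop_rejected and c.decision == "R":
            continue  # deemed "Rejected"
        if c.score < min_score:
            continue  # below the chosen Score threshold
        kept.append(c)
    return kept


# Hypothetical sample data, not real match results.
candidates = [
    Candidate(1, 95, "A"),
    Candidate(2, 72, "U"),
    Candidate(3, 40, "R"),
]
survivors = filter_candidates(candidates, min_score=70)
print([c.record_id for c in survivors])
```

Filtering before results leave the server keeps the rejected record from being sent over the network at all.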
Function
An SSA-NAME3 API Function which is called from the application to perform a distinct task. For example, ssan3_get_keys will generate SSA-NAME3 Keys; ssan3_get_ranges will generate a Key Ranges Array; ssan3_match will match a pair of records and return a Score and Match Decision. These functions and more are defined in detail in the API REFERENCE manual.
Fuzzy Keys
In the context of SSA-NAME3, Fuzzy Keys is a term that refers to the special SSA-NAME3 Keys built from names or addresses that have been treated by a variety of techniques to overcome the error and variation in the data.
Initial
A single character word or the first character of a word.
Key Field
The field used to build SSA-NAME3 Keys using the ssan3_get_keys function. In SSA-NAME3 Standard Populations, the supported Key Fields are Person_Name, Organization_Name and Address_Part1.
Key Field Data
The value(s) of the field used to build SSA-NAME3 Keys using the ssan3_get_keys function call. The keys generated from the call are stored by the user’s application in a user-defined "SSA-NAME3 Key Table" within the database. The SSA-NAME3 Key Table is designed by the user’s DBA. The SSA-NAME3 Keys column must be indexed.
When using the ssan3_get_ranges function from a search application, the Key Field Data will contain the value(s) of the field used to initiate the search.
Key Field Data may consist of one value or several repeating values. Examples of repeating values are: a person’s name and their maiden name; a residential address and a mailing address.
Key Generation
The process whereby a user application calls the ssan3_get_keys function to generate SSA-NAME3 Keys from the Key Field Data (typically a name or an address). The application will then store these keys in a database table referred to as the SSA-NAME3 Key Table.
Keys Count
The number of keys returned from the ssan3_get_keys function call. This value is used by the application code to ensure that all of the keys returned are stored in the SSA-NAME3 Key Table.
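A minimal sketch of how an application might use the Keys Count to confirm that every key was stored. The table name `name_keys`, its columns, the `store_keys` helper and the sample key strings are all assumptions made for illustration. The keys here are hand-written placeholders, not output from ssan3_get_keys, and SQLite stands in for whatever database the application uses.

```python
import sqlite3

# Hypothetical key table; a real one would be designed by the DBA.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE name_keys (ssa_key TEXT NOT NULL, record_id INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_name_keys_key ON name_keys (ssa_key)")


def store_keys(conn, record_id, keys, keys_count):
    """Store one row per key and verify the stored rows against keys_count."""
    if len(keys) != keys_count:
        raise ValueError("keys list does not match the reported Keys Count")
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO name_keys (ssa_key, record_id) VALUES (?, ?)",
            [(key, record_id) for key in keys],
        )
    stored = conn.execute(
        "SELECT COUNT(*) FROM name_keys WHERE record_id = ?", (record_id,)
    ).fetchone()[0]
    if stored != keys_count:
        raise RuntimeError(f"expected {keys_count} keys, stored {stored}")
    return stored


# Placeholder 8-character values standing in for generated keys.
keys = ["AAAAAAA1", "AAAAAAA2", "BBBBBBB1"]
stored = store_keys(conn, 42, keys, keys_count=len(keys))
print(stored)
```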
Key Level
Refers to the number and variety of keys to be generated by an ssan3_get_keys call. The three Key Levels are Standard keys, Extended keys and Limited keys.
Limited Keys
If disk space is limited, SSA-NAME3 can generate "Limited" SSA-NAME3 Keys. Limited keys are a subset of Standard keys. However, the designer/developer should be aware that the use of Limited keys, while saving on disk space, may also reduce search reliability.
Local Population
A Standard Population or Custom Population that has been modified locally via either the Population Override Manager or Edit Rule Wizard.
Major Word
The word in a Name identified as being the most significant word. In some Search Strategies, it is used as the primary part of the Search key ranges, and for extra weighting in some Match Purposes (e.g. family name in a Household Purpose).
Matching
The process whereby a user application calls the ssan3_match function to compare two records, usually a Search and a File record, and compute a Score and Match Decision.
Match Purpose
The ultimate business purpose of the search/match application. This will be provided as a parameter to the ssan3_match function. Examples of Match Purposes are "same name", "same individual", "same resident", "same household", "same organization", "same division", "same corporate entity", "same contact", "same address".
Match Decision
A 1-byte character value which identifies the judgment on the matched records. Values are "A" for Accept, "U" for Undecided and "R" for Reject. The thresholds by which these decisions are chosen can be varied by the user.
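The relationship between a Score, the Accept and Reject limits, and the Match Decision can be sketched as below. The `match_decision` helper is hypothetical, and the treatment of scores that fall exactly on a limit is an assumption. The glossary says only "above" and "below", so check the API REFERENCE manual for the actual boundary behavior.

```python
def match_decision(score, accept_limit, reject_limit):
    """Map a 0..100 score to "A", "U" or "R".

    Assumption: a score equal to accept_limit is accepted, and a score
    equal to reject_limit is undecided rather than rejected.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if reject_limit > accept_limit:
        raise ValueError("reject_limit must not exceed accept_limit")
    if score >= accept_limit:
        return "A"  # Accept
    if score < reject_limit:
        return "R"  # Reject
    return "U"      # Undecided: between the two limits


# Hypothetical limits chosen only for the example.
for score in (90, 70, 50):
    print(score, match_decision(score, accept_limit=80, reject_limit=60))
```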
Match Level
Used in defining the level of Matching to be performed for a particular search application. In most Standard Populations, possible values are Conservative, Typical and Loose. The three possible values allow adjustment to the "tightness" of the match.
Minor Word
Any word in a Name which is not the Major word.
Name
The name of a person, company, business or organization; an address; a product title, song title or book title; any short description. A name consists of a number of words and optionally codes, each with a limit of 24 characters.
Name Format
An internal setting that specifies at which end of a name or address (Left or Right) the Major Word can be found. This can be overridden by an application program in certain API calls.
Noise Words
Words that do not contribute to, and can impede, a search or match function. Such words are removed when SSA-NAME3 processes a name through an ssan3_get_keys, ssan3_get_ranges or ssan3_match call. Examples of Noise Words are Personal Titles (e.g. Mr., Mrs.), Street Types (e.g. Rd.) and Company legal endings (e.g. Inc., Ltd.). As with other Edit rules, Noise Words are population specific and vary according to which Standard Population is being used.
Population
This typically refers to the country and language of the data to be used in the search system; however, populations can be both super- and sub-sets of country and language populations. An example of super-set population is the combination of all Western European populations into the one population for searching and matching. An example of a sub-set population is the USA’s OFAC list of Specially Designated Nationals.
Population Override Manager
A Java GUI tool that helps a trained data analyst override some of the Standard Population rules that are supplied with the product, or provided in the form of a Custom Population. The types of rules that can be overridden using this tool are:
  • Edit-list rules
  • Frequency tables
  • Scalar Frequency Tables
Use of this tool without proper training from Informatica is not recommended, as improper use can adversely affect the reliability and performance of the search application(s).
Ranges Array
Returned from a call to the ssan3_get_ranges function. The Ranges Array is a set of "Start" and "End" SSA-NAME3 Key values. These should be used by the user’s search application to form a set of SQL select statements that retrieve records within those ranges.
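As a sketch of how a search application might turn a Ranges Array into SQL select statements, the example below runs one range query per Start/End pair. The `name_keys` table, its columns, the `fetch_candidates` helper and the key values are hypothetical. The ranges are hand-written placeholders rather than output from ssan3_get_ranges, and treating both ends of a range as inclusive is an assumption.

```python
import sqlite3

# Hypothetical key table populated with placeholder keys.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_keys (ssa_key TEXT NOT NULL, record_id INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_name_keys_key ON name_keys (ssa_key)")
conn.executemany(
    "INSERT INTO name_keys (ssa_key, record_id) VALUES (?, ?)",
    [("AAAA0001", 1), ("AAAB0005", 2), ("CCCC0001", 3), ("MMMM0009", 4)],
)


def fetch_candidates(conn, ranges):
    """Run one indexed range query per (start, end) pair; return unique record ids."""
    record_ids = set()
    for start_key, end_key in ranges:
        rows = conn.execute(
            "SELECT record_id FROM name_keys WHERE ssa_key BETWEEN ? AND ?",
            (start_key, end_key),
        )
        record_ids.update(row[0] for row in rows)
    return sorted(record_ids)


# Placeholder Start/End pairs standing in for a Ranges Array.
ranges = [("AAAA0000", "AAAB9999"), ("MMMM0000", "MMMM9999")]
candidate_ids = fetch_candidates(conn, ranges)
print(candidate_ids)
```

Collecting ids in a set avoids returning the same record twice when it has keys in more than one range.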
Ranges Count
The number of ranges in the Ranges Array. This is the number of ranges which the calling program must process.
Ranking
The process of sorting the matched candidates by Score, highest first, so that the records are displayed to the user in descending order of their likeness to the search identity.
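Ranking amounts to a descending sort on Score. A minimal sketch, assuming each matched candidate is held as a hypothetical `(record_id, score)` pair:

```python
def rank(matched):
    """Return (record_id, score) pairs sorted by score, highest first.

    sorted() is stable, so candidates with equal scores keep their
    original relative order.
    """
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


# Hypothetical matched candidates.
matched = [(101, 78), (102, 96), (103, 85)]
ranked = rank(matched)
print(ranked)
```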
Reject Limit
The Reject Limit is the score below which a candidate record is considered a rejected match; the Match Decision returned is then set to "R". It works together with the Accept Limit: records that score between the Reject and Accept limits have a Match Decision of "U" (Undecided). It is pre-defined in a Population rule-set and can be overridden by the search application.
Reliability
A measure of the likelihood that a Search Strategy will find a name in the database if one exists that should be considered a match to the search name.
Required Keys
Refers to the Standard, Extended or Limited SSA-NAME3 Keys computed when a name or address is processed by the ssan3_get_keys function. They are referred to as "required" because all of the keys must be stored in the database table.
The default SSA-NAME3 Keys are 8 bytes in length and consist of printable characters. An option exists to generate 5-byte binary keys if your database supports such keys. The application program will store these key values in a separate table within the database specifically designed and optimized by your DBA for searching and matching. This table will be sorted and indexed on the column storing the SSA-NAME3 Keys.
Response Code
Gives an indication of the validity of a call to SSA-NAME3. A Response Code value of zero indicates a successful call. If the Response Code is not zero, then a description of the problem will be reported in the Error Message parameter.
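One way an application might act on the Response Code is to wrap the check in a small helper that raises when the code is non-zero. The `SsaCallError` class and `check_response` function are hypothetical names introduced for this sketch. How the Response Code and Error Message are actually returned depends on the API binding in use.

```python
class SsaCallError(Exception):
    """Raised when a call reports a non-zero Response Code (hypothetical class)."""


def check_response(response_code, error_message):
    """Raise SsaCallError unless response_code is zero (a successful call)."""
    if response_code != 0:
        raise SsaCallError(
            f"call failed (response code {response_code}): {error_message}"
        )


# A zero Response Code passes silently.
check_response(0, "")

# A non-zero Response Code surfaces the Error Message to the caller.
try:
    check_response(8, "example error text")
except SsaCallError as exc:
    print(exc)
```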
Scatter / Gather Data Format
This is a method of formatting the input data when using ssan3_get_keys, ssan3_get_ranges and ssan3_match function calls.
Score
A numeric value between 000 and 100 returned from the ssan3_match call. It indicates how close a match was achieved after comparing the Search Data and File Data. The actual Score returned will depend on the Match Purpose, Match Level and the Search and File Data being compared.
Score Limit
A numeric value between 000 and 100 that defines the threshold for the Match Decision for a specific Match Purpose and Match Level for a given Population rule-set. Score Limits are pre-defined in the Population rule-sets, and can be overridden by the calling program.
Search Data
The transaction data which contains the search information. It will contain the field value used to drive the search (that is, used in the ssan3_get_ranges call) as well as all of the available data to be compared with the File Data during the ssan3_match call.
Search Dialogue
The method by which a search application receives search data from an input screen, processes the Ranges Array generated from the search data, and displays the ranked records back to the user.
Search Level
Used in defining the type of Search Strategy to use for a particular search application. In most Standard Populations, possible values are Narrow, Typical, Exhaustive and Extreme. The four possible values allow adjustment to the "thoroughness" of the search. The wider the search, the more candidates are typically returned, which may increase the reliability of the search but also use more resources and take longer.
Search Strategy
The combination of Key Field and Search Level passed to the ssan3_get_ranges function to generate the Ranges Array.
Selectivity
The percentage of the database (that is, number of candidates / total number of database rows) that is retrieved to satisfy a particular search.
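The selectivity formula above can be written out directly. The `selectivity_percent` helper and the sample figures are illustrative only.

```python
def selectivity_percent(candidate_count, total_rows):
    """Return the percentage of database rows retrieved as candidates."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if candidate_count < 0:
        raise ValueError("candidate_count must not be negative")
    return 100.0 * candidate_count / total_rows


# Hypothetical figures: 250 candidates retrieved from 1,000,000 rows.
print(selectivity_percent(250, 1_000_000))
```

A lower percentage means the search reads a smaller share of the database to find its candidates.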
SSA-NAME3 Keys
SSA-NAME3’s intelligent keys are computed when a name or address is processed by the ssan3_get_keys function. SSA-NAME3 Keys can be of 3 types: Standard Keys, Extended Keys or Limited Keys. The Keys are 8 bytes in length and consist of printable characters. An option exists to generate 5-byte binary keys. The application program will store these key values in a new table within the database or in a new indexed file specifically designed and optimized for searching and matching. This table will be sorted and indexed on the column storing the SSA-NAME3 Keys.
Standard Keys
For typical applications, this is the Key Level to be used when generating SSA-NAME3 Keys. Standard Keys overcome more variation than Limited Keys while using less disk space than Extended Keys. High-risk and critical applications, however, should use Extended keys.
Standard Population (SP)
Standard algorithms which support various searching and matching rules and requirements, typically for a specific language and country. Note: all Standard Populations are delivered with the product; however, a separate license is required to use the double-byte character sets covered by the SSANAME3-CJK-SUPPORT product.
System
Describes the use for the Name Search, for example, your project name. The System name is used to define the name of a folder or sub-directory where the Standard or Custom Population for this system should be stored and secured.
Tagged Data Format
This is a method of formatting the input data when using ssan3_get_keys, ssan3_get_ranges and ssan3_match function calls.
In Tagged Format, the offsets and lengths of the data fields being passed do not need to be specified. Instead, a notation of labels and delimiters is used to break up the fields. By default the delimiter is an asterisk but it can be user defined.
Token
The word or code components of a Name or Address.
Server Platform
The combination of hardware and operating system that will host the application that calls SSA-NAME3 and accesses the database.
Workbench
See the WORKBENCH USER guide for more information.