Application and Database Design Guide

The Basic Function Flow

The three main SSA-NAME3 functions are Key Building, generation of Search Strategies and Matching.

Key Building

The first application of any SSA-NAME3 implementation is a program to build a Key Index from the names or addresses in the database using the SSA-NAME3 algorithm. This is a user-developed program.
An SSA-NAME3 "Key" is a fixed-length compressed and encoded value built from a combination of the words and numbers in a name or address such that relevant variations of the word will have the same code. In fact, for one name or address, multiple SSA-NAME3 keys are generated.
The default format of an SSA-NAME3 key is 8-byte character. 5-byte binary keys can also be generated if your database supports them and you wish to save disk space.
In UDB/DB2 databases, it may be necessary to set the IDENTITY option at database creation time so that the collating sequence of the 8-byte character keys is correct. Alternatively, the 5-byte binary keys can be used. See the API REFERENCE guide for more information.
The SSA-NAME3 keys must be stored in a database table and indexed.
The multiple keys passed back to an application program are known as a "Keys Array". The actual key values will not be unique, so the column used to store the SSA-NAME3 keys must allow duplicate values.
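As an illustration of the storage requirement, the following minimal sketch creates a key table with a non-unique index. It uses SQLite from Python purely for convenience. The table name `name_keys`, the column names, and the key values are hypothetical placeholders, not part of SSA-NAME3.

```python
import sqlite3


def create_key_table(conn: sqlite3.Connection) -> None:
    """Create a key table with a non-unique index on the key column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS name_keys ("
        " ssa_key TEXT NOT NULL,"      # one key from a record's Keys Array
        " record_id INTEGER NOT NULL"  # points back at the source record
        ")"
    )
    # Deliberately a plain (non-unique) index: many records can share a key.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS ix_name_keys ON name_keys (ssa_key)"
    )


conn = sqlite3.connect(":memory:")
create_key_table(conn)

# Placeholder key values (not real SSA-NAME3 keys): two different
# records sharing one key value must both be accepted.
conn.executemany(
    "INSERT INTO name_keys (ssa_key, record_id) VALUES (?, ?)",
    [("AAAAAAAA", 1), ("AAAAAAAA", 2)],
)
```

Storing one row per key, rather than one row per record, is a design choice of this sketch. It lets a single indexed column serve every key in a record's Keys Array.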
To index an existing database of names or addresses, an application will call the ssan3_get_keys function for every name or address to generate the required Keys Array.
When calling the ssan3_get_keys function, it is important to supply the complete name that describes the entity. For example, pass First Name + space + Middle Name + space + Last Name to SSA-NAME3 so that keys can be generated on the complete information. SSA-NAME3 will take care of finding and matching names with words out of order.
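The key building loop described above can be sketched as follows. This is an outline under stated assumptions, not the actual SSA-NAME3 interface: `fake_get_keys` is a hypothetical stand-in for a wrapper around ssan3_get_keys, and it does not implement the SSA-NAME3 algorithm. The real call signature and parameters are documented in the API REFERENCE guide.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def full_name(first: str, middle: Optional[str], last: str) -> str:
    """Join the name parts with single spaces, skipping empty parts."""
    parts = [p.strip() for p in (first, middle, last) if p and p.strip()]
    return " ".join(parts)


def build_key_rows(
    records: Iterable[Tuple[int, str, Optional[str], str]],
    get_keys: Callable[[str], List[str]],
) -> List[Tuple[str, int]]:
    """Return one (key, record_id) row per key in each record's Keys Array."""
    rows = []
    for record_id, first, middle, last in records:
        # Pass the complete name so keys reflect all available words.
        for key in get_keys(full_name(first, middle, last)):
            rows.append((key, record_id))
    return rows


def fake_get_keys(name: str) -> List[str]:
    """Hypothetical stand-in for the real key call; NOT the real algorithm."""
    return [word.upper()[:8].ljust(8) for word in name.split()]


rows = build_key_rows(
    [(1, "Ann", None, "Smith"), (2, "Bob", "J", "Lee")],
    fake_get_keys,
)
```

Each resulting row pairs one key with the identifier of the record it came from, ready to be inserted into the key table.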
Within an SSA-NAME3 key, a variety of techniques are used to maximize the retention of valuable "locating" data, while retaining a logical structure that supports varying depths of search and allowing location of candidate records when words are missing or truncated to initials. While the key field has a fixed length, internally it has a variable structure depending upon the commonality of the words in a specific name or address.
The following schematic provides an example of the function flow in a key building program using SSA-NAME3:

Searching

The second application is a program to retrieve records in a search by accessing the SSA-NAME3 Key Index. This is a user-developed program.
This program calls the SSA-NAME3 algorithm to build an array of start and end key values that constitute a suitable search strategy for the search name. This is known as the "Ranges Array".
It is these start and end key values that a program uses to drive the search, and it is this mechanism that insulates the application program from the need to understand the complex variable structure of the actual key. Calling the ssan3_get_ranges function for a given search name will return the required Ranges Array.
After calling the ssan3_get_ranges function, the program will select all records whose SSA-NAME3 Key values fall between the Start and End Key values of any entry in the Ranges Array.
This selection of records is known as a "Candidate Set" as it contains records which are candidates for matching the search record.
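A minimal sketch of the range-driven selection, assuming the keys are stored in a hypothetical `name_keys` table (SQLite is used here only for illustration) and that the Ranges Array has already been obtained as a list of (start, end) pairs. The key and range values below are placeholders, not real SSA-NAME3 output. Removing duplicate record IDs is a choice made in this sketch, since one record has several keys and may fall inside more than one range.

```python
import sqlite3
from typing import Iterable, List, Tuple


def fetch_candidates(
    conn: sqlite3.Connection,
    ranges: Iterable[Tuple[str, str]],
) -> List[int]:
    """Collect the record IDs whose keys fall inside any (start, end) range."""
    seen = set()
    candidates = []
    for start_key, end_key in ranges:
        cursor = conn.execute(
            "SELECT record_id FROM name_keys"
            " WHERE ssa_key BETWEEN ? AND ? ORDER BY ssa_key",
            (start_key, end_key),
        )
        for (record_id,) in cursor:
            if record_id not in seen:  # a record can match several ranges
                seen.add(record_id)
                candidates.append(record_id)
    return candidates


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_keys (ssa_key TEXT, record_id INTEGER)")
conn.execute("CREATE INDEX ix_name_keys ON name_keys (ssa_key)")
conn.executemany(
    "INSERT INTO name_keys VALUES (?, ?)",
    [("AAAA0001", 1), ("AAAA0005", 2), ("BBBB0001", 1), ("CCCC0001", 3)],
)

# Placeholder Ranges Array: two (start, end) key pairs.
candidate_ids = fetch_candidates(
    conn, [("AAAA0000", "AAAA9999"), ("BBBB0000", "BBBB9999")]
)
```

The application only compares stored keys against the start and end values. It never inspects the internal structure of a key.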
If the search has been supplied with data beyond the name or address, this extra data can be used either to eliminate records that are not of interest before the results are processed further, or to confirm a match.
If the results are to be displayed to a user, it is also useful to "rank" the records with more likely candidates at the top of the list.
Both of these processes can be achieved through the SSA-NAME3 Match routines.

Matching

The search program will also invoke the SSA-NAME3 Match routines by calling the ssan3_match function with a pair of records: the search record and a candidate file record.
The result of this call will be a Match Score and a Match Decision. The ssan3_match function is called to compare every candidate record with the search record. A decision can be made on the Score or Decision value as to how to treat each record.
In an online search, after all candidate records have been retrieved and matched, those that were not eliminated can then be ranked in descending order of their score for display back to the screen. This is typically done by the user’s search program performing an in-memory sort of the search results.
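The match, eliminate and rank steps can be sketched as below. The `fake_match` function is a toy stand-in for a wrapper around ssan3_match and does not reproduce SSA-NAME3 scoring. The shape of its result (an integer score plus a decision string) and the decision values "accept" and "reject" are assumptions made for this example only.

```python
from typing import Callable, List, Tuple

MatchResult = Tuple[int, str]  # (score, decision); this shape is an assumption


def rank_candidates(
    search_record: str,
    candidates: List[str],
    match: Callable[[str, str], MatchResult],
    keep: Callable[[int, str], bool],
) -> List[Tuple[int, str]]:
    """Match every candidate, drop rejected ones, sort by descending score."""
    results = []
    for candidate in candidates:
        score, decision = match(search_record, candidate)
        if keep(score, decision):
            results.append((score, candidate))
    # In-memory sort so the most likely candidates are listed first.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


def fake_match(search: str, candidate: str) -> MatchResult:
    """Toy scorer: percentage of search words present in the candidate."""
    search_words = set(search.lower().split())
    if not search_words:
        return 0, "reject"
    shared = search_words & set(candidate.lower().split())
    score = 100 * len(shared) // len(search_words)
    return score, ("accept" if score >= 50 else "reject")


ranked = rank_candidates(
    "ann marie smith",
    ["Bob Lee", "Smith Ann", "Ann Marie Smith"],
    fake_match,
    keep=lambda score, decision: decision != "reject",
)
```

Passing the `keep` rule in as a parameter leaves the treatment of each Score or Decision value to the calling application.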
The following schematic provides an example of the program flow for the search and matching application:
