Introduction to SSA-NAME3 (EXTN) Service Groups

Appendix A: Glossary

This section provides a glossary of terms.
Account Name
A name field, often referring to account details, which implicitly refers to more than one simple name, for example, JOHN AND MARY SMITH. See also Compound Name.
Algorithm
A combination of SSA-NAME3 routines that have been generated for a specific Population (for example, person names, company names, street names). The Algorithm is accessed through Calls to Services which are linked to it.
Alternate Keys
The Name-key(s) built from different word orders in a name. These can be either Positive or Negative Keys.
Authorization
The process of collecting the signatures and details of the routines linked to an Algorithm. These signatures are checked at run-time to ensure that no ’rogue’ modules have been linked.
Authorized Algorithm
The Algorithm for a Population that is currently available for use by application programs.
Bad key
SSA-NAME3 never returns an unusable Name-key. If for some reason a key could not be generated, a special "bad key" is returned. This bad key has a value of 800000HEX when using 5-byte binary keys, and K$$$$$$$ when using 8-byte character keys. It can be used to group bad names so that they can be found.
An example would be the name THE LIMITED. If both words in this name are removed by the Edit-list, there is no name left to build the key from. In this case the bad key is returned.
Candidates
The set of records returned from a Name search. For optimum quality these candidates should be passed to the Matching Service for further qualification before being displayed or otherwise used in a search process.
Cascade
The name given to the Search-table structure built for the most common type of Positive Search strategy. The Search-table starts with the narrowest Name-key range, which could contain the ’searched-for’ record, and continues with progressively widening ranges.
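A search application might walk such a Search-table as sketched below. The sketch is illustrative only: plain strings stand in for Name-keys, a sorted Python list stands in for the Name-key index, stopping at the first range that yields enough records is an assumed policy, and `cascade_search` is a hypothetical name, not an SSA-NAME3 call.

```python
from bisect import bisect_left, bisect_right

def cascade_search(index_keys, search_table, enough=1):
    """Try each (low, high) key range in turn, narrowest first.

    index_keys   -- sorted list of keys (a stand-in for a Name-key index)
    search_table -- list of (low, high) ranges, ordered narrow to wide
    enough       -- assumed stopping rule: minimum number of hits wanted
    """
    for low, high in search_table:
        start = bisect_left(index_keys, low)    # first key >= low
        stop = bisect_right(index_keys, high)   # one past last key <= high
        hits = index_keys[start:stop]
        if len(hits) >= enough:
            return (low, high), hits
    return None, []

# Illustrative data only.
keys = ["SMITH A", "SMITH J", "SMYTHE"]
table = [("SMITH JOHN", "SMITH JOHN"),   # narrowest range
         ("SMITH J", "SMITH JZ"),        # wider
         ("SMITH", "SMITHZ")]            # widest
used_range, found = cascade_search(keys, table)
```

Here the narrowest range finds nothing, so the search falls through to the second range.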
Cleaning
The process of applying character set conversion rules to a Name with the intention of cleaning and/or converting unwanted characters.
Code-character
Any character marked as a code in character-set table 2. This is normally only the digits 0 - 9.
Code-word
Any token with one of the following attributes:
  1. 2 or more code characters,
  2. an initial that is a code character,
  3. 1 or 2 characters in length and either preceded or followed by a Code-word.
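The three rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SSA-NAME3 routine: code characters are taken to be the digits 0 - 9, rule 2 is read as "a one-character token that is a code character", rule 3 is applied repeatedly so that a newly flagged token can qualify its neighbour, and `find_code_words` is a hypothetical name.

```python
CODE_CHARS = set("0123456789")  # assumption: only digits are code characters

def find_code_words(tokens):
    """Return a list of booleans marking which tokens are Code-words."""
    def count_codes(tok):
        return sum(ch in CODE_CHARS for ch in tok)

    # Rules 1 and 2: two or more code characters, or a one-character
    # token (an Initial) that is itself a code character.
    flags = [
        count_codes(t) >= 2 or (len(t) == 1 and t in CODE_CHARS)
        for t in tokens
    ]

    # Rule 3: a token of 1 or 2 characters next to a Code-word is also
    # a Code-word. Repeat until a full pass changes nothing.
    changed = True
    while changed:
        changed = False
        for i, t in enumerate(tokens):
            if flags[i] or not 1 <= len(t) <= 2:
                continue
            prev_is_code = i > 0 and flags[i - 1]
            next_is_code = i + 1 < len(tokens) and flags[i + 1]
            if prev_is_code or next_is_code:
                flags[i] = True
                changed = True
    return flags
```

For example, in `["UNIT", "12", "A", "SMITH"]` the token `12` qualifies under rule 1 and `A` under rule 3, while `UNIT` and `SMITH` do not qualify.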
Common Name
A Name that occurred often enough in the sample user data provided to Frequency Table generation to be considered common.
Compound Name
A Name field which explicitly refers to more than one simple name, for example, JOHN SMITH AND GEORGE BROWN. See also Account Name.
Delimiter
Any character defined as a delimiter in character-set table 4.
Edit-list
A table of user-controlled words and phrases that undergo special processing in Name-key building and Matching, for example, noise words, personal titles, prefixes and suffixes, nicknames, common abbreviations, phrase replacements and Compound Name markers.
Edit-rule
A line in an Edit-list. For example, the line
RR ROB >ROBERT <
is an Edit-rule that says "Replace ROB with ROBERT".
Fast-start
A collection of sample definition modules for a specific country used to create a first-cut Service Group. The Fast-start definition files are used when first installing SSA-NAME3 as a quick way to test the Installation process and environment. They are later used as the basis for further customization work to make SSA-NAME3 achieve the objectives of the application and end-users.
Filtering
The process of Matching Candidates to reduce the number of records shown to the user.
Formatting
The process of applying Edit-rules to words, phrases and sub-strings.
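The effect of a replacement Edit-rule, such as the ROB to ROBERT example under Edit-rule, can be sketched as follows. The dictionary is only a stand-in for an Edit-list and does not reproduce the Edit-list file format; `apply_replacements` is a hypothetical helper, and upper-casing and whitespace splitting are simplifying assumptions.

```python
# Hypothetical stand-in for replacement Edit-rules: word -> replacement.
REPLACEMENTS = {"ROB": "ROBERT"}

def apply_replacements(name, rules=REPLACEMENTS):
    """Replace whole words of an upper-cased name according to the rules."""
    words = name.upper().split()
    # Only exact whole-word matches are replaced; other words pass through.
    return " ".join(rules.get(word, word) for word in words)
```

With these rules, `apply_replacements("Rob Smith")` gives `ROBERT SMITH`, while `ROBIN SMITH` is left unchanged because only whole words are compared.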
Frequency Table
A table generated from an organization’s Name data holding the most frequently used words that have not been deleted or skipped as a result of Edit-list processing.
Generation
The process of creating compilable source code modules from Definition files, either on a Windows computer or an MVS system.
Initial
A single character word or the first character of a word.
I+n
Nomenclature for "Initial plus n consonants".
Key Generation
The process of building one or more Name-keys from the Cleaned, Formatted and Stabilized Tokens of a Name.
Major-word
The word in a Name identified as being the most significant. It is used as the primary part of the Preferred Key, as the primary part of the Search key ranges of a positive search, and for weighting in Matching Schemes. See also Minor-word.
Major word-key
A 3-byte or 6-byte key generated from the Major-word of a Name.
Matching
The process of determining the probability that a search identity and a File entry are the same identity.
Matching Method
The way in which Matching compares two data items of the same data type. There are methods for names, addresses, dates, strings and codes. (See Method.)
Matching Scheme
A definition of the structure of the data items to be matched and the Matching Methods and options to be used.
Method
A routine used for Matching two data items. There are Matching Methods for names, dates, codes and strings. For example, in the following Matching Definition line,
DEFINE METHOD=METHOD1,EP=N3SCL,ALGORITHM=PERSON
the method is N3SCL (the name matching method).
Minor-word
Any token in a Name which is not the Major-word and is not a word deleted by an Edit-rule.
Name
The name of a person, company, business or organization; an address; a product title, song title or book title; any short description. A name consists of a number of words, each with a limit of 24 characters.
Name-key
A compressed five-byte binary or eight-byte character key built from a Name using the NAMESET Service.
Negative Keys
The Name-key(s) built using each non-delete Token (word) in a Name in combination with every other word in the name. See also Positive Keys.
Negative Search Strategy
A method of building a Search-table for a search application whose normal requirement is to prove that a name does not already exist on a database.
Population
Population refers to a class or group of names that requires its own SSA-NAME3 Algorithm. Typical examples of "populations" are: customers, street lines from addresses, song titles, file titles etc.
Positive Keys
The Name-key(s) built using each non-delete Token (word) in a Name followed by the other words in a set order. See also Negative Keys.
Positive Search Strategy
A method of building a Search-table for a search application whose normal requirement is to find a name that already exists on a name database.
Preferred Key
The Name-key built from the Major-word followed by the Minor-words in a Name.
Probe
A very narrow search range.
Ranking
The process of sorting the Matched Candidates to show the records to the user in descending order of their likeness to the search identity.
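Filtering and Ranking can be pictured together as a threshold followed by a descending sort on the Score (a value between 1 and 100). The sketch below assumes the candidates are already available as (record, score) pairs; the threshold of 80 and the name `filter_and_rank` are illustrative assumptions, not SSA-NAME3 defaults.

```python
def filter_and_rank(candidates, min_score=80):
    """Keep candidates scoring at least min_score, best match first.

    candidates -- list of (record, score) pairs, score between 1 and 100
    """
    # Filtering: drop candidates below the (assumed) threshold.
    kept = [(record, score) for record, score in candidates if score >= min_score]
    # Ranking: sort the remaining candidates by descending score.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Because Python's sort is stable, candidates with equal scores keep their original relative order.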
Reliability
The probability that a Search Strategy will find a name if one exists that should be considered as a match to the search Name.
Response Code
A unique number indicating the success or otherwise of the Service just called.
Scaler Frequency Table
A table generated from an organization’s Name data or an organization’s SSA-NAME3 keys holding the most frequently occurring Name-keys. Generation of this table is optional, and it is used to enhance the scale value returned in the NAMESET search ranges. See also Search Scale.
Score
A value between 1 and 100 returned by the Matching Service to an application. This defines the level of confidence that two candidate records match.
Search Contents
A term used to describe the number of Tokens (words and Initials) used in a particular search range.
Search Depth
A term used to describe the width of a Name-key search range or its Selectivity.
Search Dialogue
The method by which a search application processes a Search-table and displays the Candidates to the user.
Search Scale
An estimate of the number of Candidates that would be returned using a particular search range.
Search Strategy
The method by which a Search-table is built to achieve the optimum search results for the particular application requirement (for example, Positive Search Strategy or Negative Search Strategy).
Search-table
A table of Name-key ranges used by a search application to access a Name database on a Name-key index. This is the physical implementation of a Search Strategy.
Selectivity
The percentage of records that are accessed to satisfy the average search.
Service
An SSA-NAME3 function that has been defined and generated for some specific user-required purpose and for a specific Population, for example, building Name-keys and Search-tables, or Matching two records according to a specific set of rules.
Service Group
A collection of SSA-NAME3 Services that are grouped as one program under one name. In Call statements you call a Service Group name requesting a Service, passing parameters according to the service rules.
Service Group Data File
An ASCII text file containing the Service Group "ruleset". It is invoked at runtime by the shared object or dll code.
Service Name
The name used when referring to a Service, e.g. NAMESETP is a name typically used to define a service of type NAMESET for the PERSON Algorithm.
Service Type
The type of a Service. This defines its functionality, whereas the Service Name is simply a handle used to refer to the Service. For example, the Service NAMESETP is of type NAMESET.
Skip-word
Any Token in a Name which is defined by an Edit-rule not to take part in Name-key building.
SSA-NAME3
The latest version of the SSA-NAME3 Algorithms.
Stabilization (Word Stabilization)
The process of applying phonetic and orthographic transformation rules to Name Tokens to stabilize the error and variation.
Suspect Code-word
A word of 3 or more characters that includes 1 Code-character. See also Code-word.
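As a rough illustration, the Suspect Code-word test can be written as a one-line check. The sketch assumes that code characters are the digits 0 - 9 and reads "includes 1 Code-character" as exactly one, since a word with two or more would already be a Code-word; `is_suspect_code_word` is a hypothetical name.

```python
CODE_CHARACTERS = "0123456789"  # assumption: only digits are code characters

def is_suspect_code_word(word):
    """True for a word of 3+ characters containing exactly one code character."""
    code_count = sum(ch in CODE_CHARACTERS for ch in word)
    return len(word) >= 3 and code_count == 1
```

Under these assumptions `4TH` is a Suspect Code-word, while `B52S` (two code characters) and `A1` (too short) are not.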
Token
The individual word components of a Name after Cleaning are called Tokens.
Target Platform
The combination of hardware and software that will execute an SSA-NAME3 Algorithm.
Test-bed
An SSA-NAME3 utility which enables quick and reliable testing of Algorithms and Matching Schemes, either interactively or in batch mode. For Microsoft Windows, a Windows-based Test-bed can also be used.
Vowel
Any character defined as a vowel in character-set table 4.
Word-key
A 3-byte or 6-byte key generated from any single Token (word).
Word-type
A one-character code assigned to a Token by the Formatting routine. Possible values are:
B - a Suspect Code-word.
C - a Code-word.
D - a deleted word (used only by the TRACE Service).
I - an Initial.
M - the Major-word.
N - the Major-word if it is a Code-word.
S - a Skip-word.
T - a Skip-word if it is a Code-word.
Y - any other word.
