Service Group Definition and Customization Guide

Factors Which Determine the Number of Keys to Generate per Name

The quality of the search and file data, and the types of searching and matching to be carried out, determine the number of keys to generate for names. The options are:
  • Preferred key (a single key)
  • Positive keys (e.g. 3 keys for a 3 word name)
  • Negative keys (e.g. 8 keys for a 3 word name)
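The example counts in the list above follow from simple arithmetic. The sketch below assumes the Negative Key count is one key per ordered pair of non-delete words plus the two concatenated-word keys described under Negative Keys, i.e. n(n-1) + 2 for n words. These are upper bounds only: a concatenated key that duplicates an existing key is not added, and the skip-word option can suppress further keys. The function name is hypothetical.

```python
def max_keys_per_name(word_count: int) -> dict[str, int]:
    """Upper bound on the keys generated per name for each key option.

    word_count is the number of non-delete words in the name.
    This sketch only handles names of two or more words.
    """
    if word_count < 2:
        raise ValueError("this sketch assumes at least two non-delete words")
    return {
        "preferred": 1,                                 # a single key
        "positive": word_count,                         # one key per non-delete word
        "negative": word_count * (word_count - 1) + 2,  # ordered pairs + 2 concatenated keys
    }


print(max_keys_per_name(3))  # {'preferred': 1, 'positive': 3, 'negative': 8}
```

The quadratic growth of the Negative count is why the disk-space and skip-word considerations discussed under Negative Keys matter for long names.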

What is the quality of the Name data?

The quality of name data differs from organization to organization and from system to system. For example, some legacy systems that have been capturing data for decades now capture more, or better, information than they did earlier in their life. Perhaps only given-name initials were required in the system's early days, but full given names are required now. Perhaps the size or format of the fields has changed over time, and truncated data or word-sequence errors are evident. Perhaps a bug existed in an earlier version of a program and the data has never been fixed. In other cases, the name file may contain a mixture of names from different countries, in different formats.

What types of searches will be done on the Name data?

Will the typical search be able to use the full name or address, or is the search data sometimes incomplete? How reliable is the word order of the search data? Is the emphasis of the search on performance or on ’not missing’ candidates?

Preferred Key

This is requested by setting ALTERNATE-KEYS=N and SSA-NAME3-OPTIONS #18 to N. These settings cause a single key to be generated for a name. A Preferred Key implementation is normally used only when:
  • the word order in both search and file data is very stable
  • there is commonly only one significant word from the name worth searching on (e.g. a population of addresses where the majority of street names are of the form ’55 Main Street’, i.e. you want to search on ’Main 55’ but never on ’55 Main’)
  • there isn’t scope for adding a new database table (to hold multiple keys)
In these cases the Preferred Key can be added directly to the file where the name data is stored. Preferred Keys are normally not recommended for Company names or other long name structures. The default Preferred Key (using NAME-FORMAT=R and a three word name example) is built using the word order:
Major Word + First Word (+ Second Word)
This default word order can be altered by the setting of SSA-NAME3-OPTIONS #19.
Example 1: For the person name MARY DAVIS SMITH, the default Preferred Key would be:
SMITH + MARY (+ DAVIS)
Example 2: For the street name 26 VALLEY RD, the default Preferred Key would be:
VALLEY + 26
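The default Preferred Key word order can be sketched as a simple reordering of words. This is a minimal illustration under several assumptions: the Major Word position is taken as already known (the rightmost word of a person name when NAME-FORMAT=R), RD in the street example is assumed to have been recognised as a street marker and excluded, and the sketch works on plain words rather than the encoded keys SSA-NAME3 actually stores. It also always keeps the bracketed word, whereas SSA-NAME3 uses it only when the preceding words are common. `preferred_key` is a hypothetical name.

```python
def preferred_key(words: list[str], major_index: int) -> list[str]:
    """Word order of the default Preferred Key: Major Word + First Word (+ Second Word).

    words: the non-delete words of the name, left to right.
    major_index: position of the Major Word (assumed to be known already).
    """
    major = words[major_index]
    # The remaining words follow the Major Word in their original left-to-right order.
    others = [w for i, w in enumerate(words) if i != major_index]
    return [major] + others


print(" + ".join(preferred_key(["MARY", "DAVIS", "SMITH"], major_index=2)))  # SMITH + MARY + DAVIS
print(" + ".join(preferred_key(["26", "VALLEY"], major_index=1)))            # VALLEY + 26
```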

Positive Keys

To build positive keys, set ALTERNATE-KEYS=Y and SSA-NAME3-OPTIONS #18 to N, or set KEYS=POS.
Positive keys cater for the fact that some search names may not be in the same order as the file name, or may have extra or missing words. For example, using an Algorithm with NAME-FORMAT=R, the search name MARY DAVIS would not find the file name MARY DAVIS-SMITH if only a Preferred Key was used, because the Preferred Key would be built from SMITH + MARY (+ DAVIS) and the search ranges would be either DAVIS + MARY for a Positive search, or DAVIS + MARY and MARY + DAVIS for a Negative search.
Positive Keys generate one key per non-delete word in the name (delete words, e.g. MR, MRS, can be defined in the Edit-list as part of the customization process). The default Positive Keys (using a three word name example) are built using the word orders:
First Word + Second Word (+ Major Word)
Second Word + First Word (+ Major Word)
Major Word + First Word (+ Second Word)
Example 1: for the name MARY DAVIS SMITH, the default Positive Keys would be:
MARY + DAVIS (+ SMITH)
DAVIS + MARY (+ SMITH)
SMITH + MARY (+ DAVIS)
(The words in brackets will be used if the preceding words are common.)
A search for MARY DAVIS would find the name MARY DAVIS SMITH on the 1st or 2nd Positive key, depending on which search strategy is used, i.e. positive or negative.
Example 2: for the address 26 PLEASANT VALLEY RD, the default Positive Keys would be:
VALLEY + 26 (+ PLEASANT)
PLEASANT + 26 (+ VALLEY)
26 + PLEASANT (+ VALLEY)
(The words in brackets will be used if the preceding words are common.) A search for 26 PLEASANT RD would find the address 26 PLEASANT VALLEY RD on the second Positive key.
The default bias for Positive Key word order is to cater for the ’double-barreled surname’ problem or the ’two word street name’ problem.
Positive keys may not be good enough for Company names or other long name structures – Negative keys are normally recommended for these.
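The three-word Positive Key templates above can be read as one rule: each non-delete word leads one key, and the remaining words follow in left-to-right order. The sketch below is built on that reading, which is an inference from the examples rather than a documented rule. `DELETE_WORDS` and `positive_keys` are hypothetical, delete-word handling is reduced to a set lookup, commonality (the bracketed words) is not modelled, and keys are shown as words rather than encoded values.

```python
# Hypothetical delete words; in SSA-NAME3 these come from the Edit-list.
DELETE_WORDS = {"MR", "MRS"}


def positive_keys(name: str) -> list[list[str]]:
    """One key per non-delete word: that word first, then the others left to right."""
    words = [w for w in name.split() if w not in DELETE_WORDS]
    keys = []
    for i, lead in enumerate(words):
        rest = words[:i] + words[i + 1:]
        keys.append([lead] + rest)
    return keys


for key in positive_keys("MRS MARY DAVIS SMITH"):
    print(" + ".join(key))
# MARY + DAVIS + SMITH
# DAVIS + MARY + SMITH
# SMITH + MARY + DAVIS
```

For the person example the output also matches the default key order listed above. For the address example (26 PLEASANT VALLEY, with RD assumed excluded) it produces the same three keys, but in a different order from the listed default.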

Negative Keys

To build negative keys, set SSA-NAME3-OPTIONS #18 to Y or minus sign (-), or set KEYS=NEG. Negative keys also cater for the fact that some search names may not be in the same order as the file name, or may have extra or missing words.
The difference between Positive Keys and Negative Keys is that Negative Keys do not bias a specific word order problem – they cater for all word orders.
For example, the search name DAVIS SMITH would not find the file name MARY DAVIS SMITH if only Positive Keys were used, until the search was widened to the one word level.
Negative Keys generate one key for each ordered pair of non-delete words in the name, followed by the other words in left-to-right order. In addition, two ’concatenated word’ keys are built: one where the first and second words are concatenated, and one where the second-last and last words are concatenated. The behavior of this concatenated-word key building is affected by the setting of SSA-NAME3-OPTIONS #10 and #21. The number of words used to build each key depends on the commonality of all the words. Negative Keys are built (using a three word example) with the following word orders:
First Word + Second Word (+ Major Word)
Second Word + First Word (+ Major Word)
Major Word + First Word (+ Second Word)
First Word + Major Word (+ Second Word)
Second Word + Major Word (+ First Word)
Major Word + Second Word (+ First Word)
First WordSecond Word (+ Major Word)
Second WordMajor Word + First Word
Negative keys are an extension of Positive Keys. The last two keys are built by concatenating the first two words and the last two words from the name – this gives extra chances to find a match when the search name words are concatenated. It is possible that the keys generated from these concatenations will sometimes be the same as keys already generated in the stack. In this case, the concatenated word keys will not be added to the stack.
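The Negative Key rules above (ordered pairs followed by the remaining words, plus two concatenated-word keys that are skipped if already present) can be sketched as follows. This is an illustration under stated assumptions: it follows the three-word template literally, always keeps the remaining words (commonality is ignored), ignores the effect of SSA-NAME3-OPTIONS #10 and #21, and compares word lists instead of encoded keys. `negative_keys` is a hypothetical name. It reproduces the keys listed below for MARY DAVIS SMITH as a set, though not in the same stack order.

```python
from itertools import permutations


def negative_keys(words: list[str]) -> list[list[str]]:
    """Keys for every ordered pair of words, plus two concatenated-word keys."""
    keys: list[list[str]] = []
    # One key per ordered pair, followed by the remaining words left to right.
    for i, j in permutations(range(len(words)), 2):
        rest = [w for k, w in enumerate(words) if k not in (i, j)]
        keys.append([words[i], words[j]] + rest)
    # Concatenated-word keys: first+second words, and second-last+last words.
    if len(words) >= 2:
        for a, b in ((0, 1), (len(words) - 2, len(words) - 1)):
            rest = [w for k, w in enumerate(words) if k not in (a, b)]
            key = [words[a] + words[b]] + rest
            if key not in keys:  # a duplicate concatenated key is not added
                keys.append(key)
    return keys


for key in negative_keys(["MARY", "DAVIS", "SMITH"]):
    print(" + ".join(key))
```

For a two-word name such as JOHN SMITH, the first two words are also the last two, so both concatenated keys are JOHNSMITH and only one is added, giving 3 keys rather than 4.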
Example 1: for the name MARY DAVIS SMITH, the Negative Keys would be:
SMITH + MARY (+ DAVIS)
SMITH + DAVIS (+ MARY)
DAVIS + MARY (+ SMITH)
DAVIS + SMITH (+ MARY)
MARY + DAVIS (+ SMITH)
MARY + SMITH (+ DAVIS)
MARYDAVIS + SMITH
DAVISSMITH + MARY
(The words in brackets will be used if the preceding words are common.)
A search for DAVIS SMITH would find the name MARY DAVIS SMITH on the 2nd or 4th Negative key, depending on the search strategy used, i.e. positive or negative, without needing to widen the search to the one word level.
Example 2: for the name FOREMOST COMPUTER SUPPLIES, the Negative Keys would be:
FOREMOST + COMPUTER (+ SUPPLIES)
FOREMOST + SUPPLIES (+ COMPUTER)
COMPUTER + FOREMOST (+ SUPPLIES)
COMPUTER + SUPPLIES (+ FOREMOST)
SUPPLIES + FOREMOST (+ COMPUTER)
SUPPLIES + COMPUTER (+ FOREMOST)
FOREMOSTCOMPUTER + SUPPLIES
FOREMOST + COMPUTERSUPPLIES
A search for FOREMOST SUPPLIES would find the name FOREMOST COMPUTER SUPPLIES on the 2nd or 5th Negative key, depending on the search strategy used, i.e. positive or negative.
A Negative key strategy can potentially generate many keys; this is necessary to be sure that records are not missed. However, disk space is sometimes a real concern, and some keys may map to a large number of records (in the above example, COMPUTER + SUPPLIES). An option therefore exists to prevent keys being built when the first word to be used in the key is a skip word. (Skip words are defined in the Edit-list. Good examples of Company skip words might be COMPUTER, SUPPLIES, TRADING, ENGINEERING, etc.) This option is also available for Positive keys. To turn it on, set SSA-NAME3-OPTIONS #24 to ’Y’ or ’S’. Doing this for the above example would cause only the following Negative keys to be generated:
FOREMOST + COMPUTER (+ SUPPLIES)
FOREMOST + SUPPLIES (+ COMPUTER)
FOREMOSTCOMPUTER + SUPPLIES
FOREMOST + COMPUTERSUPPLIES
It is important to note that a trade-off exists when using the above option. If a name contains only skip words, for example COMPUTER TRADING SUPPLIES, then a Preferred Key will still be built for it. In this example, if NAME-FORMAT=L, that key would be COMPUTER + TRADING (+ SUPPLIES). This means that any search for COMPUTER TRADING SUPPLIES will find this name, but will not find a name that contains those words as well as a non-skip word; e.g. it would not find FOREMOST COMPUTER TRADING SUPPLIES (because only keys beginning with FOREMOST would have been generated for it).
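The skip-word option and its trade-off can be sketched as a filter over an already-built key list. Assumptions, all hypothetical: `SKIP_WORDS` stands in for the Edit-list skip words, `apply_skip_option` is an illustrative name, and the fallback Preferred Key for an all-skip-word name is taken to be the words in their original order, which matches the NAME-FORMAT=L example above but is not a general rule.

```python
# Hypothetical skip words; in SSA-NAME3 these are defined in the Edit-list.
SKIP_WORDS = {"COMPUTER", "SUPPLIES", "TRADING", "ENGINEERING"}


def apply_skip_option(name_words: list[str], keys: list[list[str]]) -> list[list[str]]:
    """Mimic SSA-NAME3-OPTIONS #24 = Y/S: drop keys that start with a skip word.

    If every word of the name is a skip word, keep a single Preferred Key
    instead (assumed here to be the words in original order, as with NAME-FORMAT=L).
    """
    if all(word in SKIP_WORDS for word in name_words):
        return [list(name_words)]
    return [key for key in keys if key[0] not in SKIP_WORDS]


# The Negative Keys listed for FOREMOST COMPUTER SUPPLIES.
foremost_keys = [
    ["FOREMOST", "COMPUTER", "SUPPLIES"], ["FOREMOST", "SUPPLIES", "COMPUTER"],
    ["COMPUTER", "FOREMOST", "SUPPLIES"], ["COMPUTER", "SUPPLIES", "FOREMOST"],
    ["SUPPLIES", "FOREMOST", "COMPUTER"], ["SUPPLIES", "COMPUTER", "FOREMOST"],
    ["FOREMOSTCOMPUTER", "SUPPLIES"], ["FOREMOST", "COMPUTERSUPPLIES"],
]
for key in apply_skip_option(["FOREMOST", "COMPUTER", "SUPPLIES"], foremost_keys):
    print(" + ".join(key))
```

The trade-off is visible in the output: every surviving key starts with FOREMOST. A search keyed on COMPUTER + TRADING therefore cannot reach FOREMOST COMPUTER TRADING SUPPLIES, even though an all-skip-word name would still get its COMPUTER-led Preferred Key.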

Building Keys from Full Addresses

When a full address is to be passed to NAMESET for key building, as opposed to just the part preceding the state/locality, Positive keys will also be built for the non-delete words in the locality, state and post/zip code portion of the address. In the case of Negative keys, keys will be built for the non-delete, and optionally the non-skip, words. Some of these keys will not be very useful (e.g. those built with the State Code as the first word). If you are passing a full address, it is therefore advisable to ensure that your Street Edit-list has delete rules for the common locality type words or, if using Negative Keys, that such words are defined as Skip in the Street Edit-list and the Ignore Skips Algorithm option is selected. Alternatively, you can tell NAMESET to delete everything after a certain marker. For example, by defining the following in the Street Edit-list:
*S >RD<DABA *W ><
Then everything after the word RD would be deleted from the address before key building.
When using the ’delete after’ rules, it is not possible to also use those words as Street Major Markers, as they are themselves deleted from the address. Therefore, it is more appropriate to use NAME-FORMAT=R when using this method.
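The effect of a ’delete after’ rule on the words that reach key building can be sketched as a simple truncation. This does not model Edit-list syntax or semantics. `DELETE_AFTER_MARKERS` and `delete_after_marker` are hypothetical, and the sketch assumes, as the note above about Street Major Markers implies, that the marker word is removed together with everything after it. The sample address is invented for illustration.

```python
# Hypothetical stand-in for the Street Edit-list 'delete after' rule shown above.
DELETE_AFTER_MARKERS = {"RD"}


def delete_after_marker(address: str) -> str:
    """Truncate an address at the first 'delete after' marker word.

    Assumption: the marker word itself is also removed, so it cannot
    serve as a Street Major Marker afterwards.
    """
    words = address.split()
    for i, word in enumerate(words):
        if word in DELETE_AFTER_MARKERS:
            return " ".join(words[:i])
    return " ".join(words)  # no marker: the address is passed through unchanged


print(delete_after_marker("26 PLEASANT VALLEY RD SPRINGFIELD IL 62704"))  # 26 PLEASANT VALLEY
```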

Multi-Valued Fields

If your data contains multi-valued name fields, a feature of NAMESET called the Multi-Valued Field processor (MVF) can assist with the creation of better keys and search-tables, and with achieving more reliable matching. Multi-valued fields are of the following types:
  • Account names
  • Compound names
  • Alias or former names
  • Secondary Phrases
The order of MVF processing is:
  • Compound name markers identified.
  • Each part identified starting from the right.
  • For each part account name processing is done.
  • For each account name secondary phrase processing is done.

Account Names

If your data contains account names, better control over the construction of keys, the selectivity of searches and the way names are matched can be achieved by setting some customization parameters. Account names are taken to be of the type:
MR. J. AND MRS. M. SMITH
JOHN AND MARY DAVIS
T & M & J DUBOIS
P.M. & L.V. CHAN
For MVF to be able to recognize account names, the names must contain an account name ’marker’. In the above examples, the account name markers are AND and &. To customize SSA-NAME3 for Account Names, do the following. The Account Name Marker(s) must be defined in the Edit-list of the Algorithm being used, e.g.:
*S >&<BA *W >AND< *A AND
The Account Name feature must be switched on in the Algorithm by setting SSA-NAME3-OPTIONS #25 to ’Y’ or ’A’.
Account Name Pattern rules must be defined in the Algorithm Definition. Account Name Pattern rules are of the form:
PATTERN=W+WW RULE=1,3 RULE=2,3
(The + sign in the PATTERN is the position of the Account Name marker in the name.) This particular example covers the case of:
JOHN AND MARY DAVIS (i.e. PATTERN=W+WW)
The two rules defined for that pattern tell MVF (which is used by the Key/Search-table Building and Matching services) to construct two separate names, each of which is then used for building keys, search-tables, or matching. The two constructed names in this example will be:
JOHN DAVIS (i.e. RULE=1,3: word 1 + word 3)
MARY DAVIS (i.e. RULE=2,3: word 2 + word 3)
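How a PATTERN and its RULEs turn one account name into several names can be sketched as below. `ACCOUNT_MARKERS`, `PATTERN_RULES` and `split_account_name` are hypothetical; the dictionary is only an in-memory stand-in for the Algorithm Definition entry PATTERN=W+WW RULE=1,3 RULE=2,3, not its real format. As in the example above, RULE word numbers count only the non-marker words (DAVIS is word 3 even though it is the fourth token). Names with no matching pattern are returned unchanged, which is an assumption.

```python
# Account name markers, as defined in the Edit-list example above.
ACCOUNT_MARKERS = {"AND", "&"}

# Hypothetical in-memory form of: PATTERN=W+WW RULE=1,3 RULE=2,3
PATTERN_RULES = {"W+WW": [(1, 3), (2, 3)]}


def split_account_name(name: str) -> list[str]:
    """Build one name per RULE of the matching PATTERN."""
    tokens = name.split()
    # Words become W and markers become +, e.g. JOHN AND MARY DAVIS -> W+WW.
    pattern = "".join("+" if t in ACCOUNT_MARKERS else "W" for t in tokens)
    words = [t for t in tokens if t not in ACCOUNT_MARKERS]  # numbered from 1
    rules = PATTERN_RULES.get(pattern)
    if rules is None:
        return [name]  # no pattern rule defined: keep the name as-is
    return [" ".join(words[i - 1] for i in rule) for rule in rules]


print(split_account_name("JOHN AND MARY DAVIS"))  # ['JOHN DAVIS', 'MARY DAVIS']
```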
To help determine which Account Name Patterns occur in your data, the SSA-NAME3 Word Frequency Report utility can analyze your data and print a report of the most common patterns. See the Utilities section in the GENERATION and TESTING GUIDE FOR SSA-NAME3 SERVICE GROUPS for more details.

Compound Names

If your data contains compound names, better control over the construction of keys, the selectivity of searches and the way names are matched can be achieved by setting some customization parameters. Compound names are taken to be of the type:
JOHNATHON TAN IN TRUST FOR JIMMY TAN
MARY GONZALES WITH L THOMAS
J SIMS & SONS LTD TRADING AS ABLE TIMBER
THE ABC GROUP DBA NETWORKS AMERICA
For SSA-NAME3 to handle compound names, the names must contain a compound name ’marker’. In the above examples, the compound name markers are IN TRUST FOR, WITH, TRADING AS and DBA (doing business as). To customize SSA-NAME3 for Compound Names, do the following. The Compound Name Marker(s) must be defined in the Edit-list in the format:
*M IN TRUST FOR
*M WITH
*M TRADING AS
*M DBA
The Compound Name feature must be turned on by setting SSA-NAME3-OPTIONS #2 to ’Y’ or ’C’.
The combination of the Compound Name processor and the Compound Name Markers tells MVF to construct two separate names, each of which is then used for key building, search-table building, or matching. For example:
MARY GONZALES WITH L THOMAS
will be split into two names before key building, search-table building or matching:
MARY GONZALES
L THOMAS
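Splitting a name at its compound name markers can be sketched as a word-level scan. `COMPOUND_MARKERS` mirrors the four *M entries above, and `split_compound_name` is a hypothetical helper. Two simplifications are assumptions: MVF identifies parts starting from the right, while this sketch scans left to right, which gives the same parts for these examples; and markers are matched only as whole words, so a name such as SMITH is never mistaken for WITH.

```python
# Compound name markers, mirroring the *M Edit-list entries above.
COMPOUND_MARKERS = ["IN TRUST FOR", "TRADING AS", "WITH", "DBA"]


def split_compound_name(name: str) -> list[str]:
    """Split a name into parts wherever a whole-word compound marker occurs."""
    words = name.split()
    parts: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(words):
        for marker in COMPOUND_MARKERS:
            marker_words = marker.split()
            if words[i:i + len(marker_words)] == marker_words:
                # A marker ends the current part; skip over the marker words.
                parts.append(" ".join(current))
                current = []
                i += len(marker_words)
                break
        else:
            # No marker starts at this word, so it belongs to the current part.
            current.append(words[i])
            i += 1
    parts.append(" ".join(current))
    return [part for part in parts if part]  # drop empty parts, e.g. a leading marker


print(split_compound_name("MARY GONZALES WITH L THOMAS"))  # ['MARY GONZALES', 'L THOMAS']
```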

Alias and Former Names

If the data contains alias names or former names, better reliability may be gained by passing all names belonging to one entity (e.g. person, company or address) to NAMESET or MATCH at one time. For NAMESET to build keys or search-ranges for alias and former names, the REPEAT= Function keyword must be specified at run time. This tells NAMESET how many fixed-length names to expect. For MATCH to do comparisons on alias and former names, the REPEAT= keyword must be specified in the Matching Scheme definition. This tells MATCH how many fixed-length names to expect.
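Because REPEAT= tells NAMESET or MATCH how many fixed-length names to expect, the caller has to pack the names for one entity into consecutive fixed-length fields. The sketch below shows that layout from the unpacking side. It is not the SSA-NAME3 API: `split_repeated_names`, the 20-character field length, blank padding, and skipping of all-blank occurrences are all assumptions made for illustration.

```python
def split_repeated_names(buffer: str, repeat: int, name_length: int) -> list[str]:
    """Split a buffer holding `repeat` fixed-length name fields into names.

    Hypothetical helper: the field length and blank padding are assumptions,
    and all-blank occurrences are skipped.
    """
    if len(buffer) < repeat * name_length:
        raise ValueError("buffer is shorter than repeat * name_length")
    names = []
    for i in range(repeat):
        field = buffer[i * name_length:(i + 1) * name_length].strip()
        if field:
            names.append(field)
    return names


# A current name and a former name for one person, in three 20-character slots.
buffer = "MARY SMITH".ljust(20) + "MARY JONES".ljust(20) + " " * 20
print(split_repeated_names(buffer, repeat=3, name_length=20))  # ['MARY SMITH', 'MARY JONES']
```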

Key and Search-table Building for Multi-Valued Fields

Multi-valued field names will be treated as separate names for key building. However, the keys are not differentiated in the Keys-stack as to which name they were generated from. That is, all keys are attributed to the logical entity (e.g. person, company or address) and not to any specific name occurrence of that entity. The multi-valued names will also be treated as separate names for search-table building.
The search-ranges for each name will be accumulated in the search-table. All of the search-ranges for the first name will appear in the search-table before all of the search-ranges for the second name, and so on. The search-table sequence field identifies which name a search-range was generated from; this field is incremented for each name. When using multi-valued fields for key building, remember to ensure that the KEYS-STACK-SIZE Algorithm parameter is set high enough to hold the maximum number of keys expected, bearing in mind that there is a limit of 99 keys. When using multi-valued fields for searching, ensure that the SEARCH-TABLE-SIZE Algorithm parameter is set high enough to hold the maximum number of search ranges expected, bearing in mind that there is a limit of 99 search ranges.
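The difference between the Keys-stack (keys pooled per entity) and the search-table (ranges tagged with a per-name sequence number) can be sketched as two small functions. `build_keys_stack`, `build_search_table` and `SearchRange` are hypothetical, keys and range bounds are placeholder strings, and raising an error when a table is full is an assumption; this section does not say how NAMESET behaves on overflow. The only documented facts used are the accumulation order, the sequence field, and the limit of 99.

```python
from dataclasses import dataclass

LIMIT = 99  # stated maximum for both keys and search ranges


@dataclass
class SearchRange:
    sequence: int  # which name (1, 2, ...) the range was generated from
    low: str       # placeholder range bounds
    high: str


def build_keys_stack(keys_per_name: list[list[str]], keys_stack_size: int) -> list[str]:
    """Pool the keys of every name; the stack keeps no record of the source name."""
    capacity = min(keys_stack_size, LIMIT)
    stack = [key for keys in keys_per_name for key in keys]
    if len(stack) > capacity:
        raise OverflowError("too many keys: raise KEYS-STACK-SIZE (max 99)")
    return stack


def build_search_table(ranges_per_name: list[list[tuple[str, str]]],
                       search_table_size: int) -> list[SearchRange]:
    """Append all ranges of name 1, then name 2, and so on, tagging each with its sequence."""
    capacity = min(search_table_size, LIMIT)
    table: list[SearchRange] = []
    for sequence, ranges in enumerate(ranges_per_name, start=1):
        for low, high in ranges:
            if len(table) == capacity:
                raise OverflowError("too many search ranges: raise SEARCH-TABLE-SIZE (max 99)")
            table.append(SearchRange(sequence, low, high))
    return table


table = build_search_table([[("A1", "A2"), ("B1", "B2")], [("C1", "C2")]], search_table_size=10)
print([r.sequence for r in table])  # [1, 1, 2]
```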
