Service Group Definition and Customization Guide

Module Options

Module options are a group of characters that control most of the modules. According to the position of the characters, the options control a module’s functionality.
Use the following syntax to define the module options:
[Module Name]-OPTIONS=[Options]
A module options definition uses the following parameters:
Module Name
Name of the module for which you define the options.
Options
List of characters that control the module. Each character's position in the list determines which option it sets.
The following excerpt is an example for the module options:
*                  ....+....1....+....2....+....3
FORMATTING-OPTIONS=WCYNNNNNNNN
In the preceding example, the first line is a ruler to help you align each option’s position. The example sets formatting option 1 to W, which marks codes as the SELECT word type, option 2 to C, which marks the suspect codes as codes, option 3 to Y, which concatenates the adjacent codes, and the other positions to N.
You can also use the following format:
FORMATTING-OPTIONS=WCY
To set all the remaining values to N, you can omit them or leave them as trailing spaces. However, you cannot use a space at the beginning or in the middle of a list of options.
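The positional rules above can be illustrated with a minimal sketch. The function name `parse_module_options` and the width of 30 positions (taken from the length of the ruler in the example) are assumptions for illustration and are not part of the product.

```python
def parse_module_options(line: str, width: int = 30) -> tuple[str, dict[int, str]]:
    """Parse '[Module Name]-OPTIONS=[Options]' into (module, {position: value}).

    Hypothetical helper: 'width' is an assumed number of option positions.
    """
    name, sep, values = line.partition("=")
    if not sep or not name.endswith("-OPTIONS"):
        raise ValueError("expected [Module Name]-OPTIONS=[Options]")
    # Trailing spaces and omitted positions are treated as N.
    values = values.rstrip(" ")
    if " " in values:
        raise ValueError("a space is not allowed at the beginning or in the middle")
    padded = values.ljust(width, "N")
    module = name[: -len("-OPTIONS")]
    # Option positions are 1-based, matching the ruler.
    return module, {pos: ch for pos, ch in enumerate(padded, start=1)}


module, options = parse_module_options("FORMATTING-OPTIONS=WCY")
print(module, options[1], options[2], options[3], options[4])
```

Under these assumptions, the short form `FORMATTING-OPTIONS=WCY` and the long form `FORMATTING-OPTIONS=WCYNNNNNNNN` produce the same option values.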

Cleaning Options

Use the following syntax for the cleaning options:
CLEANING-OPTIONS=<Options>
You can use the following options to control the cleaning module:
1
Activates the Y rule. The Y rule removes a single Y character when it appears between words, which is common in Spanish-style names.
For example, consider the name Jose Domingo Y Ramirez. Use the cleaning option 1 to detect the Y character and remove it.
However, use this rule with caution because it is not always appropriate to remove the Y character. For example, consider the French name Michelle Y Bousselain, where Y stands for Yvonne. In this case, removing Y from the name is inappropriate.
You can configure the character that you want to remove instead of the Y character. The character-set table 15 has space for two characters, and the Y rule refers to this table to identify the characters that you want to remove.
The Y rule is similar to pre-cleaning editing, which detects any single character surrounded by delimiters. However, the pre-cleaning editing cannot detect a pattern surrounded by words unless you define a rule for each possible combination.
Use one of the following values:
  • Y. Activates the Y rule.
  • N. Disables this option.
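The following sketch shows the effect of the Y rule on a name. It is not the product's implementation: the function name `apply_y_rule`, the case-insensitive comparison, and the `chars` parameter (standing in for the two characters held in character-set table 15) are assumptions.

```python
def apply_y_rule(name: str, chars: str = "Y") -> str:
    """Remove a single configured character that stands alone between two words."""
    words = name.split()
    kept = []
    for i, word in enumerate(words):
        is_between_words = 0 < i < len(words) - 1
        is_rule_char = len(word) == 1 and word.upper() in chars
        if is_between_words and is_rule_char:
            continue  # drop the standalone character
        kept.append(word)
    return " ".join(kept)


print(apply_y_rule("Jose Domingo Y Ramirez"))
print(apply_y_rule("Michelle Y Bousselain"))
```

The second call shows why the rule needs caution: the sketch also removes the Y that stands for Yvonne.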
2
Disables the default comma processing. If a name contains a comma and NAME-FORMAT=R, the default processing rearranges the name so that the part of the name before the comma moves to the end of the name. For example, the default processing rearranges the name Smith, John Edward to John Edward Smith. However, it is not always appropriate to perform the default comma processing. For example, this rule might not be applicable for addresses.
Use one of the following values:
  • Y, I. Disables the default comma processing.
  • N. Enables the default comma processing.
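As an illustration, the default comma processing and the effect of cleaning option 2 can be sketched as follows. The function name `rearrange_on_comma` is hypothetical, and splitting on the first comma only is an assumption; the sketch also assumes NAME-FORMAT=R is already in effect.

```python
def rearrange_on_comma(name: str, option_2: str = "N") -> str:
    """Move the part before the comma to the end unless option 2 disables it."""
    if option_2 in ("Y", "I"):
        return name  # default comma processing is disabled
    before, sep, after = name.partition(",")
    if not sep:
        return name  # no comma, nothing to rearrange
    return f"{after.strip()} {before.strip()}".strip()


print(rearrange_on_comma("Smith, John Edward"))
print(rearrange_on_comma("Smith, John Edward", option_2="Y"))
```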
5
Converts all the characters to uppercase before the pre-cleaning rules are applied. If the data contains accented characters, this option converts them to non-accented characters before it converts the case. For example, this option converts Jöhn !@# Smith to JOHN !@# SMITH.
Use one of the following values:
  • Y. Enables this option.
  • N. Disables this option.
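A rough equivalent of this conversion can be written with Unicode decomposition. This is a sketch of the idea only: the product uses its own character-set tables, and the function name `fold_and_upper` is an assumption.

```python
import unicodedata


def fold_and_upper(text: str) -> str:
    """Strip accents first, then convert the result to uppercase."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unaccented.upper()


print(fold_and_upper("Jöhn !@# Smith"))
```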
29
Enables the break rules that add spaces after a set of characters that you specify.
Use one of the following values:
  • Y. Enables the break rules.
  • N. Disables the break rules.

Formatting Options

Use the following syntax for the formatting options:
FORMATTING-OPTIONS=<Options>
You can use the following options to control the formatting module:
1
Defines the processing when the formatting module finds a code word. A code word can be a single code character such as an initial or any word containing two or more code characters. In general, a code character can be a number from 0 through 9. You can update these values in the internal character-set tables.
Use one of the following values:
  • C, Y. Marks the code as the CODE word type.
  • D. Marks the code as the DELETE word type.
  • J. Marks the code as the MAJOR word type.
  • M. Marks the code as the MAJCODE word type.
  • S. Marks the code as the SKIP word type.
  • T. Marks the code as the SKIPCODE word type.
  • W, N. Marks the code as the SELECT word type.
Options C, Y, M, and T encode a code in a special way so that it is not confused with a word. Options W, N, J, and S encode a code in a way similar to a word.
When you use the option S, if the code word is marked as major, it becomes type J. When you use the option T, if the code word is marked as major, it becomes type M.
If an Edit-list major marker selects a code as major, the formatting option 1 can override that selection.
2
Defines the processing when the formatting module finds a suspect code.
The following words can be a suspect code word:
  • A one-code character except a single character word.
  • Any word with one or more ambiguous characters. In general, you do not define any ambiguous characters. However, you can add ambiguous characters to the internal character-set tables.
  • A one-character word or two-character word that precedes or follows a code word.
Use one of the following values:
  • C. Uses the formatting option 1 to mark the suspect code.
  • D. Marks the suspect code as the DELETE word type.
  • S. Marks the suspect code as the SKIP word type.
  • M. Marks the suspect code as the MAJOR word type.
  • W, N, Y. Marks the suspect code as the SELECT word type.
3
Defines the processing when the formatting module finds adjacent code words.
Use one of the following values:
  • C, Y. Concatenates the adjacent code word and the suspect code word.
  • N. Disables this option.
4
Defines the processing when the formatting module finds duplicate adjacent words.
Use one of the following values:
  • D, Y. Deletes the second word.
  • N. Disables this option.
5
Processes the suspect code word formatting option 2 before concatenating codes and suspect codes.
Use one of the following values:
  • Y. Processes suspect codes according to the formatting option 2 before processing the formatting option 3.
  • N. Disables this option.
6
Defines the processing when the formatting module finds a space during the post-cleaning process.
Use one of the following values:
  • B, Y. Ignores the rest of the word after the space.
  • N. Concatenates both parts of the word.
7
Defines the processing for special street names.
For example, consider 42 nd Street. This option deletes nd and selects 42. If a major word is two characters long and both characters are alphabetic, this option deletes the major word and uses the preceding word.
This option also checks whether a major word starts with a digit and ends with two alphabetic characters. In that case, this option removes the two alphabetic characters. For example, consider 42ND Street. This option replaces 42ND with 42.
Use one of the following values:
  • S, Y. Applies special rules to a street name that is a result of an Edit-list Major Left or Major Right marker.
  • N. Disables this option.
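The two checks that formatting option 7 describes can be sketched as follows. The function name `special_street_major` and the representation of the name as a list of words with a major-word index are assumptions for illustration.

```python
import re


def special_street_major(words: list[str], major_index: int) -> str:
    """Return the word selected as major after the special street-name rules."""
    major = words[major_index]
    # Case 1: a two-letter alphabetic major word is deleted and the
    # preceding word is used instead ("42 nd Street" selects "42").
    if len(major) == 2 and major.isalpha() and major_index > 0:
        return words[major_index - 1]
    # Case 2: a major word that starts with a digit and ends with two
    # letters loses the two letters ("42ND" becomes "42").
    if re.fullmatch(r"\d.*[A-Za-z]{2}", major):
        return major[:-2]
    return major


print(special_street_major(["42", "nd", "Street"], 1))
print(special_street_major(["42ND", "Street"], 0))
```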
9
Defines the processing when the formatting module finds adjacent initials. Two or more initials that do not span across words can be adjacent initials. This option does not control single initials, and the Edit-list processes them.
Use one of the following values:
  • N. Does not concatenate the adjacent initials. The default value is N.
  • I, Y, B, E. Concatenates the adjacent initials into one word and marks the word as the SELECT word type (Y). The Edit-list does not process the concatenated word. If an Edit-rule replaces a word with an initial, these options do not consider it for concatenation.
  • I, Y. Concatenates the adjacent initials irrespective of their position in a word.
  • B. Concatenates the adjacent initials if they are at the beginning of a name.
  • E. Concatenates the adjacent initials if they are at the beginning or in the middle of a name.
  • N, I, Y, B, E. Concatenates the adjacent initials if the stack consists of only initials or initials and skips and marks the concatenated word as a major word. The Edit-list does not process the concatenated word. If any skip words exist, these options retain the skip word in the original order.
  • X. Does not concatenate the adjacent initials even if a name contains only initials or initials and skips.
10
Defines the processing when a code word follows a prefix word.
Use one of the following values:
  • Y. Does not concatenate the words.
  • N. Concatenates both the words unless the code word is a single character.
For example, consider MR MAC D01 with MAC as the prefix word and D01 as the code word. If you set this option to Y, no concatenation occurs. If you set this option to N, the formatting module concatenates the words as MR MACD01.
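The behavior of formatting option 10 can be sketched as follows. The function name `apply_option_10` is hypothetical, and the sketch assumes that a code word is any word that contains a digit; in the product, code characters come from the internal character-set tables.

```python
def apply_option_10(words: list[str], prefixes: set[str], option: str = "N") -> list[str]:
    """Concatenate a prefix word with a following code word when option 10 is N."""
    if option == "Y":
        return list(words)  # Y: never concatenate
    result = []
    i = 0
    while i < len(words):
        nxt = words[i + 1] if i + 1 < len(words) else None
        follows_code = (
            nxt is not None
            and any(ch.isdigit() for ch in nxt)  # assumed code-word test
            and len(nxt) > 1  # single-character codes are not joined
        )
        if words[i] in prefixes and follows_code:
            result.append(words[i] + nxt)
            i += 2
        else:
            result.append(words[i])
            i += 1
    return result


print(apply_option_10(["MR", "MAC", "D01"], {"MAC"}, "N"))
print(apply_option_10(["MR", "MAC", "D01"], {"MAC"}, "Y"))
```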
11
Controls the selection of the major word in the Major Left or Major Right processing.
Use one of the following values:
  • Y. Ignores skip words while selecting the major word.
  • N. Includes skip words while selecting the major word.
  • C. Ignores skip words and codes while selecting the major word.
For example, define Plaza and Square as type X or type Y words in the Edit-list, and consider the following address:
100 Times Plaza Square
If you set this option to Y, the formatting module skips Plaza and selects Times as the major word. If you set this option to N, the formatting module selects Plaza as the major word.
12
Determines the final word type of a word that you define in the Edit-list as a prefix or postfix join or a prefix or postfix split and ends up on its own in the words stack.
For example:
  • Prefix-join word not followed by another word in the name
  • Postfix-join word not preceded by another word in the name
  • Prefix-split word
  • Postfix-split word
  • Formatting loop caused by conflicting prefix or postfix rules
Use one of the following values:
  • N. Marks the word as the SKIP word type (S).
  • Y. Marks the word as the SELECT word type (Y).
13
Controls the behavior of postfix rules.
Use one of the following values:
  • Y. Does not delete a standalone word or an initial if it is defined as a postfix delete.
  • N. Deletes a standalone word or an initial if it is defined as a postfix delete. The default value is N.
14
Controls the behavior of prefix split rules.
Use one of the following values:
  • Y. Skips the prefix rules after a prefix split.
  • S. Skips the prefix split rules after a prefix split.
  • N. Disables this option. The default value is N.
15
Controls the insertion of words into the word stack after a prefix or postfix split.
Use one of the following values:
  • Y. Checks the word stack for a duplicate entry and does not add the duplicate words caused by a replacement or split loop to the word stack.
  • N. Disables this option. The default value is N.
16
Controls prefix or postfix split when attached to an initial.
Use one of the following values:
  • Y. Does not perform a prefix or postfix split if you only have an initial.
  • N. Disables this option. The default value is N.
17
Controls word stack category name changes in the word stack.
Use one of the following values:
  • Y. Keeps the previous category name in the word stack entry if the last rule was of the no stabilization category type.
  • N. Disables this option. The default value is N.
18
Checks for a leading code in a string and splits the code from the string based on the value that you set in the formatting option 19.
For example, if you set formatting option 18 to Y and formatting option 19 to 2, the formatting module splits the string, 14CROSS, into 14 and CROSS. If the string is 4CROSS, the formatting module does not split the string.
Use one of the following values:
  • Y. Splits the code from the string based on the formatting option 19.
  • N. Disables this option. The default value is N.
19
If you set the formatting option 18 to Y, controls the number of characters to split from the beginning of a string.
Use one of the following values:
  • Numbers 1 through 9. Splits the specified number of characters from the beginning of a string.
  • N. Disables this option. The default value is N.
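Formatting options 18 and 19 work together, which the following sketch illustrates. The function name `split_leading_code` is hypothetical, and the sketch assumes that the split happens only when all of the leading characters counted by option 19 are digits, which is consistent with the 14CROSS and 4CROSS example.

```python
def split_leading_code(word: str, option_18: str = "N", option_19: str = "N") -> list[str]:
    """Split a leading code of option_19 characters from word when option 18 is Y."""
    if option_18 != "Y" or option_19 not in set("123456789"):
        return [word]
    length = int(option_19)
    head, tail = word[:length], word[length:]
    # Assumed condition: every one of the leading characters must be a digit.
    if len(head) == length and head.isascii() and head.isdigit() and tail:
        return [head, tail]
    return [word]


print(split_leading_code("14CROSS", "Y", "2"))
print(split_leading_code("4CROSS", "Y", "2"))
```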
20
Checks for a SPLITCODE and marks it as a MAJCODE.
Use one of the following values:
  • Y. Marks a SPLITCODE as a MAJCODE.
  • N. Disables this option. The default value is N.
21
Checks for an extension in a telephone number and removes the extension if it uses one of the predefined formats such as ext, extn, x, X, or #. When you use this option, specify FORMATTING=N3FTTN.
Use one of the following values:
  • Y. Removes the extensions from the telephone numbers.
  • N. Disables this option. The default value is N.
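The extension removal can be approximated with a regular expression. This sketch covers only the markers that the description lists (ext, extn, x, X, and #); the exact predefined formats, the optional period after the marker, and the function name `strip_extension` are assumptions.

```python
import re

# A marker, optionally followed by a period and spaces, then the extension
# digits at the end of the number. re.IGNORECASE covers both x and X.
_EXTENSION = re.compile(r"\s*(?:extn|ext|x|#)\.?\s*\d+\s*$", re.IGNORECASE)


def strip_extension(phone: str) -> str:
    """Remove a trailing extension from a telephone number."""
    return _EXTENSION.sub("", phone)


print(strip_extension("555-1234 ext 89"))
print(strip_extension("5551234x89"))
```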
22
Checks for a leading country code in a telephone number. If the leading code matches the CODEPREFIX keyword value and the telephone number length is greater than the MAXCODELEN keyword value, this option removes the leading code from the telephone number. When you use this option, specify FORMATTING=N3FTTN.
Use one of the following values:
  • Y. Removes the leading country code from the telephone numbers.
  • N. Disables this option. The default value is N.
23
Checks for a leading prefix in any code. If the code length is greater than the MAXCODELEN keyword value, this option removes the specified number of characters from the beginning of the code. When you use this option, specify FORMATTING=N3FTTN.
Use one of the following values:
  • Numbers 1 through 9. Removes the specified number of characters from the beginning of a code.
  • N. Disables this option. The default value is N.
24
Checks for a postfix at the end of a code to split. If the code length is greater than the MAXCODELEN keyword value, this option splits the specified number of characters from the end of the code.
For example, you can split 9900502724/25 as 9900502724 and 9900502725. When you use this option, specify FORMATTING=N3FTTN.
Use one of the following values:
  • Numbers 1 through 9. Splits the specified number of characters from the end of a code.
  • N. Disables this option. The default value is N.
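The 9900502724/25 example suggests that the postfix replaces the last digits of the base code to form the second code. The following sketch implements that reading. The function name `split_code_postfix`, the handling of the / separator, and the MAXCODELEN value of 10 used in the call are assumptions inferred from the example.

```python
def split_code_postfix(code: str, postfix_len: int, max_code_len: int) -> list[str]:
    """Split a trailing postfix of postfix_len (1 through 9) characters from a code."""
    if len(code) <= max_code_len:
        return [code]  # only codes longer than MAXCODELEN are split
    base, postfix = code[:-postfix_len], code[-postfix_len:]
    base = base.rstrip("/")  # assumed separator between base and postfix
    if not (base.isdigit() and postfix.isdigit()) or len(base) < postfix_len:
        return [code]
    # The second code reuses the base with its last digits replaced.
    return [base, base[:-postfix_len] + postfix]


print(split_code_postfix("9900502724/25", postfix_len=2, max_code_len=10))
```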
25
Checks whether a name contains a single or multiple words. When you list the noise words, this option indicates when to delete the noise words.
Use one of the following values:
  • Y. Deletes the noise word irrespective of the number of words in a name.
  • N. Does not delete the noise word if a name contains a single word. The default value is N.
26
Defines how to process the names after deleting the noise words.
Use one of the following values:
  • Y. Adjusts the position of the start and end words.
  • N. Does not adjust the position of the start and end words. The default value is N.

NAMESET Key-Building Options

Use the following syntax for the SSA-NAME3 options:
SSA-NAME3-OPTIONS=<Options>
You can use the following options to control the NAMESET module:
2
Enables the compound name feature.
Use one of the following values:
  • C, Y. Checks whether the name is compound. A compound name contains multiple separate names joined by connecting phrases. If the NAMESET module finds a compound name, the module separately processes each component. You must define at least one compound name marker in the associated Edit-list.
  • N. Disables this option.
5
Includes initials as common names in the frequency table. In general, the NAMESET module does not treat initials as common names. However, you can include initials as common names. Use this option if the name populations have initials instead of the full given names.
Use one of the following values:
  • I, Y. Includes initials as common names in the frequency table.
  • N. Does not include initials in the frequency table.
6
Builds special code key and search ranges. For a code word, the NAMESET module generates a key, which is specific to the code word and places a probe for the word in the search table.
Use one of the following values:
  • C, Y, numbers 1 through 9. Generates a key, which is specific to the code word and places a probe for the word in the search table.
    If you specify C or Y, the rule applies to codes that have four or more digits. You can specify a different minimum length by using any number from 1 through 9.
    Ensure that you set the formatting option 1 to C, M, or T.
    The following excerpt is a sample output when you set the option to C for the input, UNIT 1234:
    STACK: 02
    1 UNIT M Y UNAT
    2 1234 C BCGJ
    KEYS: 03
    1 22190C0002 I1
    2 F9A7432848 A1
    3 FFC48EF276 A1
    STAB: F9A7432848
    1 22190C0000 22190C0003 05 00 10 I1 S 00
    2 F9A7432848 F9A743284B 60 00 20 A1 C 01
    3 F9A7432800 F9A7432BFF 68 00 12 A2 C 01
    4 F9A7400000 F9A743FFFF 75 00 10 A4 C 01
    5 F9A4000000 F9A7FFFFFF 90 00 03 A6 C 01
    6 F980000000 F9BFFFFFFF 93 00 02 A7 C 01
    7 0000000000 FFFFFFFFFF 00 30 00 A9 C 01
  • N. Does not build special code key and search ranges.
    The following excerpt is a sample output when you set the option to N for the input, UNIT 1234:
    STACK: 02
    1 UNIT M Y UNAT
    2 1234 C BCGJ
    KEYS: 02
    1 F9A7432848 A1
    2 FFC48EF276 A1
    STAB: F9A7432848
    1 F9A7432848 F9A743284B 60 00 20 A1 C 00
    2 F9A7432800 F9A7432BFF 68 00 12 A2 C 00
    3 F9A7400000 F9A743FFFF 75 00 10 A4 C 00
    4 F9A4000000 F9A7FFFFFF 90 00 03 A6 C 00
    5 F980000000 F9BFFFFFFF 93 00 02 A7 C 00
    6 0000000000 FFFFFFFFFF 00 30 00 A9 C 00
Compared with the output for N, the output for C contains one extra key and one extra probe.
10
Builds an extra concatenated negative key even if the name contains only two words.
Use one of the following values:
  • C, Y. If the name consists of only two words, builds an extra negative key to concatenate both the words. Normal negative key building builds concatenated word keys only if the name has more than two words. When you set this option, you must also set SSA-NAME3 option 18 to – or Y and option 21 to N. For negative searches, ensure that you specify the NEG function keyword.
  • 2. If the name consists of only two words, builds an extra negative key to concatenate both the words. Normal negative key building builds concatenated word keys only if the name has more than two words. When you set this option, you must also set SSA-NAME3 option 18 to – or Y. For negative searches, ensure that you specify the NEG function keyword.
    Setting this option to C, Y, or 2 can lead to poorer selectivity for two-word names that contain common words. The poorer selectivity occurs because when you concatenate two words, they might become an uncommon word and a candidate to the uncommon name key design.
  • N. Disables this option.
14
Allows initials to be selected as the major part of keys during negative key building and negative search table building.
Use one of the following values:
  • I, Y. Allows initials to be selected as the major part of keys in negative strategies.
  • N. Disables this option.
17
Controls the selection of the second minor word in positive or negative key building.
Use one of the following values:
  • N. Selects the second minor at the head of the name for positive keys and to the right of the first minor for negative keys. The default value is N.
  • P, Y. Selects the second minor at the head of the name. Uses positive rules to build negative keys.
  • R. Selects the second minor word to the right of the first minor. Uses negative rules to build positive keys.
  • L. Selects the second minor word to the left of the first minor.
18
Specifies whether you want to use positive or negative rules to generate keys.
Use one of the following values:
  • –, Y. Uses negative rules to build the keys. If you want to generate negative keys, specify this option and set ALTERNATE-KEYS to Y.
  • N. Uses positive rules to build the keys. If you want to generate positive keys, specify this option and set ALTERNATE-KEYS to Y.
19
Defines the first minor position and the direction rule for positive keys only. Specifies which word in the name to look for the first minor word and in which direction to look if this word does not qualify as a minor. The decision whether a word qualifies as a minor depends on SSA-NAME3 option 20.
Use one of the following values:
  • H. Starts with the leftmost word, which is the head of the name and searches to the right.
  • T. Starts with the rightmost word, which is the tail of the name and searches to the left.
  • L. Starts with the word to the left of the major word and searches to the left.
  • R. Starts with the word to the right of the major word and searches to the right.
  • N. Starts with the leftmost word, which is the head of the name and searches to the right. Ignores skip type words. This option is equivalent to specifying SSA-NAME3 option 19 to H and SSA-NAME3 option 20 to S.
20
Defines the first minor selection rule for the positive keys.
Use one of the following values:
  • N. Always accepts the word.
  • S. Ignores the skip type words. If a word is a skip type, this option does not select it as the first minor and moves to the next word.
  • C. Accepts only the code type words including SKIPCODEs as minors.
21
Does not build concatenated-word keys for negative keys.
Default negative key processing builds extra keys for the concatenation of words in the name. To see how these concatenated-word keys are built, refer to the Factors which Determine the Format of a Name / Negative Keys section.
Use one of the following values:
  • Y. Turns off concatenated-word key building and disables SSA-NAME3 options 10 and 28.
  • N. Builds concatenated-word keys. Valid only if you set SSA-NAME3 option 18 to – or Y to build negative keys. The default value is N.
22
Sets the word order for the Customset rules.
Use one of the following values:
  • Y. Indicates that W1 is the first word, W2 is the second word, and so on.
  • N.
23
Generates 8-byte character keys.
Use one of the following values:
  • 8. Causes NAMESET to generate an 8-byte character based key.
  • N. Generates 5-byte binary keys.
24
Controls whether a skip word can be used as a major. Normal processing during positive and negative key building permits the use of a skip word as a major, which might be undesirable in some cases.
Use one of the following values:
  • Y, S. Does not use skip words as a major.
  • N. Allows a skip word as a major.
If you set this option to disallow skip words as a major word in a key and a name contains only skip words, a single key is still built. This key is built in the preferred key order, that is, one of the skip words is chosen as the major word, usually depending on the NAME-FORMAT= Algorithm option.
Setting this option has advantages and disadvantages.
The main advantage is that the number of keys to store can be reduced, often significantly, in line with the number of skip words defined in the Edit-list. For example, the Fast-start Company name Edit-lists usually define many skip words, such as manufacturing and engineering.
The disadvantages are as follows. A search on a name that contains only skip words, when those skip words are used in a different sequence than was used to build the keys for the name, does not find the record. For example, if both PRODUCTS and MANUFACTURING are skip words, a search on PRODUCTS MANUFACTURING CO. does not find MANUFACTURING PRODUCTS CO.
Also, a search on a name that contains only skip words does not find matches where those skip words appear in combination with another non-skip word. For example, a search on PRODUCTS MANUFACTURING CO. does not find ABC PRODUCTS MANUFACTURING CO. Some users might see this as an advantage.
25
Enables the Account Name feature.
Use one of the following values:
  • A, Y. Enables the Account Name processing component of the NAMESET service. To be useful, this option requires at least one Account Name Marker in the associated Edit-list and a corresponding Account name pattern in the Algorithm definition. Refer to the Account Rules Definition and Multi-Valued Fields sections for more information.
  • N. Disables Account Name processing, even if Account Name Markers are defined in the Edit-list.
26
Controls whether Customset search ranges require an exact pattern match. Normal generation of Customset search ranges for a name generates a range even if no pattern was defined for that name in the Algorithm, by using the next shortest pattern that matches that name.
Use one of the following values:
  • Y, E. Generates the Customset ranges only if an exact match is found for the name pattern.
  • N. If an exact match is not found for the name pattern, uses the next shortest pattern that matches that name.
For more information on defining Customset patterns, refer to the Customset section.
27
Builds additional keys if the initial of the first minor word changes after Formatting or Stabilization.
Use one of the following values:
  • I, Y. Generates an extra key if the original initial of the first minor word of the preferred key is different after Formatting (for example, BILL => WILLIAM). This also applies if the initial changes after Stabilization (for example, PHILLIP => FALAP), which means that up to two extra keys can be generated. The extra keys are composed of the Major Word + Original Initial. The extra keys are generated whether ALTERNATE-KEYS is set to Y or N.
  • N. Does not generate extra keys if the first minor word initial changes.
28
Enables concatenated-word key processing for initial+word combinations (negative keys only). To learn more about concatenated-word key processing, refer to the Factors which Determine the Format of a Name / Negative Keys section.
Use one of the following values:
  • I, Y, P. Normal concatenated-word key processing concatenates words only. This option applies the processing to an initial that precedes a word. For example, J SMITH gets a JSMITH key.
    To achieve equivalent functionality at search time, see the NAMESET Function keyword ICONCATP in the NAMESET Function Keywords section.
  • N. Does not apply concatenated-word processing to either initial+word or word+initial combinations. The default value is N.
  • A. Applies concatenated-word processing to word+initial combinations. For example, SMITH J gets a SMITHJ key.
    To achieve equivalent functionality at search time, see the NAMESET Function keyword ICONCATA in the NAMESET Function Keywords section.
  • B. Applies concatenated-word processing to both initial+word and word+initial combinations.
    To achieve equivalent functionality at search time, see the NAMESET Function keyword ICONCATB in the NAMESET Function Keywords section.
29
Enables INITKEYA.
Use A or Y to enable INITKEYA.
