Identity Resolution 10.5
NAMESET Key Building Keywords
| |
---|---|
Keyword
| Description
|
ADDALLWORDPROBES
| Builds one-word probes for each word in a stack.
|
BINCOUNT=
| Requests that the Keys-Count value (the first 2 bytes of the Keys-Stack) be a binary value. You must use this keyword when the Keys-Stack contains more than 99 keys.
|
BUILDPROBEDROPFIRSTINIT
| Builds an additional probe for names with a leading initial followed by at least two non-initial words. For example, the names MR M Dan Houting and Dan Houting match through the additional probe.
|
CONCATKEYS
| Turns on the building of concatenated-word keys for names that contain only two words. At run time, it overrides SSA-NAME3-OPTIONS #10 and sets it to Y.
-CONCATKEYS disables the building of two-word concatenated-word keys.
|
CONCATPROBES1
| Concatenates all the words on the left of the major word.
For example, the email address joan.angela.smith@example.com concatenates to become JOANANGELASMITH, EXAMPLE.
|
CONCATPROBES2
| Creates two probes by performing the following tasks:
|
CONCATPROBES3
| Concatenates the initials of all the words on the left of the major word.
For example, the email address
joan.angela.smith@example.com concatenates to become:
|
CONCATPROBESALL
| Concatenates all the words from the left and right of the major word.
For example, the email address
joan.angela.smith@example.places.com creates probes such as:
|
CONCATPROBESLM
| Builds an additional key or range based on the concatenated words on the left of the major word. Applicable only when you specify the CONCATPROBES1 keyword.
For example, the email address joan.angela.smith@example.com becomes JOANANGELASMITH.
|
CONCATPROBESRM
| Builds an additional key or range based on the concatenated words on the right of the major word. Applicable only when you specify the CONCATPROBES2 keyword.
For example, the email address joan.angela.smith@example.places.com becomes EXAMPLEPLACES.
|
CONCATPROBESUSELEFT
CONCATPROBESUSERIGHT
| Indicates whether the major word is on the right or left, and generates probes accordingly.
For example, the email address
joan.angela.smith@example.places.com becomes:
|
CONCATRANGES
| Turns on the building of concatenated-word keys. At run time, it overrides SSA-NAME3-OPTIONS #10, setting it to Y (that is, build a concatenated-word key for two-word names), and SSA-NAME3-OPTIONS #21, setting it to N (that is, for names of more than two words, build two concatenated-word keys: one where the first and second words are concatenated, and one where the second-to-last and last words are concatenated).
-CONCATRANGES disables the building of concatenated-word keys.
|
CONCMAJORPROBES
| Concatenates two words to create the major word and builds another probe by converting the second minor to an initial.
For example, the name BELLA ROSE BINDOO ELLIOTT creates the following two probes:
|
CONCMINORKEY
| Builds an additional key by concatenating the first two minor words. For example, for the name JONG KU LEE, where LEE is the major word, this keyword builds an additional key for JONGKU LEE.
|
EXTRAWORDPROBE
| Creates a probe by eliminating the rightmost word when the name has at least three words.
For example, the name JASON A PEREZ MEDINA creates the probe JASON A PEREZ.
|
EXTRAPREFIXKEY
| Enables extra prefix split rules and generates additional keys by splitting or concatenating the prefix from the word.
Consider the following sample Edit-list definition that contains extra prefix rules:
The
EXTRAPREFIXKEY keyword generates extra keys in one of the following ways:
In this example, Q indicates the category type, and the extra prefix rules are applied to the prefix DA in the word DA SILVA or DASILVA.
The extra prefix rules are applied after the word stack is built. Existing prefix rules are applied first, and then the extra prefix rules are applied to the split or concatenated word.
|
EXTRAPREFIXRANGE
| Enables extra prefix split rules and generates additional ranges by splitting or concatenating the prefix from the word.
Consider the following sample Edit-list definition that contains extra prefix rules:
The
EXTRAPREFIXRANGE keyword generates extra ranges in one of the following ways:
In this example, Q indicates the category type, and the extra prefix rules are applied to the prefix DA in the word DA SILVA or DASILVA.
The extra prefix rules are applied after the word stack is built. Existing prefix rules are applied first, and then the extra prefix rules are applied to the split or concatenated word.
|
EXTRAPREFIXSPLIT
| Generates additional probes or keys after splitting the prefix according to the prefix split rules in the Edit-list. For example, the name LOU DASILVA is split into LOU DA SILVA, and the EXTRAPREFIXSPLIT keyword generates additional keys or probes using LOU DA SILVA. To use the EXTRAPREFIXSPLIT keyword, you must also specify the EXTRAPREFIXKEY or EXTRAPREFIXRANGE keyword.
|
EXTRAPREFIXPROBE
| Generates additional probes or keys after deleting the prefix. For example, the prefix DA is deleted from the name LOU DA SILVA, and the EXTRAPREFIXPROBE keyword generates additional keys or probes using LOU SILVA and LOU DASILVA. The EXTRAPREFIXRANGE keyword is a prerequisite for the EXTRAPREFIXPROBE keyword.
|
EXTRAPREFIXLENGTH=
| Specifies the length of the extra prefix word. The default length is 3. To use the EXTRAPREFIXLENGTH= keyword, you must also specify the EXTRAPREFIXKEY or EXTRAPREFIXRANGE keyword.
|
FIRSTMINORPROBE
| Builds an additional probe for the first minor word. For example, for the name PEPE JONES, the FIRSTMINORPROBE keyword builds an additional probe for the first minor word, PEPE.
|
FIRSTWORDRANGEORKEY
| Builds an additional key if the first word in the word stack is a skip word and the NAMEFORMAT keyword is set to L.
|
ICONCAT
| Same as ICONCATP.
|
ICONCATA
| Allows concatenated-word key-building to be applied to word+initial combinations. To learn more about concatenated-word processing, see the Factors which Determine the Format of a Name / Negative Keys section and SSA-NAME3-OPTION #28.
|
ICONCATB
| Allows concatenated-word key-building to be applied to both word+initial and initial+word combinations. To learn more about concatenated-word processing, refer to the Factors which Determine the Format of a Name / Negative Keys section. Also see SSA-NAME3-OPTION #28.
|
ICONCATP
| By default, concatenated-word key-building concatenates words only. This keyword also applies the processing to initial+word combinations. To learn more about concatenated-word processing, refer to the section Factors which Determine the Format of a Name / Negative Keys. Also see SSA-NAME3-OPTION #28.
|
IKNOSKIPS
| Specifying this keyword will cause Skip words to be ignored by
INITKEYA processing.
|
INITKEYA
| Builds an additional key on a word that is an acronym of the name being processed. The key is built only when the name contains three or more words, and a maximum of eight words is used. By default, skip words are used; however, skip words can be ignored by specifying the IKNOSKIPS keyword in addition to INITKEYA. For example, the name THE ATLANTIC AND PACIFIC TEA COMPANY generates a key for APT.
|
INITKEYS1
| When set, SSA-NAME3-OPTIONS #27 builds an extra key if the initial of the first minor word changes after Formatting or Stabilization. This keyword extends that functionality to build an extra key if an initial has changed for a word that is not the first minor. For example, for HELEN KATHERINE SMITH, an extra key is built for HELEN C SMITH.
|
INITKEYS2
| When set, SSA-NAME3-OPTIONS #27 builds an extra key if the initial of the first minor word changes after Formatting or Stabilization. This keyword extends that functionality to build an extra key if an initial has changed for any word, and uses that word as the first minor. For example, for HELEN KATHERINE SMITH, an extra key is built for C HELEN SMITH.
|
KEYS=
| Overrides the
ALTERNATE-KEYS and
SSA-NAME3-OPTIONS= #18 options at run time.
You can use the following values:
For more information on keys, see the
Factors which Determine the Number of Keys to Generate per Name section.
|
KEYSIZE=
| Sets the KEYS-STACK-SIZE algorithm value dynamically at run time. Values can be from 1 to 65536; if the value is greater than 99, BINCOUNT must also be used.
|
LENGTH=
| This keyword overrides, at run-time, the
NAME-LENGTH= option in the Algorithm. Values between 10 and 255 can be used. It is used to specify the length of the names passed to SSA-NAME3.
|
NAMEFORMAT=
| This keyword overrides, at run time, the NAME-FORMAT= option in the Algorithm. Valid values are R or L. It specifies whether your names have the major word (for example, the surname) on the right (at the end) or on the left (at the beginning).
Note: You should keep the value of the
NAMEFORMAT= keyword the same for both key-building and searching. For more information on Name Format, refer to the
Factors which Determine the Format of a Name
section.
|
NM3KEYSIZE=
| Specifies the key length for code fields in standard populations. The default value is 12, which extends the key length to 30 bytes for keys that the code fields generate. Use the
NM3KEYSIZE=12 and
CODEKEYSEXT parameters to extend the key length to 30 bytes for the keys generated by the following code fields:
For more information, see the Key Fields section in the Standard Populations chapter.
|
NOSTAB
| Does not generate a Search-table. It is sometimes desirable to perform a NAMESET call without generating a Search-table. For example, when you load keys into a database, you are not searching for records, so you do not need the Search-table. This option disables the generation of a Search-table, which makes the call more efficient. Note that the Search-table still contains the terminating range.
|
NOUNCOMMONVOWELS
| Retains vowels in uncommon words during the key-building process. By default, the key-building process ignores the vowels in uncommon words. For example, the name GHALIB stabilizes to GALAB.
Use the NOUNCOMMONVOWELS keyword only if you need the vowels to differentiate uncommon words.
|
ORIGWORDKEY
| Use the
ORIGWORDKEY keyword to turn on the building of an additional key based on the unformatted words after Cleaning and Stabilization, but without Edit-list processing. This helps retrieve a candidate record that might otherwise be missed because the search name or the file name failed to activate an Edit-list rule due to a slight misspelling.
|
RETORIGACCN
| Generates keys for the original name when a name matches an Account Name pattern.
For example, the name
JOHN, MARY AND COMPANY generates keys for the following words:
|
SINGLESKIPPROBE
| Builds a probe instead of a range when a word stack has a single word and that word, the major word, was originally a skip word. A range finds a wider set of records, and a probe finds a smaller set of records.
If a word stack has a single word, the word becomes the major word. When you enable the SINGLESKIPPROBE keyword, the NAMESET function identifies whether the single word was a skip word before it became the major word.
For example, BOARD was a single skip word that became the major word. Without the SINGLESKIPPROBE keyword, SSA-NAME3 builds a wide range from 8400000000 to 843FFFFFFF. With the SINGLESKIPPROBE keyword, SSA-NAME3 builds a narrow range, or probe, from 8400000000 to 8400000003.
|
SINGLEWORDPROBE
| Converts a one-word range to a probe.
For example, the name John and Mary Smith generates the probe JOHN.
|
SKIPALLSKIPS
| Enables all word skip options for keys and ranges. SSA-NAME3 does not choose the skip type words as major words while building keys or ranges.
|
SKIPKSKIPS
| Enables SSA-NAME3-OPTION #24 for keys and ranges. SSA-NAME3 does not choose the skip type words as major words while building keys or ranges.
|
WORDPAIRKEYS
| Builds keys by selecting and using adjacent word pairs. For example, SSA-NAME3 selects word one and word two, word two and word three, and so on to build keys.
|
WORDPAIRONLY
| Generates word pairs without building the positive and negative ranges or keys.
|
WORDPAIRRANGES
| Builds ranges by selecting and using adjacent word pairs. For example, SSA-NAME3 uses word one and word two, word two and word three, and so on to build ranges. By default, SSA-NAME3 builds ranges using two words.
To include more than two words in ranges, use the STOP= or WPFULLKEY keyword.
For example, WORDPAIRRANGES,STOP=WWW builds three-word ranges.
WPFULLKEY builds ranges using all the available words.
|
WPEXTRAPAIRS
| Builds extra word pairs from words that are not adjacent. For example, SSA-NAME3 pairs word one with word three and word two with word four.
Use this option for extended keys and extreme search levels.
|
WPFIRSTLAST
| Builds an extra word-pair key or range by using the first and last words in the word stack when the stack contains more than two words.
|
WPFULLKEY
| Builds word pair ranges or probes by using more than two words. By default, SSA-NAME3 builds ranges or probes by using only two words.
|
WPPROBES
| Builds word-pair probes instead of ranges. By default, SSA-NAME3 builds ranges for word pairs; this keyword converts those ranges to probes.
|
WPSKIPCODES
| Skips codes while building word pairs. This option modifies how SSA-NAME3 chooses word pairs when you use the WPFIRSTLAST, WORDPAIRKEYS, or WORDPAIRRANGES option. For example, if the address is 10 EDITH ALLEN ST and you specify WORDPAIRKEYS and WPSKIPCODES, SSA-NAME3 does not use 10 to build a word-pair key.
|
WPSKIPSKIPS
| Does not use skip words while building ranges from adjacent word pairs. If either word in a pair is a skip word (S), SSA-NAME3 does not build a range for that pair.
The WPSKIPSKIPS option modifies how SSA-NAME3 chooses word pairs when you use the WPFIRSTLAST, WORDPAIRKEYS, or WORDPAIRRANGES option.
|
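The word-pair keywords above (WORDPAIRKEYS, WORDPAIRRANGES, WPEXTRAPAIRS, WPFIRSTLAST, WPSKIPCODES, WPSKIPSKIPS) all decide which two words are paired. The following Python sketch illustrates that selection logic only; it is not the SSA-NAME3 implementation. It works on plain words rather than encoded keys, and several details are assumptions: an all-digit token is treated as a "code", the skip-word list is passed in by the caller, and a pair is simply dropped when either word is excluded (as the text describes for WPSKIPSKIPS). The function names `is_code` and `word_pairs` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Pair = Tuple[str, str]


def is_code(word: str) -> bool:
    # Assumption: treat purely numeric tokens (such as the house number 10) as codes.
    return word.isdigit()


def word_pairs(
    words: List[str],
    extra_pairs: bool = False,   # models WPEXTRAPAIRS
    first_last: bool = False,    # models WPFIRSTLAST
    skip_codes: bool = False,    # models WPSKIPCODES
    skip_words: FrozenSet[str] = frozenset(),  # models WPSKIPSKIPS (hypothetical list)
) -> List[Pair]:
    """Hypothetical sketch of word-pair selection for a cleaned word stack."""
    # WORDPAIRKEYS / WORDPAIRRANGES: word 1 + word 2, word 2 + word 3, and so on.
    candidates: List[Pair] = list(zip(words, words[1:]))

    # WPEXTRAPAIRS: non-adjacent pairs, word 1 + word 3, word 2 + word 4, and so on.
    if extra_pairs:
        candidates += list(zip(words, words[2:]))

    # WPFIRSTLAST: first and last words, only when the stack has more than two words.
    if first_last and len(words) > 2:
        candidates.append((words[0], words[-1]))

    def excluded(word: str) -> bool:
        return (skip_codes and is_code(word)) or word in skip_words

    # Assumption: drop any pair that contains an excluded code or skip word.
    return [(a, b) for a, b in candidates if not (excluded(a) or excluded(b))]


if __name__ == "__main__":
    address = ["10", "EDITH", "ALLEN", "ST"]
    print(word_pairs(address))
    # [('10', 'EDITH'), ('EDITH', 'ALLEN'), ('ALLEN', 'ST')]
    print(word_pairs(address, skip_codes=True))
    # [('EDITH', 'ALLEN'), ('ALLEN', 'ST')]
```

With the code filter on, the pair containing 10 disappears, which matches the 10 EDITH ALLEN ST example; whether the product re-pairs the remaining words in some other way is not stated in the text.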
General NAMESET Keywords
| |
---|---|
Keyword
| Description
|
DECIPHER
| Generates a Search-table from a given key, without knowing the actual name.
|
ENCODING=
| Specifies the code page for the input data.
Valid values are:
Y Unicode UTF-8 format
8 Unicode UTF-8 format
6 Unicode UTF-16 format
L Unicode UTF-16LE format
B Unicode UTF-16BE format
4 Unicode UCS-4 or UTF-32 format
J Japanese CP932 codepage (Shift-JIS)
S Chinese CP936 codepage (Simplified Chinese)
K Korean CP949 codepage
T Chinese CP950 codepage (Traditional Chinese)
|
FILESIZE=
| The number of records in the file you want to search. NAMESET uses the file size when calculating the Scale value of a search range. If no value is supplied here, it is taken from the Population Frequency Table; however, if the entire file was not used to generate the frequency table, that value will not be useful. Even if the entire file was used to generate the frequency table, two situations warrant passing a value with this parameter:
- If a frequency table is used by different algorithms on different files, a search on a file that was not used to generate the frequency table requires the FILESIZE= parameter to specify that file's true size.
- If a file grows in size by more than 10% and the Scale value is used to estimate expected record counts, FILESIZE= should be used to reflect the new size. If a file grows by between 10% and 25% annually, update the FILESIZE= value annually to reflect this growth. Provided that the FILESIZE= parameter is passed at the application program level (rather than in the definition file), this can be done programmatically through an anniversary date check.
|
LENGTH=
| This keyword overrides, at run-time, the
NAME-LENGTH= option in the Algorithm. Values between 10 and 255 can be used. It is used to specify the length of the names passed to SSA-NAME3.
|
NAMEFORMAT=
| This keyword overrides, at run time, the NAME-FORMAT= option in the Algorithm. Valid values are R or L. It specifies whether your names have the major word (for example, the surname) on the right (at the end) or on the left (at the beginning).
Note: You should keep the value of the NAMEFORMAT= keyword the same for both key-building and searching.
For more information on Name Format, refer to the Factors which Determine the Format of a Name section.
|
NOKEYS
| Does not generate multiple keys in the Keys-stack. It is sometimes desirable to perform a NAMESET call without generating a Keys-stack. For example, when performing a search, you need only the search ranges, not a Keys-stack. This option disables the generation of multiple keys in the Keys-stack, which makes the call more efficient. Note, however, that the preferred key is still generated in the Keys-stack even if NOKEYS is used.
|
NOSTAB
| Does not generate a Search-table. It is sometimes desirable to perform a NAMESET call without generating a Search-table. For example, when you load keys into a database, you are not searching for records, so you do not need the Search-table. This option disables the generation of a Search-table, which makes the call more efficient.
The Search-table still contains the terminating range.
|
ORIGWORDRANGE
| Turns on the building of an additional search range based on the unformatted words after Cleaning and Stabilization, but without Edit-list processing. This keyword helps retrieve a candidate record that might otherwise be missed because the search name or the file name failed to activate an Edit-list rule due to a slight misspelling.
|
ORIGWORDMIN
| Removes the duplicate records if you turn on the ORIGWORD logic.
|
REPEAT=
| Defines the number of fixed-length names passed to NAMESET in a single call. This keyword allows, for example, a name and a name alias to be passed for key or Search-table building. Another use is for current and former names.
The length of the SSA-NAME3-NAME-IN and SSA-NAME3-NAME-CLEAN parameters on the NAMESET call should equal NAME-LENGTH (from the Algorithm definition) multiplied by the REPEAT value.
The default REPEAT value is 1.
For more information on building keys and search-tables for alias and former names, refer to the
Multi-Valued Fields
section.
|
STABSIZE=
| Sets the SEARCH-TABLE-SIZE algorithm value dynamically at run time. Values can be from 2 to 65536; if the value is greater than 99, BINCOUNT must also be used. The minimum value is 2 to accommodate the bad entry range and the closing range.
|
WORSTCASE=
| Normally, the Scale of a NAMESET search range that contains uncommon words is calculated with a formula that estimates an average frequency for the uncommon word. In some cases the value is an overestimate, and in other cases an underestimate, of the true number of records returned. The WORSTCASE keyword causes the scale calculation to use the maximum frequency for an uncommon word that starts with the given initial, biasing the result toward overestimating the true number of records returned. This can be useful for user applications that are very sensitive to selectivity choices.
|
LIMITCNRANGES1
| This option affects the generation of search ranges when the input contains multiple names separated by a compound name marker. When this option is selected, ranges are generated only for the first name in the set of names that make up the compound name. For example, if the input contains Amelia Berg | Rube Lindsay | Anne Hopkins, and the pipe character (|) is designated as the compound name marker, the ranges generated by this option are limited to those created for Amelia Berg.
For detailed information on compound names, see the Compound Names section in this guide.
|
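The REPEAT= length rule (buffer length = NAME-LENGTH multiplied by the REPEAT value) can be made concrete with a small Python sketch of how a caller might lay out the name buffer for a name plus an alias. This is an illustration under stated assumptions, not part of the SSA-NAME3 API: the helper names are hypothetical, the data is assumed to be single-byte so that characters equal bytes, and each slot is assumed to be blank-padded to NAME-LENGTH. The 10 to 255 bounds come from the LENGTH= description above.

```python
from typing import List

MIN_NAME_LENGTH = 10   # lower bound of LENGTH= / NAME-LENGTH= per the text
MAX_NAME_LENGTH = 255  # upper bound of LENGTH= / NAME-LENGTH= per the text


def pack_repeat_names(names: List[str], name_length: int) -> str:
    """Hypothetical helper: build a REPEAT= style name buffer.

    Each name is blank-padded to exactly name_length characters, so the
    result is name_length * len(names) characters long.
    """
    if not MIN_NAME_LENGTH <= name_length <= MAX_NAME_LENGTH:
        raise ValueError(f"name_length must be {MIN_NAME_LENGTH}-{MAX_NAME_LENGTH}")
    if not names:
        raise ValueError("at least one name is required (REPEAT defaults to 1)")
    slots = []
    for name in names:
        if len(name) > name_length:
            # Design choice for this sketch: refuse rather than silently truncate.
            raise ValueError(f"{name!r} is longer than {name_length} characters")
        slots.append(name.ljust(name_length))
    return "".join(slots)


def unpack_repeat_names(buffer: str, name_length: int) -> List[str]:
    """Split a packed buffer back into its fixed-length slots, trailing blanks removed."""
    return [buffer[i:i + name_length].rstrip()
            for i in range(0, len(buffer), name_length)]


if __name__ == "__main__":
    current_and_alias = ["JOHN SMITH", "JACK SMITH"]
    buf = pack_repeat_names(current_and_alias, 40)
    print(len(buf))  # 80, that is, NAME-LENGTH 40 * REPEAT 2
    print(unpack_repeat_names(buf, 40))
```

Refusing over-long names, rather than truncating them, keeps a silent data loss from turning into a missed match; a real caller might choose differently.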
Secondary Name Keywords
| |
---|---|
Keyword
| Description
|
SECONDARY
| Builds Secondary search ranges if the first minor word contains a Secondary name. For example, if the Edit-list contains Secondary name rules as follows:
a search on BERT HAROLD SMITH would generate ranges for all the defined Secondary values of BERT, that is:
The Secondary search ranges are built for the Preferred key word sequence only. In this example, the words in brackets would be used in the search range if a START value greater than WW is used and enough words in the name are common.
|
SECMINOR
| Builds Secondary search ranges for all minor words that contain a Secondary name (SECMINOR therefore includes the functionality of SECONDARY). For example, if the Edit-list contains Secondary name rules as follows:
a search on BERT AL SMITH would generate ranges for all the defined Secondary values of BERT and AL independently of each other, that is:
The Secondary search ranges are built for the Preferred key word sequence only. In this example, the words in brackets would be used in the search range if a START value greater than WW is used and enough words in the name are common.
|
SECMAJOR
| Builds Secondary search ranges for the major word only, if it contains a Secondary name. For example, if the Edit-list contains a Secondary name rule as follows:
a search on 12 MAIN ST. MIDTOWN would generate the following ranges (if the Algorithm used NAME-FORMAT=R and Edit-list major markers were not defined):
The Secondary search ranges are built for the Preferred key word sequence only. In this example, the numbers in brackets would be used in the search range if a START value greater than WW is used and enough words in the address are common.
|
SECALL
| Builds Secondary search ranges for all combinations of Secondary words, including ranges built from the target names in Secondary Name rules. For example:
Using SECALL, the name BOB SMITH generates ranges for ROBERT SMITH as well as BERT SMITH.
SECALL generates more ranges than SECMINOR + SECMAJOR.
|
SECPROBE=
| Changes secondary name ranges from a range to a probe. Using SECPROBE narrows searches that use secondary names.
Options are:
|
START=
| This defines the width of the range to be built.
The value is a string of the form
WWI ,
WW etc. (where W stands for "Word" and I stands for "Initial"). It must start with at least one
W and may end with one
I . Options that are both valid and make sense are:
where
W is the widest range possible and
WWWWI is the narrowest.
A level of W means "one word range" or "match on one word" ("SMITH *" is such a range).
WI means "one word and initial range" ("SMITH J*" is such a range).
Optionally, the level can be defined as the estimated number of records that should match it, in which case it is a number; START=20 means that you do not want ranges that are expected to match fewer than 20 records. The Service estimates the number of records based on the names file that was processed during the Population Frequency analysis stage of the generation. If the file that you want to search is not the same as the file used for the frequency analysis (or the names file was a small sample of the full file), you should explicitly supply the FILESIZE= keyword.
|
SKIPSECN=
| Skips the specified category names for secondary names. For example, if
SKIPSECN=XP , then SSA-NAME3 skips the category name
XP for secondary names when it builds ranges. You can specify multiple category names by listing them sequentially, such as,
SKIPSECN=XPXP1XP2 .
To use the SKIPSECN= keyword, you must also specify the SECONDARY keyword.
|
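As described above, a START= value takes one of two forms: a word/initial level such as WW or WWI, or a number giving the estimated minimum number of matching records. The Python sketch below shows one way a caller could validate such a value before building a control string. The function name `parse_start` is hypothetical, and limiting a level to four W characters is an assumption drawn from the statement that WWWWI is the narrowest level.

```python
import re
from typing import Tuple, Union

# A level is one to four W's optionally followed by a single I.
# Assumption: four W's is the maximum, since WWWWI is described as the narrowest.
_LEVEL = re.compile(r"W{1,4}I?")


def parse_start(value: str) -> Tuple[str, Union[str, int]]:
    """Hypothetical parser for a START= value.

    Returns ("level", "WWI") for a word/initial level, or
    ("records", 20) for an estimated record count.
    """
    value = value.strip()
    if value.isdigit():
        count = int(value)
        if count <= 0:
            raise ValueError("record count must be positive")
        return ("records", count)
    # Must start with at least one W and may end with one I.
    if _LEVEL.fullmatch(value):
        return ("level", value)
    raise ValueError(f"invalid START= value: {value!r}")


if __name__ == "__main__":
    for candidate in ["W", "WWI", "20", "IW"]:
        try:
            print(candidate, parse_start(candidate))
        except ValueError as err:
            print(candidate, "rejected:", err)
```

The numeric form is only as accurate as the file size behind it, which is why the text recommends FILESIZE= when the searched file differs from the frequency-analysis file.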
Customset Keywords
| |
---|---|
Keyword
| Description
|
CUSTOMSET=
| In some cases, the standard search ranges and probes generated by the many available Function keywords may not be appropriate for certain data or search requirements.
To cater for special cases, the
CUSTOMSET= keyword invokes special search ranges called "Customset" ranges.
Specifying
CUSTOMSET=DEFAULT will invoke a default set of probe ranges designed to give quick access to the most likely candidates in a person name file containing a mixture of given names and initials. This set of ranges is internal to the product and cannot be changed by the user.
Specifying
CUSTOMSET=PERSON will invoke the user customizable ranges defined by the
CUSTOMSET-DEFINITION=PERSON patterns in the Algorithm. This allows the
DEFAULT person name ranges to be overridden.
Specifying
CUSTOMSET=1 will invoke the user customizable ranges defined by the
CUSTOMSET-DEFINITION=1 patterns in the Algorithm.
Specifying
CUSTOMSET=2 will invoke the user customizable ranges defined by the
CUSTOMSET-DEFINITION=2 patterns in the Algorithm.
Customset ranges are all identified by a P in the Set-Id column of the Search-table parameter (see the NAMESET chapter of the APPLICATION REFERENCE guide for information on the Search-table) and are generated at the start of the Search-table.
For more information on the customization of search ranges using Customset, refer to the
Customset
section.
|
-CASCADE
| A Customset Search-table will, by default, be followed by a positive "cascade" of widening search ranges. If only the Customset ranges are wanted, use the
-CASCADE keyword to disable the positive search ranges. For example,
*CUSTOMSET=PERSON,-CASCADE*
|
EXCLUSIVE
| This keyword reduces the ranges returned by merging and removing ranges included in another range. Do not use with a Positive Search as only the widest range will be generated.
|
NOCSKIPS
| Similar to the
SKIPSKIPS Negative Search keyword,
NOCSKIPS inhibits the use of skip words as the major word in a search range when a Customset search strategy is being built.
|
NOINITRANGE
| Disables, at run time, the generation of Customset ranges that contain ranges on initials, even though those ranges were requested in the Customset Definition in the Algorithm (for example, it disables RULE=W1,I2,IRANGE).
|
NOWIDEPROBE
| Inhibits the generation of Customset ranges which were intended as probes, but, because of uncommon word encoding, became ranges. For example, if the following rule was defined in the Customset definition,
However, because W1 and W2 were uncommon, the actual search range became W1+W2+*; NOWIDEPROBE prevents this range from being generated.
|
CSETCONCMINOR
| Allows concatenated-word range processing in Customsets to be defined for minor tokens. For example,
RULE=W1,W2+W3,RANGE
|
CSETINITTRUNC
| Generates Customset ranges for rules that use initials even if a word is found in the "initial" position.
For example, RULE=W1,I2,I3,RANGE
If the name contains all words (for example, JOHN ALAN SMITH), without CSETINITTRUNC this rule generates a range based on the full words (that is, SMITH JOHN ALAN *).
By specifying CSETINITTRUNC, this rule causes a range to be generated after truncating the words in positions 2 and 3 to initials. For example, JOHN ALAN SMITH generates the range SMITH J A *.
|
BATCHMODE/BATCHMODE2
| Search strategies designed for online search applications are not always optimal for batch applications, such as clustering.
These keywords adjust customset ranges for batch applications by removing or converting customset search ranges to narrower searches.
BATCHMODE converts secondary name searches from ranges to probes.
BATCHMODE2 removes all the secondary name searches. Both keywords convert customset ranges to narrower searches, with BATCHMODE2 being the stricter of the two keywords.
|
CSETLOOKUP=N|Y|E
| Enables NAMESET key-building option 26, which defines the method by which SSA-NAME3 generates Customset ranges.
Use one of the following values:
For more information about option 26, see the NAMESET Key-Building Options section in Chapter 5, Algorithm Definition.
|
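The CSETINITTRUNC example can be reproduced with a short Python sketch that renders a Customset rule as readable text. This is an illustration only: real Search-table entries are encoded ranges, not text, and the function name `customset_range` is hypothetical. The sketch assumes NAME-FORMAT=R (the major word is last), that position 1 is the major word and positions 2 onward are the minor words in order (consistent with JOHN ALAN SMITH yielding SMITH JOHN ALAN *), and it handles only simple W and I slots in RANGE rules, not concatenated slots such as W2+W3.

```python
from typing import List


def customset_range(words: List[str], rule: str, init_trunc: bool = False) -> str:
    """Hypothetical, text-only rendering of a Customset rule such as "W1,I2,I3,RANGE".

    Assumptions: NAME-FORMAT=R (major word last), position 1 is the major
    word, positions 2.. are the minor words in their original order.
    """
    *slots, kind = rule.split(",")
    if kind != "RANGE":
        raise ValueError("this sketch only handles RANGE rules")
    ordered = [words[-1]] + words[:-1]      # major word first, then the minors
    parts = []
    for slot in slots:
        letter, position = slot[0], int(slot[1:])
        if position > len(ordered):
            raise ValueError(f"name has no word in position {position}")
        word = ordered[position - 1]
        # Only with CSETINITTRUNC is a full word in an "I" position cut to its initial.
        if letter == "I" and init_trunc:
            word = word[0]
        parts.append(word)
    return " ".join(parts) + " *"


if __name__ == "__main__":
    name = ["JOHN", "ALAN", "SMITH"]
    print(customset_range(name, "W1,I2,I3,RANGE"))                   # SMITH JOHN ALAN *
    print(customset_range(name, "W1,I2,I3,RANGE", init_trunc=True))  # SMITH J A *
```

The two printed lines correspond to the "without CSETINITTRUNC" and "with CSETINITTRUNC" cases described in the table.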
Positive Search Keywords
| |
---|---|
Keyword
| Description
|
CONCMINORRANGE
| Builds a probe by concatenating the first two minor words. For example, for the name JONG KU LEE, where LEE is the major word, this keyword builds an additional probe for JONGKU LEE.
|
FINE
| Fine ranges: include all ranges in the Search-table. Specific search ranges for uncommon words are generated not only for the whole word but also for shortened versions of the word.
This is the default process, so it is needed only to override COARSE or WORDS in an existing function definition.
|
COARSE
| Coarse ranges: this option causes keys to be generated for word/initial combinations as well as word/word combinations. Unlike FINE, it generates search ranges that include both common and uncommon names. This option generates more search ranges than WORDS and fewer than FINE. It is mutually exclusive with the WORDS and FINE options.
|
WORDS
| Allows only ranges based on full words. Search ranges that include initials are ignored, so fewer search ranges are generated. This option is mutually exclusive with the COARSE and FINE options.
|
START=
| This defines the first or narrowest range at which to start a Positive Search.
The value is a string of the form
WWI ,
WW etc. (where
W stands for "Word" and
I stands for "Initial"). It must start with at least one
W and may end with one
I . Options that are both valid and make sense are:
where
W is the widest range possible and
WWWWI is the narrowest.
A level of
W means "one word range" or "match on one word" ("SMITH *" is such a range).
WI means "one word and initial range" ("SMITH J *" is such a range).
Optionally, the level can be defined as the estimated number of records that should match it, in which case it is a number; START=20 means that you do not want ranges that are expected to match fewer than 20 records. The Service estimates the number of records based on the names file that was processed during the Population Frequency analysis stage of the generation. If the file that you want to search is not the same as the file used for the frequency analysis (or the names file was a small sample of the full file), you should explicitly supply the FILESIZE= keyword.
|
STOP=
| Defines the last (widest) level to be included in the Search-table. Valid options are the same as the
START= keyword although the stop range should be wider than the start range.
|
1WORD
| A one-word probe (for the major word) precedes the Search-table for one-word names. This probe is specifically for search names with only one word. If the name has more than one word, the probe is not generated.
|
2WORD
| A two-word probe precedes the Search-table for two-word names. Similar to 1WORD, but generates a probe for search names with only two words. If the name has more than two words, the probe is not generated.
|
NWORD
| A probe for N words precedes the cascade in all cases. Regardless of the number of words in the search name a single probe will be generated for the combined names.
|
WIDEN
| This keyword controls the type of search range generated when a name would generate a wider search range than a STOP= keyword has specified.
When WIDEN is specified, a search name that is wider than the limit imposed by a STOP= parameter, and has an initial as its most minor component (for example, WWI or WI), produces a search range on the initial (as well as a warning response code, RC 12) rather than a search probe on the initial. For example, a search name of the form WI, when used with a positive search function of STOP=WW, produces the search range "WI*" rather than "WI! *" (for example, "SMITH J*" rather than "SMITH J! *").
|
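One way to picture START=, STOP= and WORDS working together is as a walk through the levels from narrowest to widest. The Python sketch below lists the levels a Positive Search cascade would cover under that model. It is an illustration, not how SSA-NAME3 builds its Search-table: the text fixes only the two ends of the scale (WWWWI narrowest, W widest), so the intermediate ordering is an assumption; FINE, COARSE and the shortened uncommon-word ranges are not modeled; and real entries are encoded ranges, not level labels. The names `LEVELS_NARROW_TO_WIDE` and `cascade` are hypothetical.

```python
from typing import List

# Assumed ordering of levels from narrowest to widest. The text states only
# that WWWWI is the narrowest level and W is the widest.
LEVELS_NARROW_TO_WIDE = ["WWWWI", "WWWW", "WWWI", "WWW", "WWI", "WW", "WI", "W"]


def cascade(start: str, stop: str, words_only: bool = False) -> List[str]:
    """Hypothetical sketch: levels a Positive Search walks through,
    from START= (narrowest) to STOP= (widest)."""
    i = LEVELS_NARROW_TO_WIDE.index(start)   # raises ValueError for unknown levels
    j = LEVELS_NARROW_TO_WIDE.index(stop)
    if j <= i:
        raise ValueError("STOP= should be wider than START=")
    levels = LEVELS_NARROW_TO_WIDE[i:j + 1]
    if words_only:
        # WORDS: ranges that include initials are ignored.
        levels = [lvl for lvl in levels if not lvl.endswith("I")]
    return levels


if __name__ == "__main__":
    print(cascade("WWI", "W"))
    print(cascade("WWI", "W", words_only=True))
```

Filtering out the levels that end in I mirrors the statement that WORDS produces fewer search ranges because ranges that include initials are ignored.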
Negative Search Keywords
| |
---|---|
Keyword
| Description
|
CONCMINORRANGE
| Builds a probe by concatenating the first two minor words. For example, for the name JONG KU LEE, where LEE is the major word, this keyword builds an additional probe for JONGKU LEE.
|
EXCLUSIVE
| This keyword reduces the ranges returned by merging ranges and removing ranges that are included in another range. Do not use it with a Positive Search, because only the widest range will be generated.
|
NEG
| Generate a negative Search-table (the default is positive). A negative Search-table contains a collection of independent, and sometimes overlapping, search ranges. These ranges are identified by an N in the range-type field of the Search-table, and all such ranges should be processed before the results are returned to the user. The depth of the ranges is controlled by the
START= option. For example,
START=WW generates ranges on word pairs, so a search for
JOHN ALEXANDER SMITH generates ranges for the following word pairs:
START=WI generates ranges for word+initial. Therefore, a search for
JOHN ALEXANDER SMITH generates ranges for the following:
A concatenated-word entry may generate a search range that already exists in the Search-table. For example,
ALEXANDERSMITH J* may generate the same range as
ALEXANDER J* if the words are uncommon. In this case, the duplicate is deleted from the table.
|
CONCATRANGES
| The
CONCATRANGES keyword turns on the building of concatenated-word search ranges. At runtime, it overrides and sets
SSA-NAME3-OPTIONS #10 to 'Y' (that is, build a concatenated-word search range for 2-word names) and sets
SSA-NAME3-OPTIONS #21 to 'N' (that is, for names with more than two words, build two concatenated-word search ranges: one where the first and second words are concatenated, and one where the second-last and last words are concatenated).
-CONCATRANGES disables the building of concatenated-word search ranges.
|
START=
| Defines the depth at which to perform a Negative Search. The value is a string of the form
WW ,
WI , and so on (where
W stands for "Word" and
I stands for "Initial"). It must start with at least one
W and may end with one
I . Options that are both valid and make sense are:
A level of W means "one word range" or "match on one word" ("SMITH *" is such a range).
WI means "one word and initial range" ("SMITH J*" is such a range).
|
ICONCATP
| Normal concatenated-word range processing concatenates words only. This keyword allows the processing to be applied to initial+word combinations. To learn more about concatenated-word processing, refer to the
Factors which Determine the Format of a Name / Negative Keys
section. Also see
SSA-NAME3-OPTION #28.
|
ICONCAT
| Same as
ICONCATP .
|
ICONCATA
| Allows concatenated-word range processing to be applied to word+initial combinations. To learn more about concatenated-word processing, refer to the
Factors which Determine the Format of a Name / Negative Keys section. Also see
SSA-NAME3-OPTION #28.
|
ICONCATB
| Allows concatenated-word range processing to be applied to both word+initial and initial+word combinations. To learn more about concatenated-word processing, refer to the
Factors which Determine the Format of a Name / Negative Keys
section. Also see
SSA-NAME3-OPTION #28.
|
PROBESWORD
| Add probes for each word. For example, with the name
JOHN ANDREW SMITH three probes will be generated in addition to the normal Search-table, one for
JOHN , one for
ANDREW and another for
SMITH .
Does nothing unless the
NEG keyword is also specified. It does not make sense to use this in combination with
START=W .
|
PROBESINIT
| Add probes for each word + initial. Using
JOHN ANDREW SMITH , probes would be generated for the following word/initial pairs:
Does nothing unless the
NEG keyword is also specified. It does not make sense to use this in combination with
START=WI .
|
PROBESALL
| Shorthand method of specifying both
PROBESWORD and
PROBESINIT . It does not make sense to use this in combination with
START=W .
|
PROBESMAJ
| Generate a probe for the major word only. Does nothing unless the
NEG keyword and either the
PROBESINIT or
PROBESWORD keywords are also specified. It does not make sense to use this in combination with
START=W .
|
FULLSEARCH
| Generate negative ranges for all permutations of the words. Using
JOHN ANDREW SMITH , normal negative ranges would be generated for
SMITH JOHN, SMITH ANDREW, ANDREW JOHN;
this option also causes ranges to be generated for the other two-word permutations:
JOHN SMITH, ANDREW SMITH, JOHN ANDREW
|
SKIPSKIPS
| Normal processing during a negative search permits the use of a skip word as the major word in a key range. This may be undesirable in some cases. This keyword inhibits the use of skip words as a major when a negative search is being performed. Also see
SSA-NAME3-OPTIONS #24
for the equivalent in Key Building.
|
WIDEN
| This keyword controls the type of search range generated when a name would generate a wider search range than the
START= keyword specifies.
When
WIDEN is specified, a search name that is wider than the limit imposed by a
START= parameter and has an initial as its most minor component (such as
WWI or
WI ) produces a search range on the initial (as well as a warning response code, RC 12) rather than a search probe on the initial. For example, a search name of the form
WI used with a negative search function of
START=WW produces the search range "WI*" rather than "WI! *" (for example, "SMITH J*" rather than "SMITH J! *").
|
INITPROBE
| Builds an extra search range (after the Customset ranges, if any) that is an acronym of the search name. The search range in this case is a probe. A range is built only when the name has three or more words, and a maximum of eight words is used. By default, Skip Words are used; however, they can be ignored by specifying the
IRNOSKIPS keyword in addition to
INITPROBE . For example, the search name THE ATLANTIC AND PACIFIC TEA COMPANY generates a range for "APT!".
|
INITRANGE
| Builds an extra search range (after the Customset ranges, if any) that is an acronym of the search name. The search range in this case is a range. A range is built only when the name has three or more words, and a maximum of eight words is used. By default, Skip Words are used; however, they can be ignored by specifying the
IRNOSKIPS keyword in addition to
INITRANGE . For example, the search name THE ATLANTIC AND PACIFIC TEA COMPANY generates a range for "APT*".
|
IRNOSKIPS
| Specifying this keyword will cause Skip words to be ignored by both
INITPROBE and
INITRANGE processing.
|
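The word-pair ranges described under NEG and FULLSEARCH, and the acronym built by INITPROBE and INITRANGE, can be sketched in Python as follows. This is a minimal illustration of which words end up in each range, not SSA-NAME3's key encoding. The function names are hypothetical, and the pair order (later word first) is inferred from the FULLSEARCH example. The excluded-word set is also an assumption: the document's acronym example drops THE, AND, and COMPANY but does not say which Edit-list rule removes them, and this sketch assumes the three-word minimum is checked after they are dropped.

```python
from itertools import combinations, permutations


def negative_word_pairs(words, full_search=False):
    """Two-word ranges for a START=WW negative search (hypothetical helper).

    Without FULLSEARCH, each pair puts the later word first, matching the
    document's example (SMITH JOHN, SMITH ANDREW, ANDREW JOHN). With
    FULLSEARCH, every ordered pair of distinct words is produced.
    """
    if full_search:
        return set(permutations(words, 2))
    return {(later, earlier) for earlier, later in combinations(words, 2)}


def acronym_range(name, excluded=frozenset(), probe=True):
    """INITPROBE/INITRANGE-style acronym (hypothetical helper).

    Assumptions: `excluded` stands in for whatever words the Edit-list drops,
    the three-word minimum is checked after dropping them, and only the first
    eight remaining words contribute an initial.
    """
    words = [w for w in name.split() if w not in excluded]
    if len(words) < 3:
        return None  # too few words: no acronym range is built
    initials = "".join(w[0] for w in words[:8])
    # "!" marks a probe and "*" a range, mirroring the document's examples.
    return initials + ("!" if probe else "*")


if __name__ == "__main__":
    name = ["JOHN", "ANDREW", "SMITH"]
    normal = negative_word_pairs(name)
    print(sorted(normal))
    print(sorted(negative_word_pairs(name, full_search=True) - normal))
    print(acronym_range("THE ATLANTIC AND PACIFIC TEA COMPANY",
                        excluded={"THE", "AND", "COMPANY"}))
```

The FULLSEARCH difference set contains exactly the three extra permutations listed for that keyword (JOHN SMITH, ANDREW SMITH, JOHN ANDREW).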
Secondary Phrase Keywords
| |
---|---|
Keyword
| Description
|
SECPHRASE
| Enables Secondary Phrase processing during both key building and searching. For example, if the Edit-list contains Secondary Phrase rules as follows:
then passing
JIM BOB DE LA HILL would cause keys or ranges to be built for these names:
The original name does not have keys or ranges built.
|
SECPHRASEALL
| Similar to
SECPHRASE in that it enables Secondary Phrase processing during both key building and searching. It differs in how it processes input names that contain multiple Secondary Phrases. Using the example in
SECPHRASE above, the following names would be processed:
The three extra names result from combining the original and replacement phrases from each of the three Secondary Phrase rules. For example,
JIM BOB from the original name is combined with the Secondary Phrase replacement for
DE LA , resulting in
JIM BOB OF THE HILL . Again, note that keys or ranges are not built for the original name.
|
SECPHRASEORIG
| May be specified in addition to
SECPHRASE or
SECPHRASEALL . Causes the original name to be processed during both key building and searching, in addition to names generated as a result of Secondary Phrase rules.
|
SKIPSECP=
| Skips the specified category names for a secondary phrase. For example, if
SKIPSECP=XP , then SSA-NAME3 skips the category name
XP for secondary phrases when it builds keys and ranges. You can specify multiple category names by listing them one after another, for example,
SKIPSECP=XPXP1XP2 .
To use the
SKIPSECP= keyword, you must also specify the
SECPHRASE keyword.
|
NAMESET Date Key Building Keywords
| |
---|---|
Keyword
| Description
|
DATEDROPCC
| Shorten all dates by removing the century.
|
DATEKEYS
| Create keys for a date.
|
DATESEARCH
| Create ranges for a date.
|
EXTRADATEDROP1PART
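| Builds additional keys for dates after splitting the date into multiple parts.
A date in the
yyyy-mm-dd format is split into the following parts:
For example, for the dates
2000-10-11 ,
2000-10-13 , and
2002-10-11 , the
EXTRADATEDROP1PART keyword builds keys after splitting the date into the following parts:
YYYY-MM and
MM-DD . The dates 2000-10-11 and 2000-10-13 share the YYYY-MM part 2000-10, and the dates 2000-10-11 and 2002-10-11 share the MM-DD part 10-11.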
|
EXTRADATEDROPCC
| Create extra keys by removing the century.
|
EXTRADATEKEYS
| Add extra search ranges or keys. Currently, this adds a
YYDDMM variation in which the day and month are swapped (so "10 12 1988" will find "12 10 1988").
|
SINGLEDATEKEY
| Create only one key or range, based on the format value.
|
YEARSPLIT=nn
| If the date has no century, set
CC=19 when the YY part is greater than nn, and otherwise set
CC=20 . By default, this keyword has no effect: if
YEARSPLIT is not specified, dates without a century do not have a century generated.
|
YYMMDD, MMDDYY, DDMMYY
| Treat the date as being in this format.
|
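Three of the date rules above are simple enough to sketch in Python: the part keys from the EXTRADATEDROP1PART example, the day/month swap performed by EXTRADATEKEYS, and the YEARSPLIT century rule. This is an illustrative sketch only. The helper names are hypothetical, the input formats (yyyy-mm-dd, and space-separated day, month, and year) are assumptions, and only the YYYY-MM and MM-DD parts named in the document's example are generated.

```python
from typing import Optional, Set


def drop_one_part_keys(iso_date: str) -> Set[str]:
    """Part keys from the EXTRADATEDROP1PART example (hypothetical helper).

    Assumes a yyyy-mm-dd input and builds only the YYYY-MM and MM-DD parts
    that the document's example names.
    """
    yyyy, mm, dd = iso_date.split("-")
    return {f"{yyyy}-{mm}", f"{mm}-{dd}"}


def swap_day_month(date: str) -> str:
    """EXTRADATEKEYS-style variation: '10 12 1988' -> '12 10 1988'.

    Assumes the first two space-separated fields are the day and month.
    """
    first, second, year = date.split()
    return f"{second} {first} {year}"


def apply_yearsplit(yy: str, split: Optional[int] = None) -> str:
    """YEARSPLIT=nn century rule for a two-digit year (hypothetical helper).

    A YY greater than nn gives CC=19; otherwise CC=20. With no YEARSPLIT,
    no century is generated and YY is returned unchanged.
    """
    if split is None:
        return yy
    century = "19" if int(yy) > split else "20"
    return century + yy


if __name__ == "__main__":
    a, b, c = "2000-10-11", "2000-10-13", "2002-10-11"
    print(drop_one_part_keys(a) & drop_one_part_keys(b))  # shared YYYY-MM part
    print(drop_one_part_keys(a) & drop_one_part_keys(c))  # shared MM-DD part
    print(swap_day_month("10 12 1988"))
    print(apply_yearsplit("88", split=30), apply_yearsplit("15", split=30))
```

With a hypothetical YEARSPLIT=30, for instance, 88 becomes 1988 and 15 becomes 2015, while 30 itself is not greater than the split value and becomes 2030.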
NAMESET Geocode Key Building Keywords
| |
---|---|
Keyword
| Description
|
GEOCODEKEYS
| Creates keys based on the latitude and longitude coordinates.
|
GEOCODESEARCH
| Creates a search range based on the GEOCODERADIUS keyword.
|
GEOCODERADIUS=n
| Specifies the search radius around the latitude and longitude coordinates. The default unit is the meter, and the default value is 1000 m.
|
GEOCODEFORMAT=n
| Indicates the order of the latitude and longitude coordinates that you specify. If the value is 0, the coordinates are in the latitude and longitude order. If the value is 1, the coordinates are in the longitude and latitude order. The default value is 0.
|
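A minimal sketch of the GEOCODERADIUS and GEOCODEFORMAT semantics follows: it reorders a coordinate pair according to GEOCODEFORMAT and tests whether a candidate lies within the radius (default 1000 m). It uses the haversine great-circle distance with a mean Earth radius of 6,371 km. That formula is swapped in purely for illustration and is not a description of how SSA-NAME3 builds geocode keys or ranges; the function names are hypothetical.

```python
import math

# Mean Earth radius in meters; an assumption made for this sketch.
EARTH_RADIUS_M = 6_371_000.0


def to_lat_lon(pair, geocode_format=0):
    """Return (latitude, longitude) from a pair given in GEOCODEFORMAT order."""
    if geocode_format == 0:  # latitude, longitude
        return pair[0], pair[1]
    if geocode_format == 1:  # longitude, latitude
        return pair[1], pair[0]
    raise ValueError("GEOCODEFORMAT must be 0 or 1")


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def within_radius(search, candidate, radius_m=1000, geocode_format=0):
    """True if candidate lies within radius_m meters (default mirrors GEOCODERADIUS)."""
    return haversine_m(to_lat_lon(search, geocode_format),
                       to_lat_lon(candidate, geocode_format)) <= radius_m
```

With the defaults, two points 0.005 degrees of longitude apart on the equator (about 556 m) fall within the radius, while points 0.01 degrees apart (about 1,112 m) do not.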
NAMESET Code Key Building Keywords
| |
---|---|
Keyword
| Description
|
CODEKEYS
| Create keys for a code.
This keyword is mandatory for all code-related key generation. Code key generation is designed to generate longer keys, so the definition should specify a larger key length for the respective field (NM3KEYSIZE=12).
For usage, refer to the
Example Code Key Definitions > Example 1
section below.
|
CODESEARCH
| Create ranges for a code.
This keyword is mandatory for all code-related range generation. Like key generation, range generation is designed to generate longer ranges, so the definition should specify a larger key length for the respective field (NM3KEYSIZE=12). In addition, range generation is designed to produce
PROBES .
For usage, refer to Example 2 in the
Example Code Key Definitions section below.
|
CODEKEYSEXT
| Extends the key length for the keys generated by the
CODEKEYS and
CODESEARCH keywords, enabling
CODEKEYS and
CODESEARCH to use the maximum key length specified for the respective field (NM3KEYSIZE=12).
|
CODEPREFIX=nn
| Specifies the code prefix. You can use this keyword with FORMATTING-OPTIONS #22 to remove a fixed prefix, for example, the country code of a phone number or the first six fixed digits of a credit card number.
Every code follows certain standards in its design.
For example, by standard, a telephone number in every country has a minimum required number of digits and a maximum possible number of digits. An Indian telephone number has three parts: a country code, a region code, and the actual number. The actual phone number length can vary from 5 to 8 digits, the region code can vary from 2 to 5 digits, and the country code is fixed at 91.
For usage, see Example 3 in the
Example Code Key Definitions section.
|
EXTRACODEPROBES
| Builds an additional probe for each code with five or more characters in the word stack after formatting. The keys contain alphanumeric characters. Use this keyword to support model numbers for products that might not have digits.
|
EXTRACODEPROBES1
| Build an additional probe or key for each code after removing the last character. The keys contain alphanumeric characters. For example, if a model number is
KYX100M , SSA-NAME3 builds a key or range after removing the last character
M .
|
MINCODELEN=nn
| The minimum length of the code.
The
DELETECODEKEYS ,
INSERTCODEKEYS , and
TRANSCRIBEKEYS keywords generate keys that address their respective errors within the rightmost
MINCODELEN characters of the code.
For example, consider the Indian telephone number described under
CODEPREFIX above. Its minimum required length is 5 and its maximum is 10, excluding the country code. In this case,
MINCODELEN can be set to 5, because most errors are likely to fall within the rightmost five digits. You can increase the value to address errors further toward the left of the code. For more details, see the
DELETECODEKEYS ,
INSERTCODEKEYS ,
and
TRANSCRIBEKEYS sections.
|
MAXCODELEN=nn
| The maximum length of the code.
You can combine this keyword with
FORMATTING-OPTIONS #22 and
#23 to remove the leading code prefix.
See the
MINCODELEN section for the reasoning behind setting the minimum and maximum lengths of a code. In the Indian telephone number example given in the
MINCODELEN section,
MAXCODELEN can be set to 10. Prefix removal is based on
MAXCODELEN even when you set
FORMATTING-OPTIONS #22 and
#23 .
For usage, see Example 4 in the
Example Code Key Definitions section.
|
CODEALLOWRANGE=nn
| The allowable range in a code.
Generates keys for ranges in a code, such as a telephone number that contains a continuous run of numbers written with special characters such as the forward slash (/) and hyphen (-), for example, 9900502724/[25-27].
The forward slash and hyphen can be treated as a normal split or as a range split, as specified in the charset table. The numbers given are treated as the allowable range.
For usage, see Example 5 in the Example Code Key Definitions section.
|
EXACTCODEKEY
| The two main key generation patterns for codes are Exact code and Transposition.
The Exact code pattern generates keys for the exact code.
The Transposition pattern handles transposition errors in the code. It creates two keys for a single code to cover possible transposition errors anywhere in the code.
EXACTCODEKEY generates keys for the exact code. The length of the code is a constraint for this keyword, so use the code-related formatting options to ensure correct results.
For usage, see Example 6 in the
Example Code Key Definitions section.
|
TRANSPOSEKEYS
| Generates keys that address transposition errors in the code. All the keywords described below depend on this keyword, so each of them addresses transposition errors (which account for the majority of errors in a code) in addition to the other errors it handles.
For usage, see the Example 7 in the
Example Code Key Definitions section.
|
DELETECODEKEYS
| Create extra keys to address deletion errors. Use this keyword for more exhaustive key generation. This keyword generates extra keys for deletion errors that can occur within the rightmost
MINCODELEN characters of the code.
For usage, see the Example 8 in the
Example Code Key Definitions section.
|
INSERTCODEKEYS
| Create extra keys to address insertion errors. Use this keyword for more exhaustive key generation. This keyword generates extra keys for insertion or repetition errors that can occur within the rightmost
MINCODELEN characters of the code.
For usage, see the Example 9 in the
Example Code Key Definitions section.
|
TRANSCRIBEKEYS
| Create extra keys to address transcription errors. Use this keyword for more exhaustive key generation. This keyword generates extra keys for increment errors that can occur within the rightmost
MINCODELEN characters of the code.
For usage, see the Example 10 in the
Example Code Key Definitions section.
|
TRIMLZERO
| Create extra key patterns by removing the leading '0', if present.
|
CCVALIDATE
| Create extra key for a valid credit card number by fixing the check digit using the credit card check digit calculation.
For usage, see the Example 11 in the
Example Code Key Definitions section.
|
VINVALIDATE
| Create extra key for a valid Vehicle Identification Number by fixing the check digit using the Vehicle Identification Number check digit calculation.
As with credit card validation, the Vehicle Identification Number check digit calculation is performed and extra keys are generated for a valid VIN.
See the
CCVALIDATE section.
|
ISBN10VALIDATE
| Create extra key for a valid ISBN10 Number by fixing the check digit using the ISBN10 Number check digit calculation.
As with credit card validation, the ISBN10 Number check digit calculation is performed and extra keys are generated for a valid ISBN10 Number.
Refer to the
CCVALIDATE section.
|
ISBN13VALIDATE
| Create extra key for a valid ISBN13 Number by fixing the check digit using the ISBN13 Number check digit calculation.
As with credit card validation, the ISBN13 Number check digit calculation is performed and extra keys are generated for a valid ISBN13 Number.
Refer to the
CCVALIDATE section.
|
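The check-digit repair performed by CCVALIDATE, ISBN10VALIDATE, and ISBN13VALIDATE can be sketched as recomputing the final digit from the others and substituting it. The sketch assumes that this is what "fixing the check digit" means. It uses the standard Luhn algorithm for credit card numbers and the standard ISBN-10 (mod 11, with X for 10) and ISBN-13 (alternating weights 1 and 3, mod 10) rules. The helper names are hypothetical, input is assumed to contain only digits after formatting, and VIN validation is omitted for brevity.

```python
def luhn_check_digit(payload: str) -> str:
    """Luhn check digit for the digits that precede it."""
    total = 0
    # Walk right to left; the digit next to the check digit is doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def isbn10_check_digit(payload: str) -> str:
    """ISBN-10: weights 10 down to 2, mod 11, with 'X' standing for 10."""
    total = sum(int(ch) * w for ch, w in zip(payload, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(payload: str) -> str:
    """ISBN-13: alternating weights 1 and 3, mod 10."""
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(payload))
    return str((10 - total % 10) % 10)


def fix_check_digit(code: str, check_digit) -> str:
    """Extra-key candidate: keep the payload and replace the final character."""
    return code[:-1] + check_digit(code[:-1])
```

For example, `fix_check_digit("4111111111111112", luhn_check_digit)` yields the valid test number 4111111111111111, and an already valid code is returned unchanged.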