Service Group Definition and Customization Guide

Word Weight Modification in N3SCM

The default processing of the word weight modifying options (MAJMOD, SKIPMOD, CATSW) is different in N3SCM than for the previous name matching entry point, N3SCL.
To better understand the following explanation, it is useful to know that during the matching process, the default maximum Score that can be attributed to a word pair is 100.
For example, the two words ANDERSON and ANDERSON score 100/100. The two words ALLEN and ANDERSEN score 30/100 (the default Score when only the initial matches).
When all words in a name have a maximum Score of 100, they have equal weighting in the final Score out of 100.
This maximum word Score value of 100 can be varied by the word weight modification options MAJMOD, SKIPMOD and CATSW, and thus the weighting of word pairs can be changed.
The N3SCM method uses different default processing for the word weight modifying options than N3SCL.
There are two differences:
  1. In N3SCL, the variation of the maximum Score was applied prior to word pairs being chosen, and thus contributed to the choice of the word pairs. The effect was that more significance was given to the word types (MAJMOD, SKIPMOD) or categories (CATSW) of word pairs, rather than to their likeness.
    For example, with N3SCL (using REFMIN, MAJMOD*120, NAME-FORMAT=R, NOSTD, NORAW),
    SEARCH: JOHN ALLEN ANDERSON FILE: JOHN ANDERSON ALLEN
    scored 035, because with the MAJMOD*120 setting, the ANDERSON vs ALLEN pair scored higher ((30/100)*12) than the ANDERSON vs ANDERSON pair ((100/100)*1), and would be chosen.
    Using the default N3SCM processing, the same match would score lower (021), because the word pairs are first chosen on their natural likeness prior to applying the MAJMOD option. In this example, the word pairs ANDERSON vs ANDERSON and ALLEN vs ALLEN score higher ((100/100)*1) than the ANDERSON vs ALLEN pair ((30/100)*1), leaving no major word match to apply the weight modifier to.
  2. In N3SCL, the weight of a word pair was allowed to be less than 100. Using N3SCM default settings, the weight cannot be less than 100.
    The effect of this can be shown by the following CATSW example.
    *C NN R Nick-names NN JONATHON >JOHN <
    OPTION CATSW VALUE NN,9
    SEARCH: JOHN SMITH FILE: JONATHON SMITH
    In N3SCL,
    SCORE: 100 (9/9 + 10/10)
    In N3SCM,
    SCORE: 95 (90/100 + 100/100)
    To override this default behavior, and have N3SCM operate the same as N3SCL, use options CATSS, LIMWCAT or USECWAIT as described below.
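The arithmetic behind the two differences can be reproduced with a small Python sketch. It is a toy model that only mirrors the figures quoted above, not the matching engine itself. The function names and the way Scores are combined (sum of awarded scores over sum of maximum scores, truncated) are assumptions made for illustration.

```python
# Toy model of the figures quoted above; not the actual matching engine.
# All function and variable names are hypothetical.

def choose_pair(candidates, weighted):
    """Pick the best candidate word pair.

    candidates maps a word pair to (likeness out of 100, weight multiplier).
    weighted=True mimics the N3SCL behaviour described above (weight applied
    before the choice); weighted=False mimics the N3SCM default (choice made
    on natural likeness only).
    """
    def key(pair):
        likeness, multiplier = candidates[pair]
        return likeness / 100 * multiplier if weighted else likeness
    return max(candidates, key=key)


def combined_score(pairs):
    """Combine (awarded, maximum) word-pair scores into a Score out of 100."""
    awarded = sum(a for a, _ in pairs)
    maximum = sum(m for _, m in pairs)
    return awarded * 100 // maximum


# Difference 1: the multipliers 12 and 1 are those shown in the MAJMOD example.
candidates = {
    ("ANDERSON", "ALLEN"): (30, 12),
    ("ANDERSON", "ANDERSON"): (100, 1),
}
n3scl_choice = choose_pair(candidates, weighted=True)
n3scm_choice = choose_pair(candidates, weighted=False)

# Difference 2: the CATSW example, JOHN SMITH vs JONATHON SMITH.
n3scl_score = combined_score([(9, 9), (10, 10)])
n3scm_score = combined_score([(90, 100), (100, 100)])
```

In this model the weighted choice selects the ANDERSON vs ALLEN pair, while the unweighted choice selects ANDERSON vs ANDERSON, which is the behavioural difference the first item describes.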

Local Options Addressing Truncation & Initials

Option
Description
Example
OPTION SCORES
VALUE INIT,[number]
This option controls how an Initial will match against the first character of a word. [number] is a value between 0 and 10, where 0 means attribute a 0/10 Score if the Initial matches the first character of the word and 10 means attribute a 10/10 Score if the Initial matches. If SCALEFTR,1 is specified, [number] can be between 0 and 100. If an Edit-list nickname rule has been defined, for example to replace Bill with William, W. Smith would still match Bill Smith. If this option is omitted, an initial will be compared to a full word using a string comparison and if it matches, will be awarded a Score of 3/10.
  1. Not using this operand,
    SEARCH: W D BROWN FILE: WILLIAM DEAN BROWN SCORE: 053
  2. With
    VALUE INIT,0
    SEARCH: W D BROWN FILE: WILLIAM DEAN BROWN SCORE: 033
  3. With
    VALUE INIT,10
    SEARCH: W D BROWN FILE: WILLIAM DEAN BROWN SCORE: 100
  4. With
    VALUE INIT,5
    SEARCH: W D BROWN FILE: WILLIAM DEAN BROWN SCORE: 066
  5. With
    VALUE INIT,10
    SEARCH: W D BROWN FILE: BILL DEAN BROWN SCORE: 100
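The example Scores above are consistent with a simple reading in which each word pair is scored out of 10 and the name Score is the truncated average. The Python sketch below works through that arithmetic. The averaging rule and the function names are assumptions chosen because they reproduce the listed Scores, not a description of the product's internal algorithm.

```python
# Toy model: each word pair is scored out of 10, and the name Score is the
# truncated average expressed out of 100. This is an assumption that happens
# to reproduce the example Scores; names are hypothetical.

DEFAULT_INIT = 3  # Score awarded to an initial/word pair when INIT is omitted


def name_score(pair_scores):
    """Average a list of word-pair scores (each out of 10) into a 0-100 Score."""
    return sum(pair_scores) * 100 // (10 * len(pair_scores))


def w_d_brown_score(init=DEFAULT_INIT):
    """W D BROWN vs WILLIAM DEAN BROWN: two initial/word pairs, one exact pair."""
    return name_score([init, init, 10])


scores = {value: w_d_brown_score(value) for value in (0, 3, 5, 10)}
```

With the default of 3 the two initial pairs contribute 3/10 each, so the total is 16 out of 30, which truncates to 53.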
LOPT=(INITLOW)
The default Score for an Initial matching the first character of a word is 3/10. With the INIT option (described above), it is possible to raise this Score to a maximum of 10/10 and the INIT value, by default, is applied to all cases where an Initial matches the first character of a word. In cases where the non-initial words do not match, however, it may be desirable to reduce the value of the Initial/Word Score, for example when two family names do not match but the given Initial of one still matches the given name of the other. The INITLOW option reduces the significance of initials if none of the non-initial words match. The Score in such cases is reduced to the default value of 3/10. Provided at least one of the non-initial words matches, INITLOW will not be applied. For example, with VALUE INIT,10 specified, G N HOLLOWAY will match GREG NORMAN HALL with a Score of 076. Using the INITLOW option the Score is reduced to 030. If there is an exact match between any words in the name, the processing of INITLOW is disabled.
  1. With
    INIT,10
    SEARCH: ANDREW DEAN SMITH FILE: A D BROWN SCORE: 066
  2. With
    INIT,10 & INITLOW
    SEARCH: ANDREW DEAN SMITH FILE: A D BROWN SCORE: 020
  3. With
    INIT,10 & INITLOW
    SEARCH: ANDREW DEAN SMITH FILE: A D SMITH SCORE: 100
  4. With
    INIT,10
    SEARCH: ANDREW DEAN SMITH FILE: A DEAN BROWN SCORE: 066
  5. With
    INIT,10 & INITLOW
    SEARCH: ANDREW DEAN SMITH FILE: A DEAN BROWN SCORE: 066
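The INITLOW rule can be sketched in Python as follows. This is a simplified model, assuming that word pairs are scored out of 10, that the name Score is the truncated average, and that HOLLOWAY vs HALL scores 3/10 (a value back-computed from the quoted Score of 076). The function name and the pair representation are hypothetical.

```python
# Simplified model of INITLOW; names and data layout are hypothetical.

def score(pairs, init=10, initlow=False):
    """pairs is a list of ("initial", None) or ("word", score_out_of_10)."""
    word_scores = [s for kind, s in pairs if kind == "word"]
    # INITLOW only takes effect when no non-initial word pair matches exactly.
    any_word_matches = any(s == 10 for s in word_scores)
    initial_score = 3 if (initlow and not any_word_matches) else init
    total = sum(initial_score if kind == "initial" else s for kind, s in pairs)
    return total * 100 // (10 * len(pairs))


INITIAL = ("initial", None)

a_d_brown = [INITIAL, INITIAL, ("word", 0)]        # ANDREW DEAN SMITH vs A D BROWN
a_d_smith = [INITIAL, INITIAL, ("word", 10)]       # ANDREW DEAN SMITH vs A D SMITH
a_dean_brown = [INITIAL, ("word", 10), ("word", 0)]  # ANDREW DEAN SMITH vs A DEAN BROWN
g_n_holloway = [INITIAL, INITIAL, ("word", 3)]     # assumed 3/10 for HOLLOWAY vs HALL
```

Calling `score(a_d_brown, initlow=True)` drops both initials to 3/10 because SMITH and BROWN do not match, whereas `score(a_dean_brown, initlow=True)` is unchanged because DEAN matches DEAN exactly.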
OPTION FLAGS
VALUE INITCODE,{0/1}
This option is used to prevent Initials being compared with Words when either is a code. This is used to prevent a high Score being returned in the case where INIT is also used. A value of 0 turns the option on (i.e. prevents Initials being matched with Words when either is a code); a value of 1 turns the option off. The default is off. For example, with INIT,10, 1 and 176 will Score 3/10. With INITCODE,0 specified, the comparison will get a Score of 0/10.
OPTION FLAGS
VALUE EXACTWRD,{0/1}
VALUE EXACTINI,{0/1}
With EXACTWRD,1 and EXACTINI,1 exact initial to initial matches will be retained, regardless of whether a better Score may have been achieved by matching the initial to a word. For example,
GRIFFIN, JOHN W J
GRIFFIN, JAMES W J
with EXACTWRD,0 and EXACTINI,0 (the default), and VALUE INIT,10, will score 100, because the initial J in each name matches exactly with the words John and James respectively. With EXACTWRD,1 and EXACTINI,1 the Score would be lower, e.g. 080, because John and James are not as good a match.
EXACTINI,1 requires EXACTWRD,1 before it will function.
OPTION FLAGS
VALUE EXACTMCH,{0/1}
If two records match exactly then a Score of 100 is immediately given, bypassing Formatting. This is not always desirable, for example, in cases where an Edit List rule should be used prior to Matching.
The default is EXACTMCH,1 which will result in an early exact match check. Changing to EXACTMCH,0 switches off the exact match check. For example, the following Edit List rules are defined:
*C PT D Personal Title *P SYSTEMS INTEGRATION MANAGER *R MANAGER PT MANAGER><
With
EXACTMCH,1
SEARCH: SYSTEMS INTEGRATION MANAGER FILE: SYSTEMS INTEGRATION MANAGER SCORE: 100
With
EXACTMCH,0
SEARCH: SYSTEMS INTEGRATION MANAGER FILE: SYSTEMS INTEGRATION MANAGER SCORE: 000
OPTION FLAGS
VALUE SKIPMTCH,{0/1}
Usually an initial will not match a skip word; using SKIPMTCH,1 allows such a match. SKIPMTCH,0 is the default. For example, if University and Technology are skip words:
U.T.S. University of Technology Sydney
With
INIT,10
SCORE: 010
With
INIT,10 & SKIPMTCH,1
SCORE: 100
OPTION FLAGS
VALUE OPTIMILW,{0/1}
The default is OPTIMILW,1. When INITLOW is active and it reduces an initial/word Score, a check is done to see if a better word match can be found. If one can, it is used instead of the degraded original match.
To turn off this optimization, use OPTIMILW,0. Comparing these two names for example,
PETER PETERS P
With INITLOW, INIT,10 and OPTIMILW,0, the Score returned would be 030. This occurs because of two things. INIT,10 causes P / PETER to score 10/10 and to be chosen for the match over PETER / PETERS, and INITLOW takes effect on the P / PETER match because the PETER / PETERS pair was not a match, thus decreasing the Score to 030. With INITLOW, INIT,10 and OPTIMILW,1, the Score returned would be 080, because a check is done to see if a better word match can be found, in this case the Score of the PETER / PETERS pair.
OPTION FLAGS
VALUE ILOWWRDS,{0/1}
This option is used in conjunction with INITLOW to reduce the score for an initial-to-word match (to 3/10) if there are any unmatched words between the two names. To turn it on, specify ILOWWRDS,1.
The default is ILOWWRDS,0. For example, without ILOWWRDS,1 (and assuming REFMIN and INIT,9):
SEARCH: HELEN M RICHARDSON FILE: MICHAEL RICHARDSON SCORE: 095
With ILOWWRDS,1:
SEARCH: HELEN M RICHARDSON FILE: MICHAEL RICHARDSON SCORE: 065
OPTION SCORES
VALUE ILOWTRIG,[number]
This option controls the value for a word Score to be considered a match by the INITLOW processing.
The default is 10, i.e. if an initial / word match is present and two other words do not match 10/10, INITLOW processing will take place. Changing the value to 8 (as an example) will prevent INITLOW degrading the Score of an initial / word match when two other words are considered a reasonable match (in this case 8 / 10). If SCALEFTR,1 is specified, [number] can be between 0 and 100. For example, with options:
LOPT=(INITLOW) OPTION SCORES VALUE INIT,10 SEARCH: JOHN SMITH FILE: J SNITH SCORE: 055
With the additional option:
OPTION SCORES VALUE ILOWTRIG,8 SEARCH: JOHN SMITH FILE: J SNITH SCORE: 090
the Score becomes 090 because the J / JOHN match is not affected by INITLOW. This is because the SMITH / SNITH match is 8/10 and the ILOWTRIG option causes INITLOW processing to be bypassed.
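The effect of the trigger value on the JOHN SMITH vs J SNITH example can be worked through with a short Python sketch. It assumes a two-pair name (one initial/word pair and one ordinary word pair), scores out of 10, and a truncated average; these simplifications and the function name are illustrative assumptions only.

```python
# Toy model of ILOWTRIG with INITLOW active; the function name is hypothetical.

def score_with_initlow(word_score, init=10, ilowtrig=10):
    """Score a name made of one initial/word pair and one ordinary word pair.

    word_score is the ordinary pair's score out of 10. The initial keeps its
    INIT value only if the ordinary pair reaches the ILOWTRIG value; otherwise
    INITLOW reduces it to the default 3/10.
    """
    initial_score = init if word_score >= ilowtrig else 3
    return (initial_score + word_score) * 100 // 20


default_trigger = score_with_initlow(8)              # SMITH / SNITH scores 8/10
lowered_trigger = score_with_initlow(8, ilowtrig=8)
```

With the default trigger of 10 the 8/10 pair is not counted as a match, so the initial is degraded; lowering the trigger to 8 leaves the initial at 10/10.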
LOPT=(ABBRMIN)
ABBRMIN sets the minimum length of an abbreviation that can match. For example, assume ABBRMIN*3 is specified. If a word of length 3 or more matches the beginning of another (longer) word, the Score specified with the ABBRSCR option is returned. In other words, the short word is an abbreviation of the long word. Using the ABBRSCR example,
ROBE --> ROBERT matches
ROB --> ROBERT matches
ROBIN --> ROBERT doesn’t match
Note that the shorter of the two words must still be a 100% match with the beginning of the longer word for this logic to be invoked.
LOPT=(ABBRSCR)
Sets the Score for an abbreviated match, e.g. 8 = 80%, 10 = 100%. When two words match according to the ABBRMIN rules the Score specified here is returned for the match on the two words. For example,
  1. With no ABBRMIN or ABBRSCR
    SEARCH: ROBERT FILE: ROBERTA SCORE: 080
  2. With LOPT=(ABBRMIN*3+ABBRSCR*10)
    SEARCH: ROBERT FILE: ROBERTA SCORE: 100
    SEARCH: RO FILE: ROBERTA SCORE: 030
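The ABBRMIN and ABBRSCR rules described above amount to a prefix test with a minimum length. The following Python sketch shows one way to express that test. The function name is hypothetical, and returning `None` when the rule does not apply is an assumption standing in for whatever ordinary word comparison would otherwise take place.

```python
# Minimal sketch of the abbreviation rule; the function name is hypothetical.

def abbreviation_score(word_a, word_b, abbrmin, abbrscr):
    """Return the ABBRSCR Score (as a percentage) if the shorter word is an
    abbreviation of the longer one, or None if the rule does not apply."""
    short, long_ = sorted((word_a, word_b), key=len)
    is_abbreviation = (
        len(short) >= abbrmin          # long enough to count as an abbreviation
        and len(short) < len(long_)    # strictly shorter than the other word
        and long_.startswith(short)    # 100% match with the start of the longer word
    )
    return abbrscr * 10 if is_abbreviation else None


results = {
    "ROBE": abbreviation_score("ROBE", "ROBERT", abbrmin=3, abbrscr=10),
    "ROB": abbreviation_score("ROB", "ROBERT", abbrmin=3, abbrscr=10),
    "ROBIN": abbreviation_score("ROBIN", "ROBERT", abbrmin=3, abbrscr=10),
    "RO": abbreviation_score("RO", "ROBERTA", abbrmin=3, abbrscr=10),
}
```

ROBIN fails because it is not a prefix of ROBERT, and RO fails because it is shorter than the ABBRMIN value of 3.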
OPTION FLAGS
VALUE ABBSCRCL, [number]
Specifies whether to apply a penalty when using the ABBRSCR option.
  • OPTION FLAGS VALUE ABBSCRCL, 0. Does not apply a penalty if two words mismatch at the start position when using the ABBRSCR option.
  • OPTION FLAGS VALUE ABBSCRCL, > 1. Applies the specified penalty for each excess character that does not match between two words when using the ABBRSCR option.
When matching the words BOW and BOWES in the following example, with OPTION FLAGS VALUE ABBSCRCL, 5, the score reduces by 10.
Total penalty = Number of excess characters × Penalty value
10 = 2 × 5
MATCH SCORE-ONLY,MTBL=250 MM 76 A BOW ST 76 A BOWES ST
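The penalty formula above is a straightforward product. The short Python sketch below applies it to the BOW and BOWES example. The function name is hypothetical, and counting excess characters as the difference in word lengths is an assumption that fits this example.

```python
# Sketch of the ABBSCRCL penalty arithmetic; the function name is hypothetical.

def abbscrcl_penalty(short_word, long_word, penalty):
    """Total penalty = number of excess characters x penalty value."""
    excess_characters = len(long_word) - len(short_word)
    return excess_characters * penalty


total_penalty = abbscrcl_penalty("BOW", "BOWES", penalty=5)
```

BOWES has two more characters than BOW, so a penalty value of 5 gives a total reduction of 10.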
OPTION FLAGS
VALUE FMTINIT,{0,1,2,3,4,5,6}
FORMATTING-OPTIONS #9 controls how Formatting treats a run of two or more initials. If it is set to a value other than ’N’, initials will be concatenated. This is the normal behavior for company and mixed company/person algorithms. This is important for keys and search strategies so that, for example, ABC HOLDINGS is able to successfully find A.B.C. HOLDINGS. Formatting options also affect matching in that a name is processed through Formatting prior to being matched. This behavior, however, may be undesirable in cases such as when a search for J W SMITH finds JOHN SMITH. The two formatted names that get compared would be JW SMITH and JOHN SMITH and the JW and JOHN do not match well. Use one of the following values:
  • 0. Sets FORMATTING-OPTIONS #9 to N. This option does not affect the key building or searching. When you use this option, if you want to concatenate the initials, you can use the CONC, CINITI, or CINITA options.
  • 1. Does not set any value for FORMATTING-OPTIONS #9. Default value is 1.
  • 2. Sets FORMATTING-OPTIONS #9 to X.
  • 3. Sets FORMATTING-OPTIONS #9 to I.
  • 4. Sets FORMATTING-OPTIONS #9 to B.
  • 5. Sets FORMATTING-OPTIONS #9 to Y.
  • 6. Sets FORMATTING-OPTIONS #9 to E.
For more information about FORMATTING-OPTIONS #9, see Module Options.
OPTION CONCINIT
VALUE THRSHOLD,[Score]
VALUE MININIT,[number]
VALUE MAXINIT,[number]
VALUE ALLWSKIP,{0/1}
VALUE SCORE,[Score]
VALUE PENALTY,[number]
VALUE NORSCORE,{0/1}
VALUE PARTMTCH,{0/1}
VALUE SKIPGOOD,{0/1}
The CONCINIT option allows matching of acronyms to full names. For example:
IDENTITY SYSTEMS LTD IS
An acronym may be retrieved as a candidate in a search by using the INITPROBE or INITRANGE NAMESET function keywords. An acronym and full name may also become a search and file record in matching because of a search on another field (e.g. address). Acronym matching, if done, takes place at the end of the matching process, after an original Score has been computed. Acronym matching will only be attempted if the original Score is below the THRSHOLD value. The default threshold score value is 80.
The MININIT and MAXINIT values set the minimum and maximum number of words in the full name that can be matched to the acronym (starting from the left). For example, it would be typical to set MININIT at 3 (the default) because most acronyms start at three words. A reasonable MAXINIT value would be 8 (the default).
By default, Skip Words are allowed to participate in acronym matching. Skip Words can be disallowed in acronym matching by setting ALLWSKIP to 0.
By default, a successful acronym match will return a Score of 100. It may be desirable to set the maximum Score lower. This can be achieved with the SCORE value setting. Using the PENALTY value, it is possible to decrement the acronym Score by the number of excess words in the non-reference record. If PENALTY is omitted, no penalty is applied for excess words.
By default, the acronym Score is returned only if it is greater than the original Score. By setting NORSCORE to 0, the acronym Score is returned whether it is greater or lesser than the original Score.
For looser matching, specify PARTMTCH,1. This allows part of the acronym to match and a score to be computed relative to the number of initials that matched. For example,
IDENTITY SYSTEMS PTY LTD ISS
will score 66 if PARTMTCH,1 is specified. 0, the default, does not allow part acronym matching and the Score would be 0.
By default, words that match 100% are included in the CONCINIT rescore. By setting SKIPGOOD to 1, words that match 100% are excluded from the CONCINIT rescore.
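The partial acronym Score in the PARTMTCH example can be reproduced with a short Python sketch. It is a simplified model that compares acronym letters with word initials from the left and ignores THRSHOLD, MININIT, MAXINIT, ALLWSKIP, PENALTY, NORSCORE and SKIPGOOD. The function and parameter names are hypothetical.

```python
# Simplified acronym matching sketch; names are hypothetical and most
# CONCINIT values are deliberately not modelled.

def acronym_score(acronym, full_name, partmtch=False, max_score=100):
    """Compare acronym letters with the initials of full_name, left to right."""
    initials = [word[0] for word in full_name.split()]
    matched = 0
    for letter, initial in zip(acronym, initials):
        if letter != initial:
            break
        matched += 1
    if matched == len(acronym):
        return max_score                          # every acronym letter matched
    if partmtch:
        return max_score * matched // len(acronym)  # proportional, truncated
    return 0


partial = acronym_score("ISS", "IDENTITY SYSTEMS PTY LTD", partmtch=True)
strict = acronym_score("ISS", "IDENTITY SYSTEMS PTY LTD")
```

Two of the three letters in ISS line up with IDENTITY and SYSTEMS before the third letter fails against PTY, which gives 2/3 of 100, truncated to 66.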

Local Options Addressing Concatenation

Option
Description
Example
LOPT=(CONC)
Allow concatenated matches. This option allows concatenated words to match against separate words. For example, when matching
ROBERT HACKFORTH JONES
with
ROBERT HACKFORTHJONES
the words HACKFORTH JONES will match to produce a total Score of 100% with the CONC option. Without it a Score of 75% is returned.
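The idea of matching a concatenated word against separate words can be illustrated with a small Python sketch that finds adjacent words in one name whose concatenation appears as a single word in the other. It only detects the candidate pairs and does not attempt to compute Scores. The function name is hypothetical.

```python
# Illustrative detection of two-word concatenations; the name is hypothetical.

def concatenation_pairs(separate, joined):
    """Return adjacent word pairs in `separate` whose concatenation appears
    as a single word in `joined`."""
    words = separate.split()
    targets = set(joined.split())
    return [(a, b) for a, b in zip(words, words[1:]) if a + b in targets]


pairs = concatenation_pairs("ROBERT HACKFORTH JONES", "ROBERT HACKFORTHJONES")
```

Only adjacent pairs are considered here, which corresponds to the two-word case; concatenating more than two words would need a longer window.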
LOPT=(CINITM)
Allow multiple concatenations. This option allows the concatenation of more than two words. It requires that CONC is also specified.
For example,
  1. Not using
    CINITM
    SEARCH: IDENTITYSYSTEMS FILE: IDENTITY SYSTEMS SCORE: 050
  2. With
    LOPT=(CONC+CINITM)
    SEARCH: IDENTITYSYSTEMS FILE: IDENTITY SYSTEMS SCORE: 100
LOPT=(CINITI)
Allow concatenation of initials. Requires that CONC is also specified.
  1. With no
    CINITI
    SEARCH: SMITH Y R FILE: SMITHY R SCORE: 090
  2. With
    LOPT=(CONC+CINITI)
    SEARCH: SMITH Y R FILE: SMITHY R SCORE: 100
LOPT=(CINITA)
Allow both initials and multiple concatenations. Shorthand for specifying both CINITI and CINITM. Requires that CONC is also specified.
The syntax is:
LOPT=(CONC+CINITA)
OPTION CONCAT
VALUE PLURALS,{0/1}
VALUE RAW,{0/1}
VALUE SCORE,[maximum Score]
VALUE THRSHOLD,[threshold Score]
VALUE ORIGWORD,{0/1}
VALUE WNUMBER,{0/1}
By setting PLURALS to 1, a trailing S on one of the two words/concatenated words will match 100%. Default value of PLURALS is 0. Setting RAW to 1 will perform a raw compare and accept the match if it is above the threshold Score.
  • maximum Score – The maximum word Score for a concatenated match. An integer value between 0 and 100.
  • threshold Score – The word Score level at which to accept a concatenated word match. An integer value between 0 and 100. For example, a match between names like ’RichCraft’ vs ’Rich Crafts’ can receive a higher Score.
  • It is possible that matching two names, where one name has two words concatenated, returns a poor score because one of the unconcatenated words had a replacement rule in the edit list. For example, the name "MARY KATE" will not match well against "MARYKATE" if the word "KATE" has been replaced by "KATHERINE" in the edit list. The solution is to compare the original word as well as the replacement. This new ORIGWORD logic is turned off by default, as it will have a minor performance impact due to the extra comparison required. Setting ORIGWORD to 1 turns this feature on.
  • The SCORE option now limits the maximum allowed score rather than scaling it and the new SYNCS option default value is 2.
  • The WNUMBER option concatenates two words to improve the scores if the value is set to 1 or higher. This option affects the score of a single token and rescores by concatenating two words. Default is 0.

Local Options Addressing Word Order

Option
Description
LOPT=(NOORDER)
Normally, any Scores over 75 are degraded by 1 for each out-of-order word pair (or by larger amounts if OPTION ORDER is used). This option disables that feature.
  1. Not using
    NOORDER
    SEARCH: EQUIPMENT MAINTENANCE COMPANY FILE: MAINTENANCE EQUIPMENT COMPANY SCORE: 098
  2. With
    LOPT=(NOORDER)
    SEARCH: EQUIPMENT MAINTENANCE COMPANY FILE: MAINTENANCE EQUIPMENT COMPANY SCORE: 100
OPTION ORDER
VALUE POS,[number]
VALUE SEQ,[number]
VALUE TRIGGER,[number]
Normally any Scores over 75 will cause out-of-order word checking to be enabled. Default out-of-order word checking will decrement a Score by 1 for each out-of-order word pair. This processing can be turned off with the NOORDER option. To change the default trigger Score of 75, use the TRIGGER option. Out-of-order means either out-of-position or out-of-sequence. To explain the meaning of out-of-position and out-of-sequence, refer to the following example. The following two names have words out of position (SMITH vs ALAN), but not out of sequence (SMITH follows JOHN in both cases),
JOHN SMITH
JOHN ALAN SMITH
If the default out-of-order processing is used (i.e. no NOORDER and no OPTION ORDER), and assuming REFMIN is also used, these two names will score 99. If it is desired to only decrement the Score if the names are either out-of-position or out-of-sequence, use the VALUE POS or VALUE SEQ options. These options are mutually exclusive. Use the VALUE POS option to specify a value (between 0 and 100) by which to decrement the Score for each word out-of-position. Use the VALUE SEQ option to specify a value (between 0 and 100) by which to decrement the Score for each word out-of-sequence.
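The distinction between out-of-position and out-of-sequence can be made concrete with a short Python sketch. The counting rules below are one plausible reading of the description (a shared word is out of position if its index differs between the names; a sequence break is a pair of consecutive shared words whose order is reversed). They are assumptions for illustration, and all function names are hypothetical.

```python
# Illustrative counting of out-of-order words; names and exact counting
# rules are assumptions, not the product's documented algorithm.

def out_of_position(name_a, name_b):
    """Count words of name_a that occur in name_b at a different index."""
    a_words, b_words = name_a.split(), name_b.split()
    return sum(
        1 for i, word in enumerate(a_words)
        if word in b_words and b_words.index(word) != i
    )


def out_of_sequence(name_a, name_b):
    """Count consecutive shared words whose relative order is reversed."""
    b_words = name_b.split()
    ranks = [b_words.index(w) for w in name_a.split() if w in b_words]
    return sum(1 for x, y in zip(ranks, ranks[1:]) if x > y)


def apply_order_penalty(score, count, decrement=1, trigger=75):
    """Decrement the Score per out-of-order word once it exceeds the trigger."""
    return score - count * decrement if score > trigger else score


position_count = out_of_position("JOHN SMITH", "JOHN ALAN SMITH")
sequence_count = out_of_sequence("JOHN SMITH", "JOHN ALAN SMITH")
```

For JOHN SMITH vs JOHN ALAN SMITH this gives one word out of position and none out of sequence, so a base Score of 100 with the default decrement of 1 becomes 99.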
OPTION ORDER
VALUE PER,[penalty]
VALUE PERFLAG,{0,1,2}
Specifying VALUE PER,n causes an additional check of the first and last words in the two names to be performed. If the two words are different then penalty n is applied to the score. For example, not using VALUE PER,n
SEARCH: ANDREW JOHN SMITH FILE: JOHN SMITH SCORE: 100
Using VALUE PER,1
SEARCH: ANDREW JOHN SMITH FILE: JOHN SMITH SCORE: 099
In addition to VALUE PER,n, VALUE PERFLAG,m may be specified. Note that this option has an effect only where one of the two word stacks contains a single word. In these cases, the value of m modifies the behavior as follows:
VALUE PERFLAG,0
Always apply the penalty. This is the default.
VALUE PERFLAG,1
Ensure that the matching word is the first in each stack before applying the penalty. Use the NAME-FORMAT setting to determine the meaning of first. That is, if NAME-FORMAT=L, the matching word must be the leftmost word. If NAME-FORMAT=R, the matching word must be the rightmost word.
VALUE PERFLAG,2
Ensure that the matching word is the first in each stack before applying the penalty, irrespective of the NAME-FORMAT setting, i.e. the matching word must be the leftmost word. In the case where both names contain a single word, this option has no effect.
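The VALUE PER check on the first and last words can be sketched in Python as follows. The sketch assumes the penalty is applied once if either the first words or the last words of the two names differ, which fits the ANDREW JOHN SMITH example but is not confirmed by the text. PERFLAG is not modelled, and the function name is hypothetical.

```python
# Sketch of the VALUE PER first/last word check; the single-application rule
# is an assumption and PERFLAG is not modelled. The name is hypothetical.

def per_adjusted(score, search_name, file_name, per=0):
    """Apply penalty `per` if the first or last words of the names differ."""
    search_words, file_words = search_name.split(), file_name.split()
    ends_differ = (
        search_words[0] != file_words[0]
        or search_words[-1] != file_words[-1]
    )
    return score - per if ends_differ else score


without_per = per_adjusted(100, "ANDREW JOHN SMITH", "JOHN SMITH")
with_per = per_adjusted(100, "ANDREW JOHN SMITH", "JOHN SMITH", per=1)
```

Here the last words agree (SMITH) but the first words differ (ANDREW vs JOHN), so a penalty of 1 reduces the Score from 100 to 99.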
