Service Group Definition and Customization Guide

N3SCM – Name Matching

The Name Matching Method is used to compare two names, either person names, company names, account names, product names or any other name. It is also used to compare addresses.
Name matching differs from string matching in a number of ways. The most significant differences are that Name matching:
  • uses an Edit-list
  • may use a Word Stabilization routine
  • will try out of order word matches, initial to word matches, concatenated word matches and other variations, including String comparison, to achieve the best Score.
The Local Options which control this Method’s behavior can be divided into the following categories, according to what they address:
  • Truncation & Initials (see the Local Options Addressing Truncation & Initials section)
  • Concatenation (see the Local Options Addressing Concatenation section)
  • Word Order (see the Local Options Addressing Word Order section)
  • Word Type (see the Local Options Addressing Word Type section)
  • Spelling (see the Local Options Addressing Spelling section)
  • Long Names or Addresses (see the Local Options Addressing Long Names or Addresses section)
  • Multi-valued fields (Account Names, Compound Names, Secondary names, Repeating fields) (see the Local Options Addressing Multi-valued Fields section)
  • Word Score (see the Local Options Controlling Word Score section)
  • Reference Record Matching (see the Local Options Controlling Reference Record Matching section).
A summary of the options below is followed by a more detailed description of each option in each of the above categories.
The N3SCM Name Matching method is the latest Name Matching method. It incorporates all of the features and options available in N3SCL, plus new features.

Summary of Options

The following table lists all of the available N3SCM options in alphabetical order.
| Option | Description | Syntax |
| --- | --- | --- |
| ABBRMIN | Sets the minimum length for an abbreviated match. Refer to the Local Options Addressing Truncation & Initials section. | LOPT= |
| ABBRSCR | Sets the Score for an abbreviated match. Refer to the Local Options Addressing Truncation & Initials section. | LOPT= |
| ALLWSKIP | Allows Skip words when matching to an acronym. Refer to the Local Options Addressing Truncation & Initials section. | OPTION CONCINIT |
| AVERAGE | Calculates the average of the original and recalculated Score when CLIMIT logic is activated. Refer to the Local Options Addressing Long Names or Addresses section. | OPTION CLIMIT |
| CATSWD | When using CATSW or CATSS, this option disables CATSW and CATSS processing when an Initial to Word match is being processed and the Word is in a CATSW or CATSS category. Refer to the Local Options Addressing Word Type section. | OPTION FLAGS |
| CATSWEXT | When using CATSW or CATSS, this option enables a 100% match if the word pair was the same before editing. Refer to the Local Options Addressing Word Type section. | OPTION FLAGS |
| CATSWF | When using CATSW or CATSS, this option forces CATSW and CATSS processing to be performed even if MAJMOD processing is done. Refer to the Local Options Addressing Word Type section. | OPTION FLAGS |
| CHARDS | Disables CLIMIT logic for words which Score above the CHARDS limit. Refer to the Local Options Addressing Long Names or Addresses section. | OPTION CLIMIT |
| CINITA | Allows both initial and multiple concatenations. Refer to the Local Options Addressing Concatenation section. | LOPT= |
| CINITI | Allows concatenation of initials. Refer to the Local Options Addressing Concatenation section. | LOPT= |
| CINITM | Allows multiple concatenations. Refer to the Local Options Addressing Concatenation section. | LOPT= |
| CLN | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION SORTSCOR |
| CODEMAXD | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION CODESCOR |
| CODEPOSS | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION CODESCOR |
| CODEWGHT | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION CODESCOR |
| CODEUDIF | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION CODESCOR |
| CODEUONE | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION CODESCOR |
| CONC | Allows concatenated matches. Refer to the Local Options Addressing Concatenation section. | LOPT= |
| ENABLDNM | Indicates whether to disable matching the specified pairs of words even though they are similar. Refer to the Local Options Addressing Word Type section. | OPTION FLAGS |
| EXACTCAT | Ignores the CATSW and CATSS options if an exact match exists after formatting. Refer to the Local Options Addressing Word Type section. | OPTION FLAGS |
| EXACTWRD | Causes exact word matches to be retained and not optimized. Refer to the Local Options Addressing Word Type and Local Options Addressing Truncation & Initials sections. | OPTION FLAGS |
| EXACTINI | Causes exact initial matches to be retained and not optimized. Refer to the Local Options Addressing Truncation & Initials section. | OPTION FLAGS |
| EXACTMCH | Switches off the early exact match check. Refer to the Local Options Addressing Truncation & Initials section. | OPTION FLAGS |
| EXCTCODE | Codes must match exactly. Refer to the Local Options Addressing Word Type section. | XOPT= |
| FMT | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION SORTSCOR |
| FMTINIT | Turns off FORMATTING-OPTIONS #9, concatenation of initials. Refer to the Local Options Addressing Truncation & Initials section. | OPTION FLAGS |
| GOOD | Specifies the Word Score that needs to be achieved for a user-defined Score to be returned. Refer to the Local Options Controlling Reference Record Matching section. | OPTION REFN |
| ILOWTRIG | Controls the value for a word Score to be considered a match by INITLOW processing. Refer to the Local Options Addressing Truncation & Initials section. | OPTION SCORES |
| ILOWWRDS | Lowers the Score if one or more words match and there are no initial matches. Refer to the Local Options Addressing Truncation & Initials section. | OPTION FLAGS |
| INIT | Controls how an initial matches against the first letter of a word. Refer to the Local Options Addressing Truncation & Initials section. | OPTION SCORES |
| INITCODE | Prevents codes acting as initials. Refer to the Local Options Addressing Truncation & Initials section. | OPTION FLAGS |
| INITLOW | Disables INIT when the non-initial words do not match. Refer to the Local Options Addressing Truncation & Initials section. | LOPT= |
| LIMWCAT | Allows the maximum Weight of a word defined by CATSW to be less than 10. Refer to the Local Options Addressing Word Type section. | OPTION FLAGS |
| MAJMOD | Allows the Score for major word matches to be increased or decreased. Refer to the Local Options Addressing Word Order section. | LOPT= |
| MATCHEND | Allows a string match (raw compare) to resync at the last character. Refer to the Local Options Addressing Spelling section. | OPTION FLAGS |
| MAXINIT | Maximum number of words allowed when matching to an acronym. Refer to the Local Options Addressing Truncation & Initials section. | OPTION CONCINIT |
| MININIT | Minimum number of words allowed when matching to an acronym. Refer to the Local Options Addressing Truncation & Initials section. | OPTION CONCINIT |
| MOVEMNT | Allows finer control over MAJMOD. Refer to the Local Options Addressing Word Type section. | OPTION MAJMOD |
| NGRAMC | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION SORTSCOR |
| NGRAMCLV | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION SORTSCOR |
| NGRAMF | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION SORTSCOR |
| NGRAMFLV | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION SORTSCOR |
| NOEXCLST | Specifies a list of Word-types to be used in NOEXCESS processing to give improved matching on concatenated words. Refer to the Local Options Controlling Reference Record Matching section. | OPTION NOEXCLST |
| NOEXPNTY | In CLIMIT processing, reduces the score by a value dependent on the difference in the number of tokens between the two CLIMLIST stacks. Refer to the Local Options Addressing Long Names or Addresses section. | OPTION CLIMIT |
| NOINCR | Uses the original Score if the new Score is greater than the original Score when CLIMIT logic is activated. Refer to the Local Options Addressing Long Names or Addresses section. | OPTION CLIMIT |
| NOORDER | Disables Score reduction when words match out of order. Refer to the Local Options Addressing Word Order section. | LOPT= |
| NORAW | Disables raw string matching. Refer to the Local Options Addressing Spelling section. | LOPT= |
| NORSCORE | Controls whether a Score higher than the original Score can be returned from an acronym match. Refer to the Local Options Addressing Truncation & Initials section. | OPTION CONCINIT |
| NOSTD | Disables stabilized word matching. Refer to the Local Options Addressing Spelling section. | LOPT= |
| NSACTF | Controls what action to take when CLIMIT logic is activated and no new Score can be achieved. Refer to the Local Options Addressing Long Names or Addresses section. | OPTION CLIMIT |
| OPTIMILW | Switches off initial/word matching optimization when using INITLOW. Refer to the Local Options Addressing Truncation & Initials section. | OPTION FLAGS |
| ORIGCSW | Enables the edit-list category options CATSW and CATSS for matching original words. For more information about the CATSW and CATSS options, refer to the Local Options Addressing Word Type section. | OPTION FLAGS |
| ORIGWORD | Allows comparing the original word as well as the edit list replacement. Refer to the Local Options Addressing Concatenation section. | OPTION CONCAT |
| ORIGWSCR | Allows the Score to be re-calculated on each unformatted word and sets the maximum Score. Refer to the Local Options Addressing Spelling section. | OPTION SCORES |
| ORIGWTHR | Score threshold below which ORIGWSCR matching is allowed. Refer to the Local Options Addressing Spelling section. | OPTION SCORES |
| PARTMTCH | Allows part of the acronym to match and a score to be computed relative to the number of initials that matched. Refer to the Local Options Addressing Truncation & Initials section. | OPTION CONCINIT |
| PENALTY | Decrements the Score by the PENALTY value for excess words when using the CONCINIT option. Refer to the Local Options Addressing Truncation & Initials section. | OPTION CONCINIT |
| PENALTY | Decrements the Score by the PENALTY value for excess words when using the NOEXCESS option. Refer to the Local Options Controlling Reference Record Matching section. | OPTION NOEXCESS |
| PER | Compares first and last words and applies the specified penalty if they are different. Refer to the Local Options Addressing Word Order section. | OPTION ORDER |
| PERFLAG | Used to apply finer control to the PER option (above). Refer to the Local Options Addressing Word Order section. | OPTION ORDER |
| PLURALS | Allows for a trailing ’S’ on one of the pair of words so as to match 100%. Refer to the Local Options Addressing Concatenation section. | OPTION CONCAT |
| POS | Sets the decrement for out of position processing. Refer to the Local Options Addressing Word Order section. | OPTION ORDER |
| RAW | Performs a raw compare of concatenated words. Refer to the Local Options Addressing Concatenation section. | OPTION CONCAT |
| RAWCMPTN | Causes raw string compares to be calculated out of 100 instead of 10. Refer to the Local Options Addressing Spelling section. | OPTION FLAGS |
| RAWSTBTH | Increases the score by a factor based on the raw and stabilized scores. Refer to the Local Options Addressing Spelling section. | OPTION SCORES |
| RAWSTBVL | Value used in the RAWSTBTH calculation. Refer to the Local Options Addressing Spelling section. | OPTION SCORES |
| RECREF | Causes re-calculation of REFMIN/REFMAX based on the word types in CLIMLIST. Refer to the Local Options Addressing Long Names or Addresses section. | OPTION CLIMIT |
| REFCNT | Specifies the number of words that must be present to trigger a bonus penalty to be applied via REFMULT. Refer to the Local Options Controlling Reference Record Matching section. | OPTION NOEXCESS |
| REFF | Applies REFMULT logic only if the file record meets the REFCNT condition. Refer to the Local Options Controlling Reference Record Matching section. | OPTION NOEXCESS |
| REFMULT | A multiplier for the NOEXCESS PENALTY when the REFCNT and REFF/REFS conditions are met. Refer to the Local Options Controlling Reference Record Matching section. | OPTION NOEXCESS |
| REFS | Applies REFMULT logic only if the search record meets the REFCNT condition. Refer to the Local Options Controlling Reference Record Matching section. | OPTION NOEXCESS |
| SCALEFTR | Changes the word score scale to be out of 100 instead of out of 10. Refer to the Local Options Controlling Word Score section. | OPTION FLAGS |
| SCORE | Sets the maximum Score allowed for a concatenated match. Refer to the Local Options Addressing Concatenation section. | OPTION CONCAT |
| SCORE | Sets the maximum Score allowed for an acronym match. Refer to the Local Options Addressing Truncation & Initials section. | OPTION CONCINIT |
| SCORE | Controls the Score returned by MAJMOD when the major word in one name matches well against any word in the other name. Refer to the Local Options Addressing Word Type section. | OPTION MAJMOD |
| SCORE | Specifies the user-defined Score to be returned when using the REFN option. Refer to the Local Options Controlling Reference Record Matching section. | OPTION REFN |
| SEC | Allows a user-defined Score for Secondary Word matches. Refer to the Local Options Addressing Multi-valued Fields section. | OPTION SCORES |
| SECOND | Specifies which types of Secondary names should be expanded for matching. Refer to the Local Options Addressing Multi-valued Fields section. | OPTION FLAGS |
| SECPHRSE | Creates secondary phrase names. 0 is the default (off), 1 turns the feature on and 2 creates all secondary names. Refer to the Local Options Addressing Multi-valued Fields section. | OPTION FLAGS |
| SECPHRSE | Allows a user-defined Score for secondary phrase matches. Refer to the Local Options Addressing Multi-valued Fields section. | OPTION SCORES |
| SECPORIG | If this is set to 1, original names are included before secondary phrase rules are applied. Default is 0. Refer to the Local Options Addressing Multi-valued Fields section. | OPTION FLAGS |
| SEQ | Sets the decrement for out of sequence processing. Refer to the Local Options Addressing Word Order section. | OPTION ORDER |
| SKIPCONS | Matches multiple consonants with a single consonant. Refer to the Local Options Addressing Spelling section. | OPTION FLAGS |
| SKIPGOOD | If this is set to 1, words that match 100% are excluded from the CONCINIT rescore. Default is 0. Refer to the Local Options Addressing Truncation & Initials section. | OPTION CONCINIT |
| SKIPMAJM | Allows an exact match early-exit even when MAJMOD is specified. Refer to the Local Options Addressing Word Type section. | OPTION FLAGS |
| SKIPMTCH | Allows an initial to match a skip word. Refer to the Local Options Addressing Truncation & Initials section. | OPTION FLAGS |
| SKIPMOD | Allows the Score for skip word matches to be increased or decreased. Refer to the Local Options Addressing Word Type section. | XOPT= |
| SKIPSMOD | Allows an exact match early-exit even when SKIPMOD is specified. Refer to the Local Options Addressing Word Type section. | OPTION FLAGS |
| SKIPVOWL | Matches vowels or ignores a vowel when compared with a consonant. Refer to the Local Options Addressing Spelling section. | OPTION FLAGS |
| SORTWGHT | Used by specialised code scoring option. Refer to the Local Options Addressing Matching of Codes section. | OPTION SORTSCOR |
| SREFCNT | Specifies the number of skip words that must be present to trigger a bonus penalty to be applied via SREFMULT. Refer to the Local Options Controlling Reference Record Matching section. | OPTION NOEXCESS |
| SREFMULT | A multiplier for the NOEXCESS PENALTY when the SREFCNT and REFF/REFS conditions are met. Refer to the Local Options Controlling Reference Record Matching section. | OPTION NOEXCESS |
| STD | Allows the Score for a stabilized word match to be increased or decreased. Refer to the Local Options Addressing Spelling section. | OPTION SCORES |
| SYNCS | Sets the minimum number of characters which must match to enable re-synchronization in a raw string comparison. Refer to the Local Options Addressing Spelling section. | OPTION FLAGS |
| THRSHOLD | Sets the level at which to accept a concatenated match. Refer to the Local Options Addressing Concatenation section. | OPTION CONCAT |
| THRSHOLD | Score threshold below which acronym matching is allowed. Refer to the Local Options Addressing Truncation & Initials section. | OPTION CONCINIT |
| THRSHOLD | Sets the threshold Score above which MAJMOD processing will take place. Refer to the Local Options Addressing Word Type section. | OPTION MAJMOD |
| TRANSLEN | Sets the minimum word length for character transposition matching to be applied. Refer to the Local Options Addressing Spelling section. | OPTION FLAGS |
| TRIGGER | Decrements the Score if it is greater than or equal to the TRIGGER Score. Refer to the Local Options Controlling Reference Record Matching section. | OPTION NOEXCESS |
| TRIGGER | Invokes out of order logic if the Score is above the TRIGGER value. Refer to the Local Options Addressing Word Order section. | OPTION ORDER |
| TRIGS | Invokes CLIMIT logic if the Score is above the TRIGS value. Refer to the Local Options Addressing Long Names or Addresses section. | OPTION CLIMIT |
| USECATSW | Enables the CATSW option. For more information about the CATSW option, refer to the Local Options Addressing Word Type section. | OPTION REFN |
| USECWAIT | Allows the maximum Score of a word pair to be less than 10. Refer to the Local Options Controlling Word Score section. | OPTION FLAGS |
| WBELOW | Sets the Score for any single word to zero if the raw string word Score is less than a specified value. Refer to the Local Options Addressing Spelling section. | OPTION SCORES |
| WORDS | Specifies the number of non-initial words which need to match in order to return a user-defined Score. Refer to the Local Options Controlling Reference Record Matching section. | OPTION REFN |
| WORSTSCR | Controls which score should be returned when comparing a repeating group. Refer to the Local Options Addressing Multi-valued Fields section. | OPTION FLAGS |
| WSCRNOEX | Defines the amount to reduce the score when comparing a repeating group in which the number of repeats differs between the search and file records. Refer to the Local Options Addressing Multi-valued Fields section. | OPTION FLAGS |
| Editlist Category | Specifies the Edit-list Category Name to participate in Score reduction. Refer to the Local Options Addressing Word Type / OPTION CATSW section. | OPTION CATSW |
| Editlist Category | Specifies the Edit-list Category Name to participate in Score reduction. Refer to the Local Options Addressing Word Type / OPTION CATSS section. | OPTION CATSS |
| Editlist Category | Disables an Edit-list Category Name during Matching. Refer to the Local Options Addressing Word Type / OPTION CATNIGN section. | OPTION CATNIGN |
| Editlist Category | Disables an Edit-list Category Type during Matching. Refer to the Local Options Addressing Word Type / OPTION CATTIGN section. | OPTION CATTIGN |
| Editlist Category | Overrides the meaning of an Edit-list Category Name with a different Category Type. Refer to the Local Options Addressing Word Type / OPTION CATNREP section. | OPTION CATNREP |
| Editlist Category | Overrides the meaning of an Edit-list Category Type with a different Category Type. Refer to the Local Options Addressing Word Type / OPTION CATTREP section. | OPTION CATTREP |
| Editlist Category | Specifies the Edit-list Category Name to participate in Secondary name matching. Refer to the Local Options Addressing Multi-valued Fields / OPTION SECCAT section. | OPTION SECCAT |
| Word-type | Specifies the Word-type to participate in re-scoring when using CLIMIT logic. Refer to the Local Options Addressing Long Names or Addresses / OPTION CLIMLIST section. | OPTION CLIMLIST |
| Word-type | Specifies the Word-type to participate in Secondary name matching. Refer to the Local Options Addressing Multi-valued Fields / OPTION SECTYPE section. | OPTION SECTYPE |

Word Weight Modification in N3SCM

The default processing of the word weight modifying options (MAJMOD, SKIPMOD, CATSW) in N3SCM differs from that of the previous name matching entry point, N3SCL.
To better understand the following explanation, it is useful to know that during the matching process, the default maximum Score that can be attributed to a word pair is 100.
For example, the two words ANDERSON and ANDERSON score 100/100. The two words ALLEN and ANDERSEN score 30/100 (the default Score when only the initial matches).
When all words in a name have a maximum Score of 100, they have equal weighting in the final Score out of 100.
This maximum word Score value of 100 can be varied by the word weight modification options MAJMOD, SKIPMOD and CATSW, and thus the weighting of word pairs can be changed.
The default processing of these options differs between N3SCM and N3SCL in two ways:
  1. In N3SCL, the variation of the maximum Score was applied before word pairs were chosen, and thus contributed to the choice of the word pairs. The effect was that more significance was given to the word types (MAJMOD, SKIPMOD) or categories (CATSW) of word pairs than to their likeness.
    For example, with N3SCL (using REFMIN, MAJMOD*120, NAME-FORMAT=R, NOSTD, NORAW),
    SEARCH: JOHN ALLEN ANDERSON
    FILE: JOHN ANDERSON ALLEN
    scored 035, because with the MAJMOD*120 setting, the ANDERSON vs ALLEN pair scored higher ((30/100)*12) than the ANDERSON vs ANDERSON pair ((100/100)*1), and would be chosen.
    Using the default N3SCM processing, the same match scores lower (021), because the word pairs are first chosen on their natural likeness before the MAJMOD option is applied. In this example, the word pairs ANDERSON vs ANDERSON and ALLEN vs ALLEN score higher ((100/100)*1) than the ANDERSON vs ALLEN pair ((30/100)*1), leaving no major word match to apply the weight modifier to.
  2. In N3SCL, the weight of a word pair was allowed to be less than 100. Using N3SCM default settings, the weight cannot be less than 100.
    The effect of this can be shown by the following CATSW example.
    *C NN R Nick-names
    NN JONATHON >JOHN <
    OPTION CATSW
    VALUE NN,9
    SEARCH: JOHN SMITH
    FILE: JONATHON SMITH
    In N3SCL,
    SCORE: 100 (9/9 + 10/10)
    In N3SCM,
    SCORE: 95 (90/100 + 100/100)
    To override this default behavior, and have N3SCM operate the same as N3SCL, use options CATSS, LIMWCAT or USECWAIT as described below.
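
The arithmetic behind the CATSW example above can be sketched in a few lines of Python. This is only an illustration: the function `name_score` and the rule of dividing the summed achieved scores by the summed maximum scores are assumptions inferred from the "(9/9 + 10/10)" and "(90/100 + 100/100)" notation, not the product's actual implementation.

```python
def name_score(pairs):
    """Combine (achieved, maximum) word-pair scores into a Score out of 100.

    Hypothetical helper: summing achieved and maximum values is an
    assumption inferred from the example's notation.
    """
    achieved = sum(a for a, _ in pairs)
    maximum = sum(m for _, m in pairs)
    return achieved * 100 // maximum

# JOHN vs JONATHON is in category NN (CATSW VALUE NN,9); SMITH vs SMITH is exact.
n3scl_score = name_score([(9, 9), (10, 10)])       # the pair's maximum is lowered to 9
n3scm_score = name_score([(90, 100), (100, 100)])  # the pair's maximum stays at 100
print(n3scl_score, n3scm_score)
```

Under these assumptions, lowering the maximum of the nickname pair (the N3SCL behavior) hides the reduction entirely, while keeping the maximum at 100 (the N3SCM default) lets the reduction show in the final Score.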

Local Options Addressing Truncation & Initials

Option
Description
Example
OPTION SCORES
VALUE INIT,[number]
This option controls how an Initial will match against the first character of a word. [number] is a value between 0 and 10, where 0 means attribute a 0/10 Score if the Initial matches the first character of the word and 10 means attribute a 10/10 Score if the Initial matches. If SCALEFTR,1 is specified, [number] can be between 0 and 100. If an Edit-list nickname rule has been defined, for example to replace Bill with William, W. Smith would still match Bill Smith. If this option is omitted, an initial is compared to a full word using a string comparison and, if it matches, is awarded a Score of 3/10.
  1. Without this option:
    SEARCH: W D BROWN FILE: WILLIAM DEAN BROWN SCORE: 053
  2. With VALUE INIT,0:
    SEARCH: W D BROWN FILE: WILLIAM DEAN BROWN SCORE: 033
  3. With VALUE INIT,10:
    SEARCH: W D BROWN FILE: WILLIAM DEAN BROWN SCORE: 100
  4. With VALUE INIT,5:
    SEARCH: W D BROWN FILE: WILLIAM DEAN BROWN SCORE: 066
  5. With VALUE INIT,10:
    SEARCH: W D BROWN FILE: BILL DEAN BROWN SCORE: 100
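
The Scores in the INIT examples above are consistent with a simple average of the per-word scores. The Python sketch below reproduces them under that assumption; `name_score` and `score_w_d_brown` are hypothetical names, and the truncating average is inferred from the published Scores rather than taken from the product.

```python
DEFAULT_INIT = 3  # Score (out of 10) for an initial/word match when INIT is omitted

def name_score(word_scores):
    """Average per-word scores (each 0..10) into a Score out of 100, truncating.

    Assumption: equal weighting and truncation, inferred from the examples.
    """
    return sum(word_scores) * 100 // (10 * len(word_scores))

def score_w_d_brown(init=DEFAULT_INIT):
    # W/WILLIAM and D/DEAN are initial-to-word pairs; BROWN/BROWN is exact (10/10).
    return name_score([init, init, 10])

for init in (DEFAULT_INIT, 0, 10, 5):
    print(f"INIT,{init}: {score_w_d_brown(init):03d}")
```

With the default of 3 the sketch gives 053, and with INIT values of 0, 10 and 5 it gives 033, 100 and 066, matching examples 1 to 4.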
LOPT=(INITLOW)
The default Score for an Initial matching the first character of a word is 3/10. With the INIT option (described above), it is possible to raise this Score to a maximum of 10/10, and the INIT value, by default, is applied to all cases where an Initial matches the first character of a word. In cases where the non-initial words do not match, however, it may be desirable to reduce the value of the Initial/Word Score, for example when two family names do not match but the given Initial of one still matches the given name of the other. The INITLOW option reduces the significance of initials if none of the non-initial words match. The Score in such cases is reduced to the default value of 3/10. Provided at least one of the non-initial words matches, INITLOW will not be applied. For example, with VALUE INIT,10 specified, G N HOLLOWAY will match GREG NORMAN HALL with a Score of 076. Using the INITLOW option, the Score is reduced to 030. If there is an exact match between any words in the name, the processing of INITLOW is disabled.
  1. With INIT,10:
    SEARCH: ANDREW DEAN SMITH FILE: A D BROWN SCORE: 066
  2. With INIT,10 & INITLOW:
    SEARCH: ANDREW DEAN SMITH FILE: A D BROWN SCORE: 020
  3. With INIT,10 & INITLOW:
    SEARCH: ANDREW DEAN SMITH FILE: A D SMITH SCORE: 100
  4. With INIT,10:
    SEARCH: ANDREW DEAN SMITH FILE: A DEAN BROWN SCORE: 066
  5. With INIT,10 & INITLOW:
    SEARCH: ANDREW DEAN SMITH FILE: A DEAN BROWN SCORE: 066
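
The INITLOW rule described above can be sketched as follows. This is a minimal illustration, not the product's algorithm: `score_with_initlow` is a hypothetical function, the per-word scores are supplied by hand, and the equal-weight truncating average is an assumption inferred from the published Scores.

```python
DEFAULT_INIT = 3  # default initial/word Score out of 10

def score_with_initlow(initial_pairs, word_scores, init=10, initlow=False):
    """Score a name pair out of 100.

    initial_pairs: number of initial-to-word pairs whose first letters agree.
    word_scores:   scores (0..10) of the remaining non-initial word pairs.
    Assumption: the final Score is the truncated average of all pair scores.
    """
    any_word_matches = any(s == 10 for s in word_scores)
    # INITLOW falls back to the default 3/10 only when no non-initial word matches.
    per_initial = DEFAULT_INIT if (initlow and not any_word_matches) else init
    scores = [per_initial] * initial_pairs + list(word_scores)
    return sum(scores) * 100 // (10 * len(scores))

# ANDREW DEAN SMITH vs A D BROWN: two initial pairs, SMITH/BROWN scores 0.
print(score_with_initlow(2, [0]))                 # INIT,10 only
print(score_with_initlow(2, [0], initlow=True))   # INIT,10 & INITLOW
# ANDREW DEAN SMITH vs A DEAN BROWN: one initial pair, DEAN exact, SMITH/BROWN 0.
print(score_with_initlow(1, [10, 0], initlow=True))
```

Under these assumptions the three calls give 66, 20 and 66, in line with examples 1, 2 and 5: the exact DEAN match in the last case keeps INITLOW from being applied.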
OPTION FLAGS
VALUE INITCODE,{0/1}
This option is used to prevent Initials being compared with Words when either is a code. This is used to prevent a high Score being returned in the case where INIT is also used. A value of 0 turns the option on (i.e. prevents Initials being matched with Words when either is a code); a value of 1 turns the option off. The default is off. For example, with INIT,10, 1 and 176 will Score 3/10. With INITCODE,0 specified, the comparison will get a Score of 0/10.
OPTION FLAGS
VALUE EXACTWRD,{0/1}
VALUE EXACTINI,{0/1}
With EXACTWRD,1 and EXACTINI,1, exact initial to initial matches will be retained, regardless of whether a better Score may have been achieved by matching the initial to a word. For example,
GRIFFIN, JOHN W J
GRIFFIN, JAMES W J
with EXACTWRD,0 and EXACTINI,0 (the default), and VALUE INIT,10, will score 100, because the initial J in each name is matched against the word in the other name (James and John respectively). With EXACTWRD,1 and EXACTINI,1 the Score would be lower, e.g. 080, because John and James are not as good a match.
EXACTINI,1 requires EXACTWRD,1 before it will function.
OPTION FLAGS
VALUE EXACTMCH,{0/1}
If two records match exactly then a Score of 100 is immediately given, bypassing Formatting. This is not always desirable, for example in cases where an Edit List rule should be applied prior to Matching.
The default is EXACTMCH,1, which results in an early exact match check. Changing to EXACTMCH,0 switches off the exact match check. For example, suppose the following Edit List rules are defined:
*C PT D Personal Title *P SYSTEMS INTEGRATION MANAGER *R MANAGER PT MANAGER><
With EXACTMCH,1:
SEARCH: SYSTEMS INTEGRATION MANAGER FILE: SYSTEMS INTEGRATION MANAGER SCORE: 100
With EXACTMCH,0:
SEARCH: SYSTEMS INTEGRATION MANAGER FILE: SYSTEMS INTEGRATION MANAGER SCORE: 000
OPTION FLAGS
VALUE SKIPMTCH,{0/1}
Usually an initial will not match a skip word; using SKIPMTCH,1 allows such a match. SKIPMTCH,0 is the default. For example, if University and Technology are skip words and the following two names are compared:
U.T.S.
University of Technology Sydney
With INIT,10:
SCORE: 010
With INIT,10 & SKIPMTCH,1:
SCORE: 100
OPTION FLAGS
VALUE OPTIMILW,{0/1}
The default is OPTIMILW,1. When INITLOW is active and it reduces an initial/word Score, a check is done to see if a better word match can be found. If one can, it is used instead of the degraded original match.
To turn off this optimization, use OPTIMILW,0. For example, compare these two names:
PETER
PETERS P
With INITLOW, INIT,10 and OPTIMILW,0, the Score returned would be 030. This occurs for two reasons. First, INIT,10 causes P / PETER to score 10/10 and to be chosen for the match over PETER / PETERS. Second, INITLOW takes effect on the P / PETER match because the PETER / PETERS pair was not a match, thus decreasing the Score to 030. With INITLOW, INIT,10 and OPTIMILW,1, the Score returned would be 080, because a check is done to see if a better word match can be found, in this case the Score of the PETER / PETERS pair.
OPTION FLAGS
VALUE ILOWWRDS,{0/1}
This option is used in conjunction with INITLOW to reduce the score for an initial-to-word match (to 3/10) if there are any unmatched words between the two names. To turn it on, specify ILOWWRDS,1.
The default is ILOWWRDS,0. For example, without ILOWWRDS,1 (and assuming REFMIN and INIT,9):
SEARCH: HELEN M RICHARDSON FILE: MICHAEL RICHARDSON SCORE: 095
With ILOWWRDS,1:
SEARCH: HELEN M RICHARDSON FILE: MICHAEL RICHARDSON SCORE: 065
OPTION SCORES
VALUE ILOWTRIG,[number]
This option controls the value for a word Score to be considered a match by the
INITLOW
processing.
The default is 10, i.e. if an initial / word match is present and two other words do not match 10/10,
INITLOW
processing will take place. Changing the value to 8 (as an example) will prevent
INITLOW
degrading the Score of an initial / word match when two other words are considered a reasonable match (in this case 8 / 10). If
SCALEFTR,1
is specified,
[number]
can be between 0 and 100. For example, with options:
LOPT=(INITLOW) OPTION SCORES VALUE INIT,10 SEARCH: JOHN SMITH FILE: J SNITH SCORE: 055
With the additional option:
OPTION SCORES VALUE ILOWTRIG,8 SEARCH: JOHN SMITH FILE: J SNITH SCORE: 090
the Score becomes 090 because the
J / JOHN
match is not affected by
INITLOW
. This is because the
SMITH / SNITH
match is 8/10 and the
ILOWTRIG
option causes
INITLOW
processing to be bypassed.
LOPT=(ABBRMIN)
ABBRMIN
sets the minimum length of an abbreviation that can match. For example, assume
ABBRMIN*3
is specified. If a word of length 3 or more matches the beginning of another (longer) word, the Score specified with the
ABBRSCR
option is returned. In other words, the short word is an abbreviation of the long word. Using the
ABBRSCR
example,
  • ROBE --> ROBERT matches
  • ROB --> ROBERT matches
  • ROBIN --> ROBERT doesn’t match
Note that the shorter of the two words must still be a 100% match with the beginning of the longer word for this logic to be invoked.
LOPT=(ABBRSCR)
Sets the Score for an abbreviated match, e.g. 8 = 80%, 10 = 100%. When two words match according to the
ABBRMIN
rules, the Score specified here is returned for the match on the two words. For example,
  1. With no
    ABBRMIN
    or
    ABBRSCR
    SEARCH: ROBERT FILE: ROBERTA SCORE: 080
  2. With
    LOPT=(ABBRMIN*3+ABBRSCR*10)
    SEARCH: ROBERT FILE: ROBERTA SCORE: 100
    SEARCH: RO FILE: ROBERTA SCORE: 030
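The abbreviation rule described for ABBRMIN and ABBRSCR can be sketched in Python as follows. This is only an illustration of the documented rule, not the product's implementation. The function name `abbreviation_score` and its parameters are hypothetical, and the sketch returns `None` when the rule does not apply, because the fallback Score (such as the 030 shown for RO / ROBERTA) comes from other matching logic that is not modelled here.

```python
from typing import Optional


def abbreviation_score(word_a: str, word_b: str,
                       abbrmin: int, abbrscr: int) -> Optional[int]:
    """Return the abbreviation Score (0-100), or None if the rule does not apply.

    abbrmin and abbrscr stand for the ABBRMIN and ABBRSCR values
    (hypothetical parameter names chosen for this sketch).
    """
    short, long_ = sorted((word_a, word_b), key=len)
    # The rule only concerns a short word against a longer word.
    if len(short) == len(long_):
        return None
    # The short word must be at least ABBRMIN characters long.
    if len(short) < abbrmin:
        return None
    # The short word must match the beginning of the long word exactly.
    if not long_.startswith(short):
        return None
    # ABBRSCR is expressed out of 10, e.g. 8 = 80%.
    return abbrscr * 10


print(abbreviation_score("ROBERT", "ROBERTA", abbrmin=3, abbrscr=10))  # 100
print(abbreviation_score("RO", "ROBERTA", abbrmin=3, abbrscr=10))      # None
print(abbreviation_score("ROBIN", "ROBERT", abbrmin=3, abbrscr=8))     # None
```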
OPTION FLAGS
VALUE FMTINIT,{0/1}
FORMATTING-OPTIONS #9
controls how Formatting treats a run of two or more initials. If it is set to a value other than ’N’, initials will be concatenated. This is the normal behavior for company and mixed company/person algorithms. This is important for keys and search strategies so that, for example, ABC HOLDINGS is able to successfully find A.B.C. HOLDINGS. Formatting options also affect matching in that a name is processed through Formatting prior to being matched. This behavior, however, may be undesirable in cases such as when a search for J W SMITH finds JOHN SMITH. The two formatted names that get compared would be JW SMITH and JOHN SMITH and the JW and JOHN do not match well. By setting FMTINIT to 0 (1 is the default),
FORMATTING-OPTIONS #9
is set to ’N’ (do not concatenate initials) for matching. This does not affect the key-building or searching. When using this option, if it is still desirable to have matching try concatenating the initials, then the options
CONC
and
CINITI
(or
CINITA
) should also be specified.
OPTION CONCINIT
VALUE THRSHOLD,[Score]
VALUE MININIT,[number]
VALUE MAXINIT,[number]
VALUE ALLWSKIP,{0/1}
VALUE SCORE,[Score]
VALUE PENALTY,[number]
VALUE NORSCORE,{0/1}
VALUE PARTMTCH,{0/1}
VALUE SKIPGOOD,{0/1}
The
CONCINIT
option allows matching of acronyms to full names. For example:
IDENTITY SYSTEMS LTD IS
An acronym may be retrieved as a candidate in a search by using the
INITPROBE
or
INITRANGE
NAMESET
function keywords. An acronym and full name may also become a search and file record in matching because of a search on another field (e.g. address). Acronym matching, if done, takes place at the end of the matching process, after an original Score has been computed. Acronym matching will only be attempted if the original Score is below the
THRSHOLD
value. The default threshold score value is 80. The
MININIT
and
MAXINIT
values set the minimum and maximum number of words in the full name that can be matched to the acronym (starting from the left). For example, it would be typical to set
MININIT
at 3 (the default) because most acronyms start at three words. A reasonable
MAXINIT
value would be 8 (the default). By default, Skip Words are allowed to participate in acronym matching. Skip Words can be disallowed in acronym matching by setting
ALLWSKIP
to 0. By default, a successful acronym match will return a Score of 100. It may be desirable to set the maximum Score lower. This can be achieved with the
SCORE
value setting. Using the
PENALTY
value, it is possible to decrement the acronym Score by the number of excess words in the non-reference record. If
PENALTY
is omitted, no penalty is applied for excess words. By default, the acronym Score is returned only if it is greater than the original Score. By setting
NORSCORE
to 0, the acronym Score is returned whether it is greater or lesser than the original Score. For looser matching, specify
PARTMTCH,1
. This allows part of the acronym to match and a score to be computed relative to the number of initials that matched. For example,
IDENTITY SYSTEMS PTY LTD ISS
will score 66 if
PARTMTCH,1
is specified. 0, the default, does not allow part acronym matching and the Score would be 0. By default, words that match 100% are included in the
CONCINIT
rescore. By setting
SKIPGOOD
to 1, words that match 100% are excluded from the
CONCINIT
rescore.
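The PARTMTCH arithmetic can be illustrated with a short Python sketch. It assumes that acronym letters are compared position by position with the leading words of the full name, and that the Score is the truncated percentage of acronym letters that matched. Both assumptions are inferred from the ISS example above, and the function name `partial_acronym_score` is hypothetical.

```python
def partial_acronym_score(acronym: str, full_name: str) -> int:
    """Score an acronym against a full name, allowing a partial match.

    Assumption: letter i of the acronym is compared with the first
    letter of word i of the full name, starting from the left.
    """
    if not acronym:
        return 0
    words = full_name.split()
    matched = sum(
        1 for letter, word in zip(acronym, words) if word.startswith(letter)
    )
    # Integer division truncates, so 2 of 3 initials gives 66.
    return 100 * matched // len(acronym)


print(partial_acronym_score("ISS", "IDENTITY SYSTEMS PTY LTD"))  # 66
print(partial_acronym_score("IS", "IDENTITY SYSTEMS LTD"))       # 100
```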

Local Options Addressing Concatenation

Option
Description
Example
LOPT=(CONC)
Allow concatenated matches. This option allows concatenated words to match against separate words. For example, when matching,
ROBERT HACKFORTH JONES
with
ROBERT HACKFORTHJONES
The
HACKFORTH JONES
will match to produce a total Score of 100% with the
CONC
option. Without it a Score of 75% is returned.
LOPT=(CINITM)
Allow multiple concatenations. This option allows the concatenation of more than two words. It requires that
CONC
is also specified.
For example,
  1. Not using
    CINITM
    SEARCH: IDENTITYSYSTEMS FILE: IDENTITY SYSTEMS SCORE: 050
  2. With
    LOPT=(CONC+CINITM)
    SEARCH: IDENTITYSYSTEMS FILE: IDENTITY SYSTEMS SCORE: 100
LOPT=(CINITI)
Allow concatenation of initials. Requires that
CONC
is also specified.
  1. With no
    CINITI
    SEARCH: SMITH Y R FILE: SMITHY R SCORE: 090
  2. With
    LOPT=(CONC+CINITI)
    SEARCH: SMITH Y R FILE: SMITHY R SCORE: 100
LOPT=(CINITA)
Allow both initials and multiple concatenations. Shorthand for specifying both
CINITI
and
CINITM
. Requires that
CONC
is also specified.
The syntax is:
LOPT=(CONC+CINITA)
OPTION CONCAT
VALUE PLURALS,{0/1}
VALUE RAW,{0/1}
VALUE SCORE,[maximum Score]
VALUE THRSHOLD,[threshold Score]
VALUE ORIGWORD,{0/1}
By setting PLURALS to 1, a trailing S on one of the two words/concatenated words will match 100%. Default value of
PLURALS
is 0. Setting
RAW
to 1 will perform a raw compare and accept the match if it is above the
threshold Score
.
  • maximum Score
    – The maximum word Score for a concatenated match. An integer value between 0 and 100.
  • threshold Score
    – The word Score level at which to accept a concatenated word match. An integer value between 0 and 100. For example, a match between names like ’RichCraft’ vs ’Rich Crafts’ can receive a higher Score.
It is possible that matching two names, where one name has two words concatenated, returns a poor score because one of the unconcatenated words had a replacement rule in the Edit-list. For example, the name "MARY KATE" will not match well against "MARYKATE" if the word "KATE" has been replaced by "KATHERINE" in the Edit-list. The solution is to compare the original word as well as the replacement. This new
ORIGWORD
logic is turned off by default, as it will have a minor performance impact due to the extra comparison required. Setting
ORIGWORD
to 1 turns this feature on. The
SCORE
option now limits the maximum allowed score rather than scaling it, and the new
SYNCS
option default value is 2.

Local Options Addressing Word Order

Option
Description
LOPT=(NOORDER)
Normally, any Scores over 75 are degraded by 1 for each out-of-order word pair (or by larger amounts if
OPTION ORDER
is used). This option disables that feature.
  1. Not using
    NOORDER
    SEARCH: EQUIPMENT MAINTENANCE COMPANY FILE: MAINTENANCE EQUIPMENT COMPANY SCORE: 098
  2. With
    LOPT=(NOORDER)
    SEARCH: EQUIPMENT MAINTENANCE COMPANY FILE: MAINTENANCE EQUIPMENT COMPANY SCORE: 100
OPTION ORDER
VALUE POS,[number]
VALUE SEQ,[number]
VALUE TRIGGER,[number]
Normally any Scores over 75 will cause out-of-order word checking to be enabled. Default out-of-order word checking will decrement a Score by 1 for each out-of-order word pair. This processing can be turned off with the
NOORDER
option. To change the default trigger Score of 75, use the
TRIGGER
option. Out-of-order means either out-of-position or out-of-sequence. To explain the meaning of out-of-position and out-of-sequence, refer to the following example. The following two names have words out of position (SMITH vs ALAN), but not out of sequence (SMITH follows JOHN in both cases),
JOHN SMITH JOHN ALAN SMITH
If the default out-of-order processing is used (i.e. no
NOORDER
and no
OPTION ORDER
), and assuming
REFMIN
is also used, these two names will score 99. If it is desired to only decrement the Score if the names are either out-of-position or out-of-sequence, use the
VALUE POS
or
VALUE SEQ
options. These options are mutually exclusive. Use the
VALUE POS
option to specify a value (between 0 and 100) by which to decrement the Score for each word out-of-position. Use the
VALUE SEQ
option to specify a value (between 0 and 100) by which to decrement the Score for each word out-of-sequence.
OPTION ORDER
VALUE PER,[penalty]
VALUE PERFLAG,{0,1,2}
Specifying
VALUE PER,n
causes an additional check of the first and last words in the two names to be performed. If the two words are different then penalty
n
is applied to the score. E.g: Not using
VALUE PER,n
SEARCH: ANDREW JOHN SMITH FILE: JOHN SMITH SCORE: 100
Using
VALUE PER,1
SEARCH: ANDREW JOHN SMITH FILE: JOHN SMITH SCORE: 099
In addition to
VALUE PER,n
,
VALUE PERFLAG,m
may be specified. Note that this option has an effect only where one of the two word stacks contains a single word. In these cases, the value of
m
modifies the behavior as follows:
VALUE PERFLAG,0
Always apply the penalty. This is the default.
VALUE PERFLAG,1
Ensure that the matching word is the first in each stack before applying the penalty. Use the
NAME-FORMAT
setting to determine the meaning of first, i.e. if
NAME-FORMAT=L
, then the matching word must be the leftmost word. If
NAME-FORMAT=R
, then the matching word must be the rightmost word.
VALUE PERFLAG,2
Ensure that the matching word is the first in each stack before applying the penalty, irrespective of the
NAME-FORMAT
setting, i.e. the matching word must be the leftmost word. In the case where both names contain a single word, this option has no effect.

Local Options Addressing Word Type

This set of Local Options controls the behavior of the Method when comparing certain types of words. Different types of words include,
  • Codes
  • Major Words
  • Skip Words
  • Exact Words
  • Edit-list Words
Option
Description
Example
XOPT=(EXCTCODE)
This option specifies that codes only match if they match exactly. For the definition of a ’code’ see the
Formatting Options
section. See also
Local Option INITCODE
.
For example,
  1. Not using
    EXCTCODE
    SEARCH: 10 MAYBERRY AVENUE FILE: 101 MAYBERRY AVENUE SCORE: 080
  2. With
    XOPT=(EXCTCODE)
    SEARCH: 10 MAYBERRY AVENUE FILE: 101 MAYBERRY AVENUE SCORE: 050
OPTION FLAGS
VALUE EXACTWRD,{0/1}
With
EXACTWRD,1
exact word to word matches will be retained, regardless of whether a better Score may have been achieved by other means. This situation arises mainly if the
MAJMOD
option has also been set.
For example,
HONG, HOANG HOANG, HANG FAI
with
EXACTWRD,0
(the default), and
MAJMOD*20
, will score around 070; with
EXACTWRD,1
and
MAJMOD*20
, will score lower, e.g. 056. This is because with
MAJMOD
applied to the major words (
HONG
&
HOANG
) they score higher than the exact match for the words
HOANG
&
HOANG
, unless
EXACTWRD,1
is used, in which case the exact match words take precedence.
LOPT=(MAJMOD*[number])
The
MAJMOD
option tells the Entry Point to modify its Score if a match was found on a major word. This is done by applying a scaling factor to any major word (as flagged by the Formatting routine) found in the name.
For example, when matching the name KEN JOHN BROWN, the names KEN, JOHN and BROWN each contribute equally to the Score. However, the
MAJMOD
option can be used to give more importance to the major word (BROWN in this case, i.e. if Algorithm
NAME-FORMAT=R
.)
Giving a value for
MAJMOD
of 10 will scale the major word by 1 (i.e. not scale it at all) and will give the same behavior as omitting the option. A value of 20 will scale by 2, etc. For example,
LOPT=(MAJMOD*20)
will cause the importance of the major word to be doubled in the final Score calculation. This is achieved by increasing both the score and the weight for the major word. If the
MAJMOD
value is less than 10, the weight is not reduced below 100.
For example,
  1. With no
    MAJMOD
    ,
    SEARCH: ANNE SMITH FILE: ANNE BROWN SCORE: 050
    100*(100+0) / (100+100) = 50
  2. With
    LOPT=(MAJMOD*20)
    SEARCH: ANNE SMITH FILE: MARY SMITH SCORE: 066
    100*(200+0) / (200+100) = 66
  3. With
    LOPT=(MAJMOD*20)
    SEARCH: ANNE SMITH FILE: ANNE BROWN SCORE: 033
    100*(100+0) / (200+100) = 33
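The arithmetic in these three examples can be reproduced with a small Python sketch. It assumes a weighted average in which each word has a base weight of 100, the major word's score and weight are both scaled by MAJMOD/10, and the result is truncated to an integer. The function name `majmod_score` and the (matched, is_major) input shape are hypothetical, and real word scores are not limited to 0 or 100 as they are here.

```python
from typing import List, Tuple


def majmod_score(words: List[Tuple[bool, bool]], majmod: int = 10) -> int:
    """Weighted name Score for a list of (matched, is_major) word pairs.

    majmod=10 means no scaling, which matches omitting the option.
    """
    total_score = 0
    total_weight = 0
    for matched, is_major in words:
        scaled = 100 * majmod // 10 if is_major else 100
        # The weight is not reduced below 100, even if MAJMOD is below 10.
        total_weight += max(scaled, 100)
        if matched:
            total_score += scaled
    return 100 * total_score // total_weight


# ANNE SMITH vs ANNE BROWN, no MAJMOD: first name matches, major word does not.
print(majmod_score([(True, False), (False, True)]))             # 50
# ANNE SMITH vs MARY SMITH with MAJMOD*20: only the major word matches.
print(majmod_score([(False, False), (True, True)], majmod=20))  # 66
# ANNE SMITH vs ANNE BROWN with MAJMOD*20: major word does not match.
print(majmod_score([(True, False), (False, True)], majmod=20))  # 33
```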
If using
MAJMOD
, also consider using
EXACTWRD,1
. When using
OPTION MAJMOD
(see below), if
SCALEFTR,1
is also specified then the above scale values should be multiplied by 10, e.g. MAJMOD*200 rather than MAJMOD*20.
OPTION MAJMOD
VALUE LEVEL,{0,1,2,3}
VALUE MOVEMNT,{0,1,2}
VALUE SCORE,{n}
VALUE THRSHOLD,{n}
These options allow
MAJMOD
to be enabled but with finer control. The setting of LEVEL defines the rules which activate
MAJMOD
score modification.
LEVEL,0
. Uses the
MAJMOD
option when the major word in one name matches with any word in the other name.
LEVEL,1
. Uses the
MAJMOD
option when the major words in both the names match and share the same position. The
MAJMOD
option uses the same rules as that of the
LOPT=MAJMOD
option except that the
MAJMOD
option cannot reduce the score.
LEVEL,2
. Uses the
MAJMOD
option when the major word in one name matches with the major word of other name that is in the same or adjacent position.
LEVEL,3
. Uses the
MAJMOD
option when the major words in both the names match and share the same position. The
MAJMOD
option uses the same rules as that of the
LOPT=MAJMOD
option.
VALUE MOVEMNT
. Dictates how
MAJMOD
can affect the score; negatively, positively or both.
MOVEMNT,0
(the default) indicates that
MAJMOD
can increase or decrease the score.
MOVEMNT,1
indicates that
MAJMOD
can increase the score, but not decrease it.
MOVEMNT,2
indicates that
MAJMOD
can decrease the score, but not increase it.
VALUE SCORE
. The word Score to assign to the major word comparison. With
SCALEFTR,10
it can be set with values of n from 0 to 120. With
SCALEFTR,1
it can be set with values of n from 0 to 1200. In either case, the default is 0.
VALUE THRSHOLD
is the word Score threshold above which to activate this logic. It can have a value of n from 0 to 100, the default is 100. This value is unaffected by the setting of
SCALEFTR
and should always be specified in the range 0-100.
OPTION FLAGS
VALUE SKIPMAJM,{0/1}
By default, an exact match check is done on two names before any other processing. If an exact match is found, the method exits early with a score of 100. In some rare cases, it may be desirable to bypass the exact match check if
MAJMOD
is specified. This is because
MAJMOD
can be used to lower the score if the major words match. To bypass the exact match check when
MAJMOD
is specified, set
SKIPMAJM,1
.
SKIPMAJM,0
(the default) will enable the exact match early-exit.
XOPT=(SKIPMOD*[number])
The
SKIPMOD
option tells the Method to modify its Score if a match was found on a skip word (as flagged by the Formatting routine).
Giving a value for
SKIPMOD
of 10 will scale the Score for matching skip words by 1 (i.e. not scale them at all) and will give the same behavior as omitting the option. A value of 5 will cause the importance of the skip words, if they match, to be halved in the final calculation of the Score. This has the effect of increasing the importance of the non-Skip words in the name.
For example, if the Edit-list contains the following rules,
*C SS S Skip Word SS LABS > <
Then,
  1. With no
    SKIPMOD
    ,
    SEARCH: JOHN DEERE LABS FILE: JOHN DARIS LABS SCORE: 090
  2. With
    LOPT=(SKIPMOD*5)
    ,
    SEARCH: JOHN DEERE LABS FILE: JOHN DARIS LABS SCORE: 073
OPTION FLAGS
VALUE SKIPSMOD,{0/1}
By default, an exact match check is done on two names before any other processing. If an exact match is found, the method exits early with a score of 100. In some rare cases, it may be desirable to bypass the exact match check if
SKIPMOD
is specified. This is because
SKIPMOD
can be used to lower the score if skip words match. To bypass the exact match check when
SKIPMOD
is specified, set
SKIPSMOD,1
.
SKIPSMOD,0
(the default) will enable the exact match early-exit.
OPTION CATSW
VALUE [Edit-list Category],[Number]
The Name Matching Method by default passes the names to be matched through both the Cleaning and Formatting routines. The Formatting routine, among other things, transforms the name components according to rules in the Edit-list. The Categories of any Edit-list rules which have been applied are then passed back to the Method. The default maximum Score for a word comparison is 10/10. The
CATSW
option is used for ranking purposes to reduce the score of certain words which were originally different but were changed via Edit-list rules to match (for example, by Nickname Replacement or Secondary Name rules). This has the effect of reducing the overall score. For example, if a search was done on JOHN BROWN ADVERTISING, then both JOHN BROWN MARKETING and JOHN BROWN ENGINEERING would score the same; however, it may be desirable to rank JOHN BROWN MARKETING above JOHN BROWN ENGINEERING.
This can be achieved by first defining the ’similar’ words in the Edit-list in a manner similar to the following,
*C SN I Secondary Lookup Word __ SN MARKETING >ADVERTISING <
Then,
  1. Not using
    CATSW
    SEARCH: JOHN BROWN MARKETING FILE: JOHN BROWN ADVERTISING SCORE: 066
  2. With
    OPTION CATSW
    VALUE SN,90 SEARCH: JOHN BROWN MARKETING FILE: JOHN BROWN ADVERTISING SCORE: 096
  3. With
    OPTION CATSW
    VALUE SN,90 SEARCH: JOHN BROWN MARKETING FILE: JOHN BROWN ENGINEERING SCORE: 066
As Marketing and Engineering do not match via Edit-List rules, the score remains at 66%.
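The three CATSW results above are consistent with a simple equal-weight average of the word Scores, as the following Python sketch shows. It is an illustration under stated assumptions only: words are paired by position, each word has equal weight, the result is truncated, and the `SECONDARY_PAIRS` set stands in for the SN Edit-list rule. All names in the sketch are hypothetical.

```python
from typing import Optional

# Stand-in for the Edit-list rule "SN MARKETING >ADVERTISING <".
SECONDARY_PAIRS = {frozenset(("MARKETING", "ADVERTISING"))}


def word_score(a: str, b: str, catsw_sn: Optional[int]) -> int:
    """Score one word pair; catsw_sn is the CATSW value for category SN."""
    if a == b:
        return 100
    if catsw_sn is not None and frozenset((a, b)) in SECONDARY_PAIRS:
        return catsw_sn
    return 0


def name_score(search: str, file: str, catsw_sn: Optional[int] = None) -> int:
    """Average the word Scores of two names with the same number of words."""
    pairs = list(zip(search.split(), file.split()))
    return sum(word_score(a, b, catsw_sn) for a, b in pairs) // len(pairs)


print(name_score("JOHN BROWN MARKETING", "JOHN BROWN ADVERTISING"))      # 66
print(name_score("JOHN BROWN MARKETING", "JOHN BROWN ADVERTISING", 90))  # 96
print(name_score("JOHN BROWN MARKETING", "JOHN BROWN ENGINEERING", 90))  # 66
```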
CATSW
can now be used with a Secondary Name Edit-list category (in the fast-starts this is known as
SN
). This is useful when using
CATSW
to degrade the Score of nickname fields for ranking. In N3SCL, only
NN(R)
and
NK(N)
categories can be used.
To reduce the maximum Weight of a word which is defined in an Edit-list category, use either LIMWCAT,0 with CATSW, or use CATSS.
OPTION CATSS
VALUE [Edit-list Category],[Number]
CATSS
is used to reduce the maximum Weight (significance) of certain words which were originally different but changed via Edit-list rules to match.
LIMWCAT
is always set to 0 for
CATSS
. If a Category is defined for both
CATSW
and
CATSS
,
CATSW
will take precedence.
OPTION FLAGS
VALUE ORIGCSW,{0,1}
Indicates whether to enable the Edit-list category options
CATSW
or
CATSS
for matching original words. Set the value to 1 to enable
CATSW
or
CATSS
for matching original words. Default is 0.
OPTION FLAGS
VALUE CATSWD,{0/1}
By setting
CATSWD
to 1,
CATSW
and
CATSS
processing will be bypassed when an Initial to Word match is being processed and the Word is in a
CATSW
or
CATSS
category.
The "D" stands for "Disable".
OPTION FLAGS
VALUE CATSWEXT,{0/1}
By setting
CATSWEXT
to 1, an exact match before formatting between two words will now score 100 even if a word belonged to one of the categories specified via
CATSW
or
CATSS
.
OPTION FLAGS
VALUE CATSWF,{0/1}
By setting
CATSWF
to 1,
CATSW
and
CATSS
processing will be performed even if
MAJMOD
processing is done.
The "F" stands for "Force".
OPTION FLAGS
VALUE EXACTCAT,{0/1}
By setting
EXACTCAT
to 1, an exact match after formatting between two words will now score 100 even if a word belonged to one of the categories specified via
CATSW
or
CATSS
which specified that the word’s Score should be reduced.
For example, if the Edit-list contains the following rules,
*C CS S Skip Word *C RR R Replacement Word RR DEPT >DEPARTMENT < CS DEPARTMENT > <
Then,
  1. With just
    OPTION CATSW VALUE CS,5 SEARCH: HEALTH DEPT FILE: HEALTH DEPARTMENT SCORE: 075
  2. But with both
    OPTION CATSW VALUE CS,5 OPTION FLAGS VALUE EXACTCAT,1 SEARCH: HEALTH DEPT FILE: HEALTH DEPARTMENT SCORE: 100
For more information, see the
Word Weight Modification
section in N3SCM.
The default is 0.
OPTION FLAGS
VALUE LIMWCAT,{0/1}
The default behavior (
LIMWCAT,1
) does not allow the
CATSW
option to use a maximum Weight less than 10. By setting
LIMWCAT
to 0, the maximum Weight of a word which is in a category defined by the
CATSW
option can be less than 10. For more information, see the
Word Weight Modification
section in N3SCM.
OPTION CATNREP
VALUE nnt,0
CATNREP
allows you to override the meaning of an Edit-list Category Name while performing Matching. "nn" is the Edit-list Category Name, and "t" is the new Category Type. The value 0 is ignored but must be present. Multiple VALUE statements are permissible.
OPTION CATNREP
VALUE SND,0 <= change Secondary Names (SN) to Delete
VALUE PTS,0 <= change Personal Titles (PT) to Skip
OPTION CATNIGN
VALUE nn,0
CATNIGN
allows you to entirely disable, or ignore, an Edit-list Category Name while performing Matching. "nn" is the Edit-list Category Name. The value 0 is ignored but must be present. Multiple VALUE statements are permissible.
OPTION CATNIGN
VALUE NN,0 <= disable the Nickname (NN) category
OPTION CATTREP
VALUE ct,0
CATTREP
allows you to override the meaning of an Edit-list Category Type while performing Matching. "c" is the Edit-list Category Type, and "t" is the new Category Type. The value 0 is ignored but must be present. Multiple VALUE statements are permissible.
OPTION CATTREP
VALUE SD,0 <= change all Skip (S) to Delete (D)
OPTION CATTIGN
VALUE t,0
CATTIGN
allows you to entirely disable, or ignore, an Edit-list Category Type while performing Matching. "t" is the Edit-list Category Type. The value 0 is ignored but must be present. Multiple VALUE statements are permissible.
OPTION CATTIGN
VALUE R,0 <= ignore all Replace (R) types
OPTION FLAGS
VALUE ENABLDNM,{0/1}
Indicates whether to disable matching the specified pairs of words even though they are similar. Set the value to 1 to disable matching the specified pairs of words. Default is 0.
For example, add the following pairs of words to the Edit-list:
*C NM 7 Do Not Match NM PETER >PETA < NM PETER >PETRA <
When you match the words PETER JONES and PETRA JONES, you get the following results:
  • With
    OPTION FLAGS VALUE ENABLDNM,1
    , you get a score of 50 because SSA-NAME 3 does not match the words PETER and PETRA.
  • With
    OPTION FLAGS VALUE ENABLDNM,0
    , you get a score of 84.

Local Options Addressing Spelling

If the Name Matching Method cannot achieve a good Score by other means, it will resort to comparing the stabilized forms of two words and/or to performing a string comparison on the two names. The following Local Options control the behavior of this level of matching.
Option
Description
OPTION FLAGS
VALUE SKIPCONS,[number]
Matches multiple consonants with a single consonant. Set the value to 1 to enable this option. Default is 0.
For example, after you set this option to 1, if you match CROSS and CROS, the consonants
SS
match with the consonant
S
.
OPTION FLAGS
VALUE SKIPVOWL,[number]
Matches vowels or ignores a vowel when compared with a consonant. Set the value to 1 to enable this option. Default is 0.
For example, after you set this option to 1, if you match ABAD and ABED, the vowel
A
matches with the vowel
E
.
OPTION FLAGS
VALUE SYNCS,[number]
When raw string matching is used, two names are compared character by character. If two characters do not match, the method will look ahead for the full length of the name for a character match, and attempt to resynchronize the matching from that character forward. This option tells the method how many characters must match in a look-ahead operation for the re-synchronization to be accepted. The default value is 2.
OPTION FLAGS
VALUE TRANSLEN,[number]
When raw string matching is used, two transposed characters are accepted as a match (2/2) if the word length is greater than or equal to
[number]
. The default is 1. If the word length is less than
[number]
the two transposed characters score 1/2. For example,
OPTION FLAGS VALUE TRANSLEN,5
John Patterson and John Pattesron score 100
John Bent and John Bnet score 075
OPTION SCORES
VALUE STD,[number]
If two stabilized words match, the default word Score given is 7/10. This option allows the word Score to be increased or decreased. For example,
OPTION SCORES VALUE STD,8
will give a word Score of 8/10 for a stabilized word match. If
SCALEFTR,1
is specified,
[number]
can be between 0 and 100.
LOPT=(NOSTD)
This option disables the stabilized matching of two words. With this option set, no stabilized comparisons will take place.
LOPT=(NORAW)
This option disables the raw string matching. With this option set, no raw string comparisons will take place.
OPTION SCORES
VALUE ORIGWTHR,[number] VALUE ORIGWSCR,[number]
If the initial Score for a match is below the
ORIGWTHR
threshold value (a value between 1 and 100),
ORIGWSCR
logic will recalculate the Score on each ’unformatted’ word, i.e. after Cleaning but without Edit-list processing. A raw string comparison will be done on the words. If the result is a Score less than the maximum possible stabilized word score, a comparison is also done on the stabilized form of the words, and the higher of the two scores used. The
ORIGWSCR
value is then used to scale the Score, and the resulting Score will be used if it is greater or equal to the initial Score. For example, if there was an Edit-list rule replacing Nathan with Nathaniel:
Without
ORIGWSCR
(or
ORIGWSCR,0
):
SEARCH: NATHAN FILE: NATHON SCORE: 055
(as we are actually comparing
NATHANIEL
to
NATHON
due to the activation of an Edit-list rule).
With
ORIGWTHR,90 ORIGWSCR,100 SEARCH: NATHAN FILE: NATHON SCORE: 083
(as we are now comparing
NATHAN
to
NATHON
and using the higher Score).
OPTION FLAGS
VALUE MATCHEND,[number]
Allows a string match (raw compare) to resync even at the last character.
[number]
defaults to 0. For example, using the defaults for SYNCS and MATCHEND, Tiene vs Tienne scores 6/10.
With
OPTION FLAGS VALUE MATCHEND,1
Tiene vs Tienne scores 8/10.
OPTION SCORES
VALUE WBELOW,[Number]
Set the Score for any single word to zero if the raw string word Score is less than
[number]
where
[number]
can be between 1 and 100. For example, not using WBELOW:
SEARCH: CECILIA M SMITH FILE: JAQUELINE M SMITH SCORE: 076
With VALUE WBELOW,75
SEARCH: CECILIA M SMITH FILE: JAQUELINE M SMITH SCORE: 066
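The effect of WBELOW on the example above can be sketched in Python. The source does not state the raw word Score for CECILIA / JAQUELINE, so the value 30 used below is an assumption chosen only because it is consistent with the 076 and 066 results under an equal-weight, truncated average. The function names are hypothetical.

```python
from typing import List


def apply_wbelow(word_scores: List[int], wbelow: int) -> List[int]:
    """Zero any word Score that is less than the WBELOW value."""
    return [0 if score < wbelow else score for score in word_scores]


def average(word_scores: List[int]) -> int:
    """Equal-weight, truncated average of word Scores (an assumption)."""
    return sum(word_scores) // len(word_scores)


# CECILIA/JAQUELINE (assumed raw Score 30), M/M and SMITH/SMITH (100 each).
scores = [30, 100, 100]
print(average(scores))                    # 76
print(average(apply_wbelow(scores, 75)))  # 66
```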
OPTION FLAGS
VALUE RAWCMPTN,{0/1}
The default setting of 1 causes a raw string compare of a word to be calculated out of 10, and any remainder is dropped (e.g. a score of 8.7/10 will become 8/10 or 80/100). By changing the setting to 0, the raw string compare will be calculated out of 100 (e.g. 87/100 will result in a word score of 087). For example, without RAWCMPTN (or RAWCMPTN,1):
SEARCH: MICKALSEN FILE: MICKALSON SCORE: 080
With RAWCMPTN,0
SEARCH: MICKALSEN FILE: MICKALSON SCORE: 087
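The truncation that RAWCMPTN controls amounts to dropping the units digit of the raw percentage, as in this minimal Python sketch. The function name `raw_word_score` is hypothetical, and the raw percentage of 87 is taken from the example above.

```python
def raw_word_score(raw_percent: int, rawcmptn: int = 1) -> int:
    """Return the word Score for a raw string compare result (0-100).

    rawcmptn=1 (the default) scores out of 10 and drops the remainder;
    rawcmptn=0 keeps the full value out of 100.
    """
    if rawcmptn == 1:
        return (raw_percent // 10) * 10
    return raw_percent


print(raw_word_score(87))              # 80
print(raw_word_score(87, rawcmptn=0))  # 87
```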
OPTION SCORES
VALUE RAWSTBTH,n
VALUE RAWSTBVL,n
If the score from the raw string compare is greater than RAWSTBTH then improve the score using the following formula:
new score = raw score + ((100 - raw score) * stabilized score / RAWSTBVL)
It increases the score by a factor based on the raw score and stabilized score. The default value for RAWSTBTH is 0, which disables this option.
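The formula above can be written as a small Python function. This is a sketch only: the function and parameter names are hypothetical, the input values in the usage lines are invented for illustration, and the source does not say whether the product rounds or truncates the result, so the sketch leaves it as a floating-point value. RAWSTBVL must be non-zero.

```python
def boosted_score(raw: float, stabilized: float,
                  rawstbth: float, rawstbvl: float) -> float:
    """Apply the RAWSTBTH / RAWSTBVL adjustment to a raw compare score."""
    # A RAWSTBTH of 0 (the default) disables the option.
    if rawstbth == 0 or raw <= rawstbth:
        return raw
    return raw + ((100 - raw) * stabilized / rawstbvl)


# Hypothetical values: raw 80, stabilized 70, RAWSTBTH 50, RAWSTBVL 100.
print(boosted_score(80, 70, rawstbth=50, rawstbvl=100))  # 94.0
# Below the threshold, the raw score is returned unchanged.
print(boosted_score(40, 70, rawstbth=50, rawstbvl=100))  # 40
```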
Examples, for
LOPT=(NORAW)
  • Words: KAN, KON
    Opts: none
    Scr: 70
    Comment: Both the stabilized and raw compare are performed and the highest Score is used. The raw compare scores 6/10 (2/3 characters match), however the words stabilize to the same and score 7/10 (the default). The Score is therefore 70.
  • Words: KAN, KON
    Opts: NOSTD
    Scr: 60
    Comment: Only the raw compare is performed and a Score of 60 is returned (2/3 characters match giving 6/10).
  • Words: KAN, KON
    Opts: NORAW
    Scr: 70
    Comment: Only the stabilized compare is performed. Because the two words are the same after Word Stabilization, they score 7/10 (the default) and a Score of 70 is returned.
  • Words: KAN, KON
    Opts: NOSTD, NORAW
    Scr: 00
    Comment: Using both options forces an exact match comparison on the words after they have been processed through the Edit-list. As the two names are not exactly the same the Score is 0.
  • Words: ABCDEFGKAN, ABCDEFGKON
    Opts: none
    Scr: 90
    Comment: The raw compare returns a higher value (9/10) than the stabilized compare which defaults to 7/10.
  • Words: ABCDEFGKAN, ABCDEFGKON
    Opts: NOSTD
    Scr: 90
    Comment: Same result as above as the stabilized compare was overridden by the raw compare anyway.
  • Words: ABCDEFGKAN, ABCDEFGKON
    Opts: NORAW
    Scr: 70
    Comment: The stabilized words return an exact match, which defaults to 7/10.
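The way NOSTD and NORAW interact in these examples can be summarized with a short Python sketch. It takes the raw compare Score and the outcome of the stabilized comparison as inputs rather than computing them, since the actual raw compare and Word Stabilization routines are not described here. The function name `spelling_word_score` and its parameters are hypothetical.

```python
def spelling_word_score(raw_score: int, stabilized_equal: bool,
                        nostd: bool = False, noraw: bool = False,
                        std: int = 70) -> int:
    """Pick the word Score for two non-identical words.

    raw_score: result of the raw string compare (0-100).
    stabilized_equal: True if both words stabilize to the same form.
    std: Score for a stabilized match (default 7/10, i.e. 70).
    """
    candidates = []
    if not noraw:
        candidates.append(raw_score)
    if not nostd and stabilized_equal:
        candidates.append(std)
    # With both comparisons disabled only an exact match could score,
    # and the words are assumed here to differ, so the result is 0.
    return max(candidates, default=0)


print(spelling_word_score(60, True))                          # 70
print(spelling_word_score(60, True, nostd=True))              # 60
print(spelling_word_score(60, True, noraw=True))              # 70
print(spelling_word_score(60, True, nostd=True, noraw=True))  # 0
print(spelling_word_score(90, True))                          # 90
```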

Local Options Addressing Long Names or Addresses

When matching long names or addresses and many of the components match, a high Score will be returned. If this Score is higher than desired, it is possible to have only those components which did not match contribute to the Score. This is achieved via the
CLIMIT
and
CLIMLIST
options.
Option
Description
OPTION CLIMIT
VALUE TRIGS,[number]
VALUE CHARDS,[number]
VALUE NSACTF,{0/1}
VALUE NOINCR,{0/1}
VALUE AVERAGE,[number]
VALUE NOEXPNTY,[number]
VALUE RECREF,{0/1}
CLIMIT
logic is executed if the initial Score for a match is above the TRIGS value (a value between 0 and 100). If the initial Score is 100, however, the
CLIMIT
logic is bypassed.
CLIMIT
logic will recalculate the Score while allowing only those words whose types appear in
CLIMLIST
to participate in this recalculation. Words which scored above a user-defined limit (
CHARDS
) may also be excluded from the recalculation.
CHARDS
can have a value of between 0 and 100.
The
NSACTF
(No Match Action Flag) flag can be set to dictate what Score to return when no new Score is possible. Valid values: 0 and 1. A value of 0 will return a Score of 0 if no new Score is possible. A value of 1 will return the original Score. The default is 0.
The
NOINCR
flag can be set so as to remember the original Score and if the original Score is less than the recalculated Score then the original Score is used. This is useful to allow bad matches to decrease the Score but prevent good matches from increasing the Score.
Using
AVERAGE
and a value of
N
, where
N
defaults to 10, means that the ultimate Score is calculated as:
(Original Score + CLIMIT Score * N/10) * 1/2
This will allow a mix of the effect of the original and the recalculated Scores to be created without having to use two methods on the same field.
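The AVERAGE formula can be expressed directly in Python. This is a sketch with hypothetical names and invented input values. The source does not state how the product rounds the result, so the function returns the unrounded value.

```python
def climit_average(original_score: float, climit_score: float,
                   n: int = 10) -> float:
    """Blend the original and CLIMIT Scores using the AVERAGE value N."""
    return (original_score + climit_score * n / 10) * 1 / 2


# With the default N of 10 this is a plain average of the two Scores.
print(climit_average(80, 100))       # 90.0
# A smaller N reduces the contribution of the CLIMIT Score.
print(climit_average(80, 100, n=5))  # 65.0
```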
It would be quite normal to use
NOINCR
and
AVERAGE
in combination, as well as in combination with all the other options.
NOEXPNTY
will reduce the final
CLIMIT
score by the number of unmatched
CLIMLIST
tokens times the
NOEXPNTY
value.
The
RECREF
flag, when set to 1, will cause re-calculation of the
REFxxx
record (as defined in the
GOPT
parameter) based on the word types in
CLIMLIST
. A value of 1 is recommended. For example, matching the following addresses:
GOPT: REFMIN SEARCH: 3, 94 MILLER ST SYDNEY 2060 FILE: 94 MILLER ST NORTH SYDNEY NSW 2060
The
SEARCH
record is initially selected as the Reference record and a score of 80 returned.
CLIMIT
processing is specified for codes only (the numeric components 3, 94 and 2060 in the records above). Using RECREF,0 a Score of 66 is returned, as the original reference record is used. By using RECREF,1, CLIMIT will re-calculate the Reference record based on Codes only, and a Score of 100 is returned as now the
FILE
record is used as the Reference. (See the
AVERAGE
option on how to average the before and after scores.)
OPTION CLIMLIST
VALUE [word-type][word-type]...,0
This option defines the word types to be compared during
CLIMIT
Score recalculation. The list of types is used only by the
CLIMIT
option and has no effect if the
CLIMIT
option is not enabled. If this list is not specified, a default list containing categories Y (non-major words), C(odes) and I(nitials) is used.
A maximum of 8 word types may be listed.
The following example will force
CLIMIT
logic to recalculate the Score while only using words of type S(kip), M(ajor) and I(nitials).
OPTION CLIMLIST VALUE SMI,0
A value field (0 above) is required but is not actually used.
For a complete list of word types refer to the
APPLICATION REFERENCE guide > NAMESET section
.
EXAMPLE 1 for OPTION CLIMLIST
An address-matching application should not score addresses in the same street but with different street numbers too highly, yet should still treat such matches as suspect. Using the Local Option
EXCTCODE
may be too strict for this purpose, so
CLIMIT
could be used:
  1. With no
    EXCTCODE
    and no
    CLIMIT
    ,
    SEARCH: 56 VALLEY RD NEWTOWN WA 2365
    FILE: 56A VALLEY RD NEWTOWN WA 2365
    SCORE: 090
  2. With
    XOPT=(EXCTCODE)
    and no
    CLIMIT
    ,
    SEARCH: 56 VALLEY RD NEWTOWN WA 2365
    FILE: 56A VALLEY RD NEWTOWN WA 2365
    SCORE: 080
  3. With
    XOPT=(EXCTCODE)
    and:
    OPTION CLIMIT
    VALUE TRIGS,80
    OPTION CLIMLIST
    VALUE C,0
    SEARCH: 56 VALLEY RD NEWTOWN WA 2365
    FILE: 56A VALLEY RD NEWTOWN WA 2365
    SCORE: 085
EXAMPLE 2 for OPTION CLIMLIST
An application to match book titles needs to overcome the problem where long titles that share many words and differ in perhaps only one word achieve too high a Score.
MAJMOD
cannot be used because the position of the major word is unstable.
  1. With no
    CLIMIT
    ,
    SEARCH: ANIMAL STORIES FROM OUTBACK AFRICA: ELEPHANTS
    FILE: ANIMAL STORIES FROM OUTBACK AFRICA: TIGERS
    SCORE: 090
  2. With
    CLIMIT
    :
    OPTION CLIMIT
    VALUE TRIGS,80
    VALUE CHARDS,90
    OPTION CLIMLIST
    VALUE YM,0
    SEARCH: ANIMAL STORIES FROM OUTBACK AFRICA: ELEPHANTS
    FILE: ANIMAL STORIES FROM OUTBACK AFRICA: TIGERS
    SCORE: 010

Local Options Addressing Multi-valued Fields

The N3SCM method can also compare multi-valued fields and return the best Score from within the multiple comparisons. The types of multiple-valued fields supported are:
  • Account names & Compound names
  • Secondary names
  • Repeating fields
Account Names & Compound Names
The method will automatically compare Account and Compound Names if these features are turned on in the Algorithm being used by the method. For a description of Account & Compound names see the
Multi-Valued Fields
section. As an example, if the Compound Name feature is turned on and
T/AS
is defined as a Compound Name Marker and the
REFMAX
Global Option is defined in the Matching Scheme, then:
SEARCH: SNAPPY INVESTMENTS
FILE: ABC HOLDINGS T/AS SNAPPY INVESTMENTS
SCORE: 100
Secondary Names
To use the Secondary name feature, Secondary name entries must first be defined in the Edit-list being used by the method’s Algorithm. For a description of Secondary name Edit-list entries see the
Multi-Valued Fields
section.
When comparing a word that is defined as a Secondary name, the method can then compare all its replacement values at the same time.
To specify what types of Secondary names should be expanded, the following
Local
Option must be set:
Option
Description
OPTION FLAGS
VALUE SECOND,{0/1/2/3/4/5}
  • 0 – do not expand secondary names (default)
  • 1 – expand only if leftmost minor word (only Y types)
  • 2 – expand only if rightmost minor word (only Y types)
  • 3 – expand all words (all types)
  • 4 – expand only minor words (not M or N types)
  • 5 – expand only major words (M or N types)
As an example, when matching addresses, this feature can be used to give equality to neighboring localities, provided the Edit-list is set up with the appropriate values. For example, assuming NEWTOWN is next to MIDTOWN, and MIDTOWN is next to OLDTOWN, but NEWTOWN is not next to OLDTOWN, then the Edit-list should contain at least the following:
*C SN I Secondary Names
SN NEWTOWN >MIDTOWN <
SN OLDTOWN >MIDTOWN <
Then, if using:
OPTION FLAGS
VALUE SECOND,3
SEARCH: 25 MAIN STREET NEWTOWN
FILE: 25 MAIN STREET MIDTOWN
SCORE: 100
SEARCH: 25 MAIN STREET NEWTOWN
FILE: 25 MAIN STREET OLDTOWN
SCORE: 066
OPTION FLAGS VALUE SECPHRSE,{0/1/2}
This option allows you to create secondary phrase names. This matching option is equivalent to the NAMESET function keyword
SECPHRASE
or
SECPHRASEALL
. Value 0 turns this feature off, 1 turns the feature on (NAMESET
SECPHRASE
) and 2 creates all secondary names (NAMESET
SECPHRASEALL
). The default is 0.
OPTION FLAGS VALUE SECPORIG,{0/1}
This option allows you to include original names before secondary phrase rules are applied. This matching option is equivalent to the NAMESET function keyword
SECPHRASEORIG
. Value 0 turns this feature off and 1 turns the feature on (NAMESET
SECPHRASEORIG
). The default is 0.
OPTION SCORES VALUE SECPHRSE,n
This option allows you to specify the maximum score to apply to secondary phrase matches. Value 0 turns this feature off and will assign a maximum score of 100. The default is 0.
OPTION SECCAT
VALUE [Edit-list Category],1
Requires that both Secondary words being matched are in the
[Edit-list Category]
specified. Multiple Edit-list Categories can be specified using multiple
VALUE
statements.
Requires
OPTION FLAGS
,
VALUE SECOND
to be set to a non-zero value. For example,
OPTION FLAGS
VALUE SECOND,3
OPTION SECCAT
VALUE SN,1
will only perform Secondary name matching on words which are in the
SN
Edit-list category.
OPTION SECTYPE
VALUE [Word-type],1
Requires that both Secondary words being matched have the
[Word-type]
specified. Multiple Word-types can be specified using multiple
VALUE
statements.
Requires
OPTION FLAGS
,
VALUE SECOND
to be set to a non-zero value. For example,
OPTION FLAGS
VALUE SECOND,3
OPTION SECTYPE
VALUE S,1
will only perform Secondary name matching on words which have a Word-type of
S
(Skip).
OPTION SCORES
VALUE SEC,[Number]
This option allows you to specify the word Score, from 0 to 10, for Secondary Word matches. The default is 10.
Repeating fields
Repeating fields are distinct fields which repeat n times in the search and file records being passed to Matching. Examples might be where you want to match a search name against either a current name or a former name and get the best Score. Another example might be to match a search address against either a residential address or a postal address.
To enable this feature, use the
REPEAT=
option on the
FIELD
keyword as described in the
Definition File Structure
section at the beginning of this chapter. For example,
METHOD NAME=MNAME,WEIGHT=1, X
GOPT=(LENGTH*50+REFMIN), X
LOPT=(CONC+CINITA+INITLOW)
FIELD OFFSET=0,REPEAT=2
This tells the N3SCM Method to expect two (2) fields of length 50 in both the search record and file record. For example,
Search: John Smith
File: Mike Taylor John Smith
would return a Score of 100, due to the match of the search name John Smith against the second of the two fields in the file record. You could also put two names in the search record and the method would try up to four matches before returning with a Score, although if any one of those matches scores 100, it returns early.
To modify this behavior the following Local Options may be used.
Option
Description
OPTION FLAGS
VALUE WORSTSCR,{0/1/2}
  • 0 - return the highest score detected (default)
  • 1 - return the worst "best" score detected
  • 2 - return the worst "worst" score detected
In the descriptions below, the search and file records are assumed to contain the following:
The search record contains 3 fields in a repeating group:
John Smith
Mary Smith
Peter Smith
And the file record contains 4 fields in a repeating group:
John Smith
Mary Smith
Peter Smith
Paul Jones
WORSTSCR=0
Each field in the search record is compared to every field in the file record, always remembering the highest score. This process continues until either all possible comparisons have been performed or a score of 100 is detected. The highest score is then returned.
Using the sample data, John Smith will score 100 and this score will be returned. No further comparisons will be necessary.
WORSTSCR=1
Each field in the search record is compared to every field in the file record, remembering the highest score calculated for each search field. Then, the lowest remembered score is returned.
Using the sample data, each of the fields in the search record will score 100. The lowest of these scores, 100, will then be returned.
WORSTSCR=2
Each field in the search record is compared to every field in the file record, remembering the lowest score calculated for each search field. Then, the lowest remembered score is returned.
Using the sample data, each of the fields in the search record will score 0. The lowest of these scores, 0, will then be returned.
OPTION FLAGS
VALUE WSCRNOEX,[number]
This option only has an effect when
WORSTSCR
is set to 1 or 2. It is used to penalize the score in the case where the number of repeats differs between the search and file records. The score is reduced by this value for each extra field present. So, if
WSCRNOEX
is set to 3 and using the sample data above, the score will be reduced by 3 (3 * (4 - 3)). If specified, it should be in the range 1-10. The default is 0, which means that no reduction in score occurs.
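The three WORSTSCR modes and the WSCRNOEX penalty can be sketched as follows. This is a simplified illustration, not the product's implementation: `exact_score` is a hypothetical stand-in for the real field comparison, the early exit on a score of 100 is not modelled, and clamping the penalized score at 0 is an assumption.

```python
def exact_score(a, b):
    # Hypothetical stand-in for the real N3SCM comparison: 100 if identical, else 0.
    return 100 if a == b else 0


def repeat_score(search_fields, file_fields, worstscr=0, wscrnoex=0, score=exact_score):
    # Score every search field against every file field.
    per_search = [[score(s, f) for f in file_fields] for s in search_fields]
    if worstscr == 0:
        # Highest score over all comparisons.
        return max(max(row) for row in per_search)
    # WORSTSCR=1 remembers the best score per search field, WORSTSCR=2 the worst;
    # in both cases the lowest remembered score is the result.
    remembered = [max(row) if worstscr == 1 else min(row) for row in per_search]
    result = min(remembered)
    # WSCRNOEX: reduce the score once for each extra field in either record.
    extra_fields = abs(len(file_fields) - len(search_fields))
    return max(0, result - wscrnoex * extra_fields)  # clamping at 0 is an assumption


search = ["John Smith", "Mary Smith", "Peter Smith"]
file = ["John Smith", "Mary Smith", "Peter Smith", "Paul Jones"]
for mode in (0, 1, 2):
    print(mode, repeat_score(search, file, worstscr=mode))
print(repeat_score(search, file, worstscr=1, wscrnoex=3))
```

Using the sample data, the three modes give 100, 100 and 0, and WORSTSCR=1 with WSCRNOEX set to 3 gives 97.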

Local Options Controlling Word Score

Option
Description
OPTION FLAGS
VALUE USECWAIT,{0/1}
The default behavior of N3SCM does not allow the maximum Score of a word pair to be less than 10.
This affects the word weight modifying options
MAJMOD
,
SKIPMOD
and
CATSW
. By setting USECWAIT to 1, the maximum Score of a word pair is allowed to be less than 10. For more information, see the
Word Weight Modification
section in N3SCM.
OPTION FLAGS
VALUE SCALEFTR,{1/10}
The default behavior of N3SCM scores a word pair out of 10. By setting SCALEFTR to 1, it will score the word pair out of 100, providing a finer calculation. This mostly affects ranking options (such as
CATSW
), which can now be set out of 100 instead of 10, allowing a smaller reduction in score.
For example, using the default SCALEFTR,10 the following can be specified:
OPTION CATSW VALUE NN,9
Then, assuming RICK is defined in the Edit-list with a category of NN, the following two names:
RICK THOMSON
RICHARD THOMSON
will score 95.
When using the SCALEFTR,1 (i.e. word score out of 100) the following can be specified:
OPTION CATSW VALUE NN,95
and
RICK THOMSON
RICHARD THOMSON
will now score 97.
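One reading that reproduces both results above is a plain average of the word scores, scaled to 100 with the remainder dropped. The sketch below is an assumption made for illustration only; the real method also takes word types, initials and other options into account, and the function name is hypothetical.

```python
def method_score(word_scores, max_word_score):
    """Average the word scores and scale to 0-100, dropping any remainder.

    max_word_score is 10 for the default scale (SCALEFTR,10)
    and 100 for the finer scale (SCALEFTR,1).
    """
    return sum(word_scores) * 100 // (len(word_scores) * max_word_score)


# RICK/RICHARD scores 9 out of 10 via CATSW NN,9; THOMSON/THOMSON scores 10.
print(method_score([9, 10], 10))      # 95
# With word scores out of 100, CATSW NN,95 gives 95 and the exact match 100.
print(method_score([95, 100], 100))   # 97
```

The finer scale lets the nickname match cost 2.5 points instead of 5, which is then truncated to a Score of 97.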

Local Options Controlling Reference Record Matching

Option
Description
OPTION NOEXCESS
VALUE TRIGGER,[trigger Score]
VALUE PENALTY,[penalty Score]
VALUE REFCNT,[maximum word count]
VALUE REFMULT,[penalty Score multiplier]
VALUE SREFCNT,[maximum skip word count]
VALUE SREFMULT,[penalty Score multiplier]
VALUE REFF,{0/1}
VALUE REFS,{0/1}
When the Global Option REFMIN is specified (use the shorter record as the reference record – see the
Global Options
section for more details),
NOEXCESS
can be used to decrement the method Score by the number of non-matching words in the non-reference (longer) record. The method Score must be equal to or greater than
[trigger Score]
for this option to take effect. When
NOEXCESS
is activated, the method Score is decremented by
[penalty Score]
for each non-matching word in the non-reference record.
For example, with
GOPT=(REFMIN)
SEARCH: JOHN PEEL
FILE: KEN JOHN PEEL
SCORE: 100
with
GOPT=(REFMAX)
SEARCH: JOHN PEEL
FILE: KEN JOHN PEEL
SCORE: 066
and with
GOPT=(REFMIN)
OPTION NOEXCESS
VALUE TRIGGER,95
VALUE PENALTY,5
SEARCH: JOHN PEEL
FILE: KEN JOHN PEEL
SCORE: 095
An additional penalty can be applied if the number of words present in the
REFMIN
record’s Word stack is equal to the number defined by
REFCNT
. The default for
REFCNT
is 1. The
REFCNT
syntax allows any number of words to be specified; however, this behavior was originally designed for cases where only one word was present in the
REFMIN
name. For example:
10 VALLEY RD
10
REFCNT
is used in conjunction with
REFS
&
REFF
to determine if the additional penalty is to be applied. If it is to be applied, the value of
REFMULT
is multiplied by the penalty Score and the method Score is decremented further by the resulting value.
Specifying
REFS
(the default) causes the logic to only check for the
REFCNT
condition if the
REFMIN
record is the Search record. Specifying
REFF
causes the logic to only check for the
REFCNT
condition if the
REFMIN
record is the File record. Specify both if either Search or File record can be checked.
For example, with
GOPT=(REFMIN)
OPTION NOEXCESS
VALUE TRIGGER,95
VALUE PENALTY,5
SEARCH: JOHN
FILE: KEN JOHN PEEL
SCORE: 090
and with
GOPT=(REFMIN)
OPTION NOEXCESS
VALUE TRIGGER,95
VALUE PENALTY,5
VALUE REFCNT,1
VALUE REFMULT,2
VALUE REFS,1
SEARCH: JOHN
FILE: KEN JOHN PEEL
SCORE: 080
This logic in effect says, if the
REFMIN
record is the Search record, and it contains only one word, subtract an additional penalty from the Score equal to
REFMULT x PENALTY
.
SREFCNT
and
SREFMULT
act similarly, with the added restriction that the word must be a Skip word.
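The NOEXCESS arithmetic in the examples above can be sketched as follows. The function and parameter names are hypothetical, the defaults are taken from the N3SCM default options, and the REFS/REFF check is reduced to a single boolean that says whether the reference record is on a side for which the REFCNT condition is checked. The Skip-word variants (SREFCNT, SREFMULT) are not modelled.

```python
def apply_noexcess(score, unmatched_words, ref_word_count,
                   trigger=101, penalty=1, refcnt=1, refmult=0,
                   ref_side_checked=True):
    """Decrement a method Score for non-matching words in the non-reference record."""
    if score < trigger:
        # NOEXCESS only takes effect at or above the trigger Score.
        return score
    # Basic penalty: one PENALTY per non-matching word.
    score -= penalty * unmatched_words
    # Additional penalty when the reference record holds exactly REFCNT words
    # and the REFS/REFF setting allows the check for this record.
    if ref_side_checked and ref_word_count == refcnt:
        score -= refmult * penalty
    return score


# JOHN PEEL vs KEN JOHN PEEL: one excess word.
print(apply_noexcess(100, 1, 2, trigger=95, penalty=5))              # 95
# JOHN vs KEN JOHN PEEL: two excess words.
print(apply_noexcess(100, 2, 1, trigger=95, penalty=5))              # 90
# Same names with REFCNT,1 and REFMULT,2: a further 2 x 5 is subtracted.
print(apply_noexcess(100, 2, 1, trigger=95, penalty=5, refmult=2))   # 80
```

With the default TRIGGER of 101 the trigger can never be reached, which matches NOEXCESS being inactive unless it is configured.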
OPTION NOEXCLST
VALUE [Word-type],0
If the number of words in a name changes due to the action of the
CONC
option then
NOEXCESS
will not degrade the Score. It may be desirable to have
NOEXCESS
count the number of words in a name after any concatenation has occurred. Switching on
NOEXCLST
to re-Score particular Word-types will allow this. For example, with
GOPT=(REFMIN)
OPTION NOEXCESS
VALUE TRIGGER,95
VALUE PENALTY,5
SEARCH: MOHAMMED SAID
FILE: SA-ID
SCORE: 100
But adding:
OPTION NOEXCLST
VALUE YM,0
SEARCH: MOHAMMED SAID
FILE: SA-ID
SCORE: 095
OPTION REFN
VALUE WORDS,[number of words]
VALUE GOOD,[word Score]
VALUE SCORE,[method Score]
VALUE USECATSW,{0/1}
Returns a user-defined Score
[method Score]
when a specified number of non-initial words
[number of words]
match with a Score of at least
n/10 [word Score]
. This applies irrespective of how well other data in the name matched.
The default matching behavior (and the way N3SCM works) is that every token (word or initial) in the reference record will contribute to the method Score by how well it matched. This option allows a method Score to be determined based on the matching of only a certain number of tokens from the reference record.
One use for this option is when the need is to confirm a match of two names which have already been identified as having the same id-number (e.g. same social security number).
For example, if the following values were defined in the matching scheme,
OPTION REFN
VALUE WORDS,2
VALUE GOOD,8
VALUE SCORE,95
and a search on id-number returned the following two names,
JOHN MICHAEL THOMPSON
JOHN CHRISTOPHER THOMPSON
the Score returned by the method would be 95, based on the matching of the two words JOHN and THOMPSON.
The USECATSW value controls whether the CATSW option is applied: a value of 1 enables the CATSW option and a value of 0 disables it. The default is 0.

Local Options Addressing Matching of Codes

OPTION CODESCOR
VALUE CODEWGHT,<codeweight>
VALUE CODEUDIF,25
VALUE CODEUNON,90
VALUE CODEUONE,50
VALUE CODEMAXD,6
VALUE CODEPOSS,1
OPTION SORTSCOR
VALUE CLN,<clnweight>
VALUE FMT,<fmtweight>
VALUE SORTWGHT,<sortweight>
VALUE NGRAMC,<weight>
VALUE NGRAMCLV,<level>
VALUE NGRAMF,<weight>
VALUE NGRAMFLV,<level>
If
CODESCOR
has been specified and
CODEWGHT
is not zero, then the method N3SCM performs additional code scoring for any codes detected in the search or file name word stacks. This happens after all the usual exact match checking and other score calculations.
  • CODEWGHT
    must be set to 0 - 100 (0 means
    CODESCOR
    is disabled and 100 means that the other scoring calculations will be ignored). The default value is 0.
  • CODESCOR
    and
    EXCTCODE
    cannot be used at the same time. The method exits with an error if these conflicting options are specified.
  • CODESCOR
    does not make sense if
    FORMATTING-OPTIONS #1
    is set to ’D’ (delete codes). The method exits with an error if these conflicting options are specified.
  • The sum of the specified
    CODEWGHT
    and
    SORTWGHT
    must not exceed 100.
  • If
    SORTWGHT > 0
    then calculate
    SORTSCOR
    score.
  • If
    CODEWGHT > 0
    and if either search or file name contains one or more codes, then calculate
    CODESCOR
    score.
If either search or file name contains one or more codes, then the result is weighted as follows:
Normal Score Weight = 100 - (Sort Score Weight + Code Score Weight)
Final Score = ((Sort Score * Sort Score Weight) + (Code Score * Code Score Weight) + (Normal Score * Normal Score Weight)) / 100
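As a sketch, the weighting for the case where codes are present could be written as below. The function name is hypothetical, and dropping the remainder with integer division is an assumption carried over from the other formulas in this chapter.

```python
def weighted_final_score(normal_score, sort_score, code_score,
                         sort_weight, code_weight):
    """Combine the normal, sort and code scores using the CODEWGHT/SORTWGHT weights."""
    if sort_weight + code_weight > 100:
        # The text requires that CODEWGHT + SORTWGHT does not exceed 100.
        raise ValueError("CODEWGHT + SORTWGHT must not exceed 100")
    normal_weight = 100 - (sort_weight + code_weight)
    total = (sort_score * sort_weight
             + code_score * code_weight
             + normal_score * normal_weight)
    return total // 100  # remainder dropped (assumption)


# Normal score 90, sort score 80 (weight 20), code score 50 (weight 30).
print(weighted_final_score(90, 80, 50, sort_weight=20, code_weight=30))  # 76
```

Setting CODEWGHT to 100 forces the sort weight and the normal weight to 0, so only the code score contributes.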
If there are no codes in the search and file names, then the specified
CODESCOR
weight is ignored and the final score is calculated as follows:
Normal Score Weight = 100 - Sort Score Weight
Final Score = ((Sort Score * Sort Score Weight) + (Normal Score * Normal Score Weight)) / (100 - Code Score Weight)

Code score calculation

The method counts the number of codes in the formatted search and file name stacks. If none are detected then
CODESCOR
is ignored and the final score is calculated as shown above.
If one stack contains codes but the other one doesn’t, then the code score is calculated with this formula:
Code Score = 100 - (100 * Number of Codes in Stack / Number of Entries in the Stack That Has Codes)
if (Number of Entries in the Stack That Does Not Have Codes < Number of Entries in the Stack That Has Codes) then
    Weight = 100 * Number of Entries in the Stack That Does Not Have Codes / Number of Entries in the Stack That Has Codes
    Code Score = Code Score * Weight / 100
endif
This means that the larger the proportion of codes in the stack that has codes, the smaller the score. For example, 9 codes in 10 entries produces a score of 10 but 4 codes in 10 entries produces a score of 60. This score is then further reduced if the stack with no codes has fewer entries than the stack that has codes.
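The one-sided code score can be sketched directly from the formula. The function and parameter names are hypothetical, and integer division (remainder dropped) is an assumption.

```python
def one_sided_code_score(codes, entries_with_codes, entries_without_codes):
    """Code score when only one of the two stacks contains codes.

    codes                 -- number of codes in the stack that has codes
    entries_with_codes    -- number of entries in the stack that has codes
    entries_without_codes -- number of entries in the stack that has no codes
    """
    score = 100 - (100 * codes // entries_with_codes)
    if entries_without_codes < entries_with_codes:
        # Weight the score down when the code-free stack is the shorter one.
        weight = 100 * entries_without_codes // entries_with_codes
        score = score * weight // 100
    return score


print(one_sided_code_score(9, 10, 10))  # 10
print(one_sided_code_score(4, 10, 10))  # 60
print(one_sided_code_score(4, 10, 5))   # 30: halved because the other stack is half as long
```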
Otherwise, the code scoring process is as follows:
Each code in search name stack is compared against each code in file name stack. For each code pair, the method checks if the code begins with digits. If not, then codes are compared as strings.

Comparing codes that consist of or begin with digits

The leading digits in the code are converted to a number. The remaining part of the code is assumed to be a unit of measurement. If the code consists only of a number, the next stack entry (if any) is checked to see whether it contains a known unit of measure. The program currently recognizes the following units (note that the comparison is done in uppercase because, by the time the comparison is performed, cleaning has already converted the input strings to uppercase):
"G " grams
"L " litres
"M " metres
"KG " kilograms
"MG " milligrams
"KL " kilolitres
"ML " millilitres
"CL " centilitres
"DL " decilitres
"MM " millimetres
"CM " centimetres
If a unit with a kilo, milli, centi or deci prefix is found, the number is converted to the base unit, e.g. "1 km" becomes "1000 m". After this optional conversion the numbers are compared and an intermediate code score is calculated. If the numbers are equal then the intermediate code score is 100. Otherwise the intermediate code score is calculated as follows:
Intermediate Score = 100 * Smaller Number / Larger Number
For example 400 scored against 600 produces intermediate score 66 (any remainder is dropped).
The intermediate score is then weighted as follows:
  • if both numbers are followed by a unit of measure, then:
    • if the units match exactly, no intermediate score weighting is done
    • otherwise, weight the intermediate score with 25%; e.g. "400 oranges" scored against "600 apples" produces a score of 25 * 66 / 100 = 16.
  • if only one number is followed by a unit of measure, then:
    • weight the intermediate score with 50%; e.g. "400 oranges" scored against "600" produces a score of 50 * 66 / 100 = 33.
  • if neither number is followed by a unit of measure, then:
    • weight the intermediate score with 90%; e.g. "400" scored against "600" produces a score of 90 * 66 / 100 = 59.
The above weightings can be controlled by defining the following options in the population:
VALUE CODEUDIF,25
VALUE CODEUNON,90
VALUE CODEUONE,50
where
CODEUDIF
(default value 25) represents the weight given for code comparisons where units differ,
CODEUNON
(default value 90) represents the weight given for code comparisons where neither code is followed by a recognized unit and
CODEUONE
(default value 50) represents the weight given for code comparisons where only one code of the pair is followed by a recognized unit.
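The numeric comparison and unit weighting can be sketched as follows. The function name is hypothetical, and the conversion of prefixed units to their base unit is left out to keep the example short. The unit arguments simply hold whatever follows each number, mirroring the examples in the text.

```python
def numeric_code_score(a, b, unit_a=None, unit_b=None,
                       codeudif=25, codeunon=90, codeuone=50):
    """Score two numeric codes; the defaults follow the documented option defaults."""
    # Intermediate score: 100 for equal numbers, otherwise smaller/larger as a
    # percentage with any remainder dropped.
    intermediate = 100 if a == b else 100 * min(a, b) // max(a, b)
    if unit_a and unit_b:
        weight = 100 if unit_a == unit_b else codeudif   # units differ
    elif unit_a or unit_b:
        weight = codeuone                                # only one code has a unit
    else:
        weight = codeunon                                # neither code has a unit
    return intermediate * weight // 100


print(numeric_code_score(400, 600, "ORANGES", "APPLES"))  # 16
print(numeric_code_score(400, 600, "ORANGES"))            # 33
print(numeric_code_score(400, 600))                       # 59
```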

Comparing codes that consist of (or begin with) large numbers

In some circumstances it makes sense to match even very large numbers in the way described above, i.e. calculate their difference as a percentage. However, long numbers commonly represent codes where the position of each digit is meaningful. For example, the first three digits could indicate model number, the next three digits size, and the next three digits color. It does not make sense to calculate the difference between such numbers as a percentage. Performing a position-sensitive string comparison between such codes could make more sense. This behavior can be controlled with the parameters
CODEMAXD
("max digits") and
CODEPOSS
("position sensitive").
VALUE CODEMAXD,6
CODEMAXD
(default value 6) specifies the maximum number of digits in a code for which a numeric comparison is allowed to take place. When the number of digits exceeds this limit, a string comparison is performed instead. If a numeric comparison should always take place then set this value to a large value (at least 24 which is the maximum size of an entry in the formatted word stack and therefore the maximum length of a code).
VALUE CODEPOSS,1
CODEPOSS
(default value 1) specifies that this string comparison is position sensitive, e.g. the first digit in the first code is matched only against the first position in the second code, and so on. Setting this value to 0 specifies that synchronization is allowed to take place. This synchronization follows the same rules as described in the "Comparing codes that do not begin with digits" section below. Other values are reserved for future use.
The position sensitive score is calculated as follows:
  • First count matching characters in codes until the end of one code is reached.
  • If the lengths of the codes are equal, then the score is calculated as follows:
    score = Number of Matching Characters * 100 / Total Number of Characters in the Code
  • If the lengths of the codes differ, then the score is weighted down relative to the difference between the code lengths. The formula is as follows:
    score = (Number of Matching Characters * 100 * Length of Shorter Code) / (Length of Longer Code * Length of Longer Code)
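A minimal sketch of the position-sensitive score, following the two formulas above. The function name is hypothetical, and integer division (remainder dropped) and the handling of empty codes are assumptions.

```python
def position_sensitive_score(code_a, code_b):
    """Compare two codes position by position until the shorter one ends."""
    shorter, longer = sorted((code_a, code_b), key=len)
    if not longer:
        return 0  # both codes empty; this case is not covered by the text
    # zip stops at the end of the shorter code, as described in the first step.
    matches = sum(1 for x, y in zip(shorter, longer) if x == y)
    if len(shorter) == len(longer):
        return matches * 100 // len(longer)
    # Different lengths: weight the score down by the length ratio.
    return matches * 100 * len(shorter) // (len(longer) * len(longer))


print(position_sensitive_score("123456789", "123456780"))  # 88: 8 of 9 positions match
print(position_sensitive_score("1234", "123456"))          # 44: 4 matches, lengths 4 and 6
```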

Comparing codes that do not begin with digits

Both strings are compared one character at a time. Matching characters are counted. When a non-matching character is encountered, both strings are examined for the next matching character. The comparison is synchronised at the earliest occurring match.
Finally a score is calculated as follows:
if (reflen <= Number of Matching Characters) then
    score = 100
else
    score = 100 * Number of Matching Characters / reflen
endif
If
REFMIN
has been specified, then reflen is the length of the shorter code. Otherwise reflen is the length of the longer code.
OPTION SORTSCOR
When
SORTSCOR
has been specified, the method tokenizes the search and file names, sorts these tokens and performs additional scoring on these sorted tokens. This happens after all the usual exact match checking etc. usually performed by N3SCM. The logic for sorted scoring is explained below.

Scoring Cleaned Strings ("CLN Score")

VALUE CLN,<clnweight>
If
CLN
has been set to a value greater than 0, each cleaned input name is sorted in byte order, ignoring any spaces.
(UNICODE support has not been implemented.) For example, if the search name is
"AC/DC "
and the file name is
"EDC...BA "
then these names are first cleaned and the resulting names are sorted, resulting in
"ACCD"
(assuming that the used cleaning routine gets rid of the ’/’)
and
"ABCDE"
(assuming that the used cleaning routine gets rid of the ’...’)
The resulting pair is compared byte by byte in a synchronised manner. In the above example the following comparisons happen:
A vs A --> match, skip to the next character in each input string
C vs B --> does not match, skip to the next character in the second string (as C > B)
C vs C --> match, skip to the next character in each input string
C vs D --> does not match, skip to the next character in the first string (as C < D)
D vs D --> match, skip to the next character in each input string
At this point the end of the first string is detected (spaces are ignored) and the matching stops. 3 matches were counted.
Calculate the sorted score for the cleaned string as follows:
(number of matching characters) * 100 / (Length of reference string)
If
REFMAX
has been specified, the longer string is used as the reference. In the above example this is the length of
ABCDE
(5), resulting in
3 * 100 / 5 = 60
If
REFMIN
has been specified, the shorter string is used as the reference. In the above example this is the length of
ACCD
(4), resulting in
3 * 100 / 4 = 75
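The synchronised comparison of the sorted strings is a two-pointer merge. The sketch below reproduces the worked example; the `clean` function is a hypothetical stand-in for the real cleaning routine (it uppercases and keeps only letters and digits), and integer division is assumed.

```python
def sorted_match_count(a, b):
    """Sort both sequences, then count matches with a synchronised (merge) scan."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            matches += 1      # match: advance in both
            i += 1
            j += 1
        elif a[i] > b[j]:
            j += 1            # skip to the next item in the second sequence
        else:
            i += 1            # skip to the next item in the first sequence
    return matches


def clean(name):
    # Hypothetical cleaning routine: uppercase, keep letters and digits only.
    return [ch for ch in name.upper() if ch.isalnum()]


def cln_score(search_name, file_name, refmin=False):
    s, f = clean(search_name), clean(file_name)
    ref = min(len(s), len(f)) if refmin else max(len(s), len(f))
    return sorted_match_count(s, f) * 100 // ref if ref else 0


print(cln_score("AC/DC ", "EDC...BA "))               # 60 (REFMAX)
print(cln_score("AC/DC ", "EDC...BA ", refmin=True))  # 75 (REFMIN)
```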

Scoring Formatted Word Stacks ("FMT Score")

VALUE FMT,<fmtweight>
If
FMT
has been specified (greater than 0), the formatted word stacks are sorted. For example, if the search name word stack is
"01 DC "
"02 AC "
and the file name word stack is
"01 BC "
"02 AC "
"03 DC "
then these stacks are first sorted, the results being
"AC "
"DC "
and
"AC "
"BC "
"DC "
Then the sorted stacks are compared in a synchronised manner. In the above example the following comparisons happen:
AC vs AC --> match, skip to the next entry in each stack
DC vs BC --> does not match, skip to the next entry in the second stack (as DC > BC)
DC vs DC --> match, skip to the next entry in each stack
At this point the end of the first stack is detected and the matching stops. 2 matches were counted.
Calculate the sorted score for the formatted stacks as follows:
(Number of Matching Stack Entries) * 100 / (Total Number of Entries in the Reference Stack)
If
REFMAX
has been specified, the larger stack is used as the reference. In the above example the larger stack has 3 entries, resulting in
2 * 100 / 3 = 66
If
REFMIN
has been specified, the smaller stack is used as the reference. In the above example the smaller stack has 2 entries, resulting in
2 * 100 / 2 = 100
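For the word stacks, the number of matches found by the synchronised scan of two sorted lists equals the size of their multiset intersection. The sketch below uses that shortcut with `collections.Counter` instead of an explicit merge, so it is an equivalent calculation rather than the documented procedure. The function name is hypothetical, and the stacks are given as plain lists of words without their position numbers.

```python
from collections import Counter


def fmt_score(search_stack, file_stack, refmin=False):
    """FMT-style score for two word stacks."""
    # Counter & Counter keeps the minimum count of each shared word.
    matches = sum((Counter(search_stack) & Counter(file_stack)).values())
    sizes = (len(search_stack), len(file_stack))
    ref = min(sizes) if refmin else max(sizes)
    return matches * 100 // ref if ref else 0


print(fmt_score(["DC", "AC"], ["BC", "AC", "DC"]))               # 66 (REFMAX)
print(fmt_score(["DC", "AC"], ["BC", "AC", "DC"], refmin=True))  # 100 (REFMIN)
```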

Combining CLN And FMT Scores

The values specified for
CLN
and
FMT
are used as weights to combine the above scores. The sum of
CLN (<clnweight>)
and
FMT (<fmtweight>)
must be 100, or an error is issued. The following formula is used:
Score = ((CLN Score * <clnweight>) + (FMT Score * <fmtweight>)) / 100
(Remainder – if any – is dropped.)
For example, using the numbers from the
REFMIN
examples above, where the
CLN
Score was 75 and the
FMT
Score was 100, together with the weights
VALUE CLN,75
VALUE FMT,25
the calculation is:
Score = ((75 * 75) + (100 * 25)) / 100
which results in 81.
Using the numbers from the
REFMAX
example above with the same weights:
Score = ((60 * 75) + (66 * 25)) / 100
which results in 61.
If
<clnweight>
is 0 (in which case
<fmtweight>
must be 100), then the
CLN
phase is skipped and the total score is the same as the FMT Score. Conversely, if
<fmtweight>
is 0 (in which case
<clnweight>
must be 100), then the
FMT
Score phase is skipped and the total score is the same as the
CLN
Score.

Additional Scoring (if any)

VALUE SORTWGHT,<sortweight>
Finally the
SORTWGHT
is processed. The value of
<sortweight>
must be a number from 1 to 100, or an error is issued. If 100 is specified, then the above calculated combined
CLN
and
FMT
Score is the final result. But if
<sortweight>
is less than 100, then the usual score calculation done by N3SCM is also performed ("normal score"). After this, the scores are weighted using the following formula:
Final Score = ((Combined Sort Score * Sort Weight) + (Normal Score * (100 - Sort Weight))) / 100
(Remainder – if any – is dropped.)
For example, if the combined sort score was 81 (the above
REFMIN
example),
<sortweight>
was 35 and the normal score was 90, then the final score would be calculated as follows:
Final Score = ((81 * 35) + (90 * (100 - 35))) / 100
which results in 86.
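The two weighting steps can be checked with a short sketch. The function names are hypothetical; the integer division mirrors the "remainder is dropped" rule stated in the text.

```python
def combined_sort_score(cln_score, fmt_score, cln_weight, fmt_weight):
    """Combine the CLN and FMT Scores using their weights."""
    if cln_weight + fmt_weight != 100:
        raise ValueError("CLN and FMT weights must add up to 100")
    return (cln_score * cln_weight + fmt_score * fmt_weight) // 100


def final_sorted_score(sort_score, normal_score, sort_weight):
    """Blend the combined sort score with the normal score using SORTWGHT."""
    if not 1 <= sort_weight <= 100:
        raise ValueError("SORTWGHT must be from 1 to 100")
    return (sort_score * sort_weight + normal_score * (100 - sort_weight)) // 100


refmin_sort = combined_sort_score(75, 100, cln_weight=75, fmt_weight=25)
refmax_sort = combined_sort_score(60, 66, cln_weight=75, fmt_weight=25)
print(refmin_sort)                              # 81
print(refmax_sort)                              # 61
print(final_sorted_score(refmin_sort, 90, 35))  # 86
```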
NGRAM Scoring
OPTION SORTSCOR also recognises the following parameters:
VALUE NGRAMC,<weight>
VALUE NGRAMCLV,<level>
VALUE NGRAMF,<weight>
VALUE NGRAMFLV,<level>
NGRAM scoring for cleaned input strings ("NGRAMC Score")
NGRAM
scoring for cleaned input strings is controlled with parameters
NGRAMC,<weight>
and
NGRAMCLV,<level>
(both or neither must be specified).
The cleaned input strings are split into tokens, the length of which is specified with the parameter NGRAMCLV,<level>. These tokens are scored as explained in the
OPTION SORTSCOR,VALUE CLN
section above, with the exception that the length of each token is not 1, but is specified with the
<level>
parameter.
If the
<level>
parameter is 2, then 2-grams are scored, if
<level>
is 3, then 3-grams are scored etc. The achieved
NGRAMC
Score is weighted with the value specified with the
NGRAMC,<weight>
parameter.

NGRAM scoring for formatted word stacks ("NGRAMF Score")

NGRAM scoring for formatted word stack entries is controlled with parameters
NGRAMF,<weight>
and
NGRAMFLV,<level>
(both or neither must be specified).
Each formatted word stack is combined into a single string. Subsequently this string is split into tokens, the length of which is specified with the parameter
NGRAMFLV,<level>
. These tokens are scored as explained in the
OPTION SORTSCOR,VALUE FMT
section above, with the exception that the length of each token is not 1, but is specified with the
<level>
parameter.
If the
<level>
parameter is 2, then 2-grams are scored, if
<level>
is 3, then 3-grams are scored etc. The achieved NGRAMF Score is weighted with the value specified with the
NGRAMF,<weight>
parameter.
The total of the weight parameter values, i.e.
VALUE CLN,<clnweight>
VALUE FMT,<fmtweight>
VALUE NGRAMC,<ngramcweight>
VALUE NGRAMF,<ngramfweight>
must be 100.
The total SORT Score is calculated as follows:
Score = ((CLN Score * <clnweight>) + (FMT Score * <fmtweight>) + (NGRAMC Score * <ngramcweight>) + (NGRAMF Score * <ngramfweight>)) / 100
(Remainder – if any – is dropped.)
The final score is then calculated as explained in the Additional scoring section above.
N3SCM now calculates the standard score first, followed by sorted score and then
NGRAM
score. Finally these three scores are combined using the weighting given in the option parameters. The main difference here is that the standard score calculation has some rules that evaluate whether or not the current multi-valued field pairs should be scored against each other at all. If the answer is "no" then the sorted and NGRAM scoring steps are also skipped for the current field pair.

N3SCM Default Options

Where defaults apply to method options in N3SCM, the following are the default values.
OPTION CLIMIT
VALUE CHARDS,100
VALUE NSACTF,0
VALUE TRIGS,100
VALUE NOINCR,0
VALUE AVERAGE,0
VALUE NOEXPNTY,0
OPTION CONCAT
VALUE PLURALS,0
OPTION FLAGS
VALUE ENABLDNM,0
VALUE EXACTCAT,0
VALUE EXACTINI,1
VALUE EXACTWRD,1
VALUE INITCODE,1
VALUE LIMWCAT,1
VALUE MATCHEND,1
VALUE OPTIMILW,1
VALUE ORIGCSW,0
VALUE SECOND,0
VALUE SYNCS,2
VALUE TRANSLEN,1
VALUE USECWAIT,0
VALUE SKIPMAJM,0
VALUE SKIPSMOD,0
VALUE SCALEFTR,10
VALUE CATSWEXT,0
VALUE CATSWD,0
VALUE CATSWF,0
OPTION NOEXCESS
VALUE PENALTY,1
VALUE TRIGGER,101
VALUE REFCNT,1
VALUE REFMULT,0
VALUE REFS,1
VALUE REFF,0
VALUE SREFCNT,1
VALUE SREFMULT,0
OPTION ORDER (ORDER options are disabled by default)
VALUE POS,9999
VALUE SEQ,9999
VALUE TRIGGER,9999
OPTION REFN
VALUE GOOD,9
VALUE SCORE,99
VALUE WORDS,0
OPTION SCORES
VALUE ILOWTRIG,10
VALUE SEC,10
OPTION CODESCOR
VALUE CODEWGHT,0
OPTION SORTSCOR
VALUE SORTWGHT,0
