Service Group Definition and Customization Guide

Customset

This section provides information on Customset.

The Need for Custom Ranges

In some cases, the standard search ranges and probes generated by the various NAMESET Function keywords may not be entirely appropriate for certain data or search requirements.
To cater for special cases, user-customized search ranges can be defined. These are called ’Customset’ ranges.
Customset ranges require special definitions to be put into the Algorithm at customization time and require a special NAMESET Function keyword to be supplied at search time.
From an application programmer’s point of view, Customset ranges are identified by a ’P’ in the ’Set-Id’ column of the Search-table parameter (see the NAMESET chapter in the APPLICATION REFERENCE FOR SSA-NAME3 SERVICE GROUPS guide for information on Search-table) and are generated at the start of the Search-table.
For more information on how to cause Customset ranges to be put into the Search-table, see the CUSTOMSET= keyword in the NAMESET Function Keywords section.
One example of the use of Customset in a positive search is when searching a database of person names where there is a mixture of records with full given names and initials. In this case, Customset probes or ranges can be set up to generate a more appropriate search strategy than the standard Positive search. In fact, if no explicit Customset ranges are defined, there is a default set of probes set up to address just this problem. This default set of probes can be invoked by using the NAMESET function keyword CUSTOMSET=DEFAULT. It is designed to give quick access to the most likely set of person name candidates where both given names and initials are used.
For example, using the CUSTOMSET=DEFAULT function keyword to search for JOHN ALEXANDER SMITH will generate the following default ’P’ probes at the start of the Positive Search-table:
SMITH, JOHN A *
SMITH, JOHN !
SMITH, J ALEXANDER *
SMITH, J A *
SMITH, J !
SMITH, ALEXANDER!
SMITH, A !
SMITH!
The "!" notation means ’and nothing else’. For example, the ’SMITH’ probe will only find names that consist of a single word which is like SMITH.
If a positive Search-table alone were used, names such as J SMITH would only have been picked up at a wide WI* (word + initial range) level. The benefit of the custom ranges in this case is that J SMITH would be picked up much higher up in the Search-table.
Following is an alternative set of custom ranges for the above example. To get this result, Customset ranges must be explicitly defined in the Algorithm definition, in a CUSTOMSET-DEFINITION=PERSON section, and then explicitly requested by the NAMESET function keyword CUSTOMSET=PERSON.
SMITH, JOHN ALEXANDER * SMITH JOHN ! SMITH J! A! SMITH J!

The Customset Definition

Each Algorithm may optionally contain up to three Customset Definition sections, allowing three different customized search strategies to be defined for an Algorithm. If defined, Customset Definitions must follow the optional FUNCTION-DEFINITIONS and precede the optional ACCOUNT-RULES-DEFINITION.
Customset Definitions are used to tell the NAMESET Service how to build customized ranges and probes. These special ranges and probes are only built if the NAMESET CUSTOMSET= function keyword is passed by the calling program.
If no Customset definitions are defined in the Algorithm, and either the CUSTOMSET=1 or the CUSTOMSET=2 function keyword is passed by the application, no extra ranges will be generated.
If no Customset definitions are defined in the Algorithm, and the CUSTOMSET=PERSON function keyword is passed, a default set of definitions is used to generate the ranges (see previous and next sections).
Customset definitions in the Algorithm start with the CUSTOMSET-DEFINITION=PERSON, CUSTOMSET-DEFINITION=1 or CUSTOMSET-DEFINITION=2 labels. Each label can be followed by up to 99 Customset ’patterns’. Each pattern specifies a set of ’rules’ for building search ranges or probes.
The CUSTOMSET-DEFINITION=PERSON definitions respond to the NAMESET CUSTOMSET=PERSON function keyword. The CUSTOMSET-DEFINITION=1 definitions respond to the NAMESET CUSTOMSET=1 function keyword. The CUSTOMSET-DEFINITION=2 definitions respond to the NAMESET CUSTOMSET=2 function keyword.
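The keyword handling described so far can be summarized in a short sketch. The function name `resolve_customset`, the dictionary layout, and the `DEFAULT_DEFINITIONS` placeholder are illustrative assumptions and not part of the SSA-NAME3 interface; the real resolution happens inside the NAMESET Service.

```python
from typing import Dict, List, Optional

# Placeholder standing in for the built-in default patterns (hypothetical name).
DEFAULT_DEFINITIONS: List[str] = ["<built-in default patterns>"]


def resolve_customset(keyword: Optional[str],
                      algorithm_defs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the Customset patterns to use, or None when no extra ranges are built.

    keyword        -- value of the NAMESET CUSTOMSET= function keyword, or None
    algorithm_defs -- CUSTOMSET-DEFINITION sections found in the Algorithm,
                      keyed by label ("PERSON", "1" or "2")
    """
    if keyword is None:
        return None                      # CUSTOMSET= not passed: nothing is built
    if keyword == "DEFAULT":
        return DEFAULT_DEFINITIONS       # always the built-in defaults
    if keyword in algorithm_defs:
        return algorithm_defs[keyword]   # an explicit definition is used as-is
    if keyword == "PERSON":
        return DEFAULT_DEFINITIONS       # PERSON falls back to the defaults
    # Assumption: 1 or 2 without a matching definition yields no extra ranges.
    return None


print(resolve_customset("PERSON", {}))   # falls back to the default definitions
print(resolve_customset("1", {}))        # None: no extra ranges
```

The text only states the outcome for CUSTOMSET=1 and CUSTOMSET=2 when no definitions exist at all; treating a missing individual definition the same way is a guess made for this sketch.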
The processing of Customset patterns and rules occurs as follows. During NAMESET processing, the Words-stack is built. A Customset pattern is selected by looking for the longest pattern which matches the Words-stack contents. Having chosen a pattern, the rules associated with it are used to build ranges or probes.
Each Customset pattern may specify up to 50 rules.
PATTERN=. . .
RULE=. . .
RULE=. . .
. . .
The syntax for a pattern is as follows:
PATTERN=<W|I>...
The PATTERN keyword is followed by a list of Ws and Is, which represent words and initials in the word stack. The pattern must be ordered such that the major word appears first, followed by the minors in left to right order (minor selection for Customset is independent of the setting of SSA-NAME3-OPTIONS #19 and #20).
For example, when using names with family names on the right (NAME-FORMAT=R) the name "J R Bloggs" would generate a word-stack with Bloggs as the major with J and R being minors. The pattern matching this name would be PATTERN=WII.
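As an illustration of how a pattern string relates to the word-stack, the following sketch orders the words major first and maps each one to W or I. The helper names and the rule that a single-letter entry counts as an initial are assumptions made for this example; the actual word-stack is built by NAMESET.

```python
from typing import List


def word_stack(name: str) -> List[str]:
    """Order the words major first, then minors left to right (NAME-FORMAT=R)."""
    words = name.split()
    if not words:
        return []
    return [words[-1]] + words[:-1]      # family name on the right is the major


def pattern_of(stack: List[str]) -> str:
    """Map each entry to W (word) or I (initial).

    Assumption: a single-letter entry counts as an initial.
    """
    return "".join("I" if len(word) == 1 else "W" for word in stack)


stack = word_stack("J R Bloggs")
print(stack)              # ['Bloggs', 'J', 'R']
print(pattern_of(stack))  # WII
```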
The syntax for rules is as follows:
RULE=<Wn|In|Xn|On>, ..., <PROBE|RANGE|IRANGE>
A RULE consists of up to eight Pattern indexes (Wn, In, Xn, or On), followed by a keyword which describes what sort of range to build. The numeric digit in the pattern index is used to refer to a particular word in the pattern for this rule (starting from 1).
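The rule syntax can be checked with a small parser. This is a minimal sketch of the constraints stated above (one to eight pattern indexes, then PROBE, RANGE or IRANGE); the function name `parse_rule` and the error messages are hypothetical, and only plain single-word indexes are handled.

```python
import re
from typing import List, Tuple

KINDS = {"PROBE", "RANGE", "IRANGE"}
INDEX_RE = re.compile(r"^[WIXO][1-9]$")   # a letter followed by one numeric digit


def parse_rule(line: str) -> Tuple[List[str], str]:
    """Split 'RULE=W1,I2,RANGE' into (['W1', 'I2'], 'RANGE')."""
    if not line.startswith("RULE="):
        raise ValueError("not a RULE line: " + line)
    parts = [p.strip() for p in line[len("RULE="):].split(",")]
    *indexes, kind = parts                # the last element is the range keyword
    if kind not in KINDS:
        raise ValueError("unknown range keyword: " + kind)
    if not 1 <= len(indexes) <= 8:
        raise ValueError("a rule takes one to eight pattern indexes")
    for index in indexes:
        if not INDEX_RE.match(index):
            raise ValueError("bad pattern index: " + index)
    return indexes, kind


print(parse_rule("RULE=W1,I2,RANGE"))     # (['W1', 'I2'], 'RANGE')
```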
The following letters in the Pattern indexes indicate how to process the words for building custom ranges and probes:
  • W. Stabilize the word and use it.
  • I. Stabilize the word and use the initial from the stabilized word.
  • X. Use the initial from the original word.
  • O. Use the original word.
For example, if you use the default USA algorithm, the cleaned, formatted, and stabilized form of the input name PHILIP SMITH is FALAP SNAT. In this case,
RULE=W1,I2,RANGE would generate SNAT F *
RULE=W1,X2,RANGE would generate SNAT P *
RULE=W1,O2,RANGE would generate SNAT PHILIP *
RULE=W2,W1,RANGE would generate FALAP SNAT *
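The effect of the four index letters can be reproduced with a short sketch. Stabilization itself is not implemented here: the `STABILIZED` lookup simply holds the two stabilized forms quoted in the example, and the function names are hypothetical.

```python
from typing import Dict, List

# Stabilized forms quoted in the text for the default USA algorithm.
# A real stabilization routine is not reproduced; this lookup is a stand-in.
STABILIZED: Dict[str, str] = {"SMITH": "SNAT", "PHILIP": "FALAP"}


def apply_index(index: str, stack: List[str]) -> str:
    """Resolve one pattern index such as 'I2' against a major-first word stack."""
    letter, position = index[0], int(index[1:])
    if letter not in "WIXO":
        raise ValueError("unknown pattern index letter: " + letter)
    original = stack[position - 1]        # indexes start from 1
    if letter == "O":
        return original                   # original word
    if letter == "X":
        return original[0]                # initial of the original word
    stable = STABILIZED[original]
    if letter == "W":
        return stable                     # stabilized word
    return stable[0]                      # 'I': initial of the stabilized word


def build_range(indexes: List[str], stack: List[str]) -> str:
    """Join the resolved indexes and mark the result as a range with '*'."""
    return " ".join(apply_index(i, stack) for i in indexes) + " *"


stack = ["SMITH", "PHILIP"]               # major first, then the minor
print(build_range(["W1", "I2"], stack))   # SNAT F *
print(build_range(["W1", "X2"], stack))   # SNAT P *
print(build_range(["W1", "O2"], stack))   # SNAT PHILIP *
print(build_range(["W2", "W1"], stack))   # FALAP SNAT *
```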
Concatenated-word ranges can also be defined, allowing Customset more flexibility than negative search strategies. Any two words can be concatenated; however, the concatenated words can only be used in the major position.
For example, in the syntax RULE=W3+W1,W2,RANGE the index W3+W1 indicates the words to be concatenated.
For example, when using names with family names on the right (NAME-FORMAT=R) the name "Paul Taylor Smith" would generate a word-stack with Smith as the major (i.e. W1), and Paul (W2) and Taylor (W3) being minors. Based on the example RULE=W3+W1,W2,RANGE, the following Customset range would be generated:
TAYLORSMITH, PAUL *
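The concatenation example can be sketched as follows. The helper names are hypothetical, and stabilization is skipped so that the output reads like the example above, which shows the unstabilized words.

```python
from typing import List


def resolve(index: str, stack: List[str]) -> str:
    """Resolve 'W2' or a two-word concatenation such as 'W3+W1'."""
    parts = index.split("+")
    if len(parts) > 2:
        raise ValueError("only two words can be concatenated")
    # Each part is a letter plus a 1-based position in the word stack.
    return "".join(stack[int(p[1:]) - 1] for p in parts).upper()


def concatenated_range(indexes: List[str], stack: List[str]) -> str:
    """Build the range text; a concatenation may only appear first (the major)."""
    major, *minors = indexes
    if any("+" in m for m in minors):
        raise ValueError("concatenation is only allowed in the major position")
    text = resolve(major, stack)
    if minors:
        text += ", " + " ".join(resolve(m, stack) for m in minors)
    return text + " *"


stack = ["Smith", "Paul", "Taylor"]       # W1 = major, W2 and W3 = minors
print(concatenated_range(["W3+W1", "W2"], stack))   # TAYLORSMITH, PAUL *
```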
The Search-table element can be a PROBE, a RANGE or a range based upon an initial, an IRANGE.
If a particular pattern which has been discovered in the Words-stack is not defined in the Customset definitions, the next shortest pattern which matches it will be used. For example, using the default Customset definitions (shown below), if a pattern of WWWW is discovered in the Words-stack, the rules for the WWW pattern will be used. This behavior can be turned off by setting SSA-NAME3-OPTIONS #26.
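The selection and fallback behavior can be sketched as below. Two points are assumptions rather than documented behavior: a shorter pattern is taken to match when it equals the leading part of the discovered pattern (consistent with WWWW falling back to WWW), and the `allow_fallback` flag stands in for SSA-NAME3-OPTIONS #26, with no pattern selected when fallback is off and there is no exact match.

```python
from typing import Iterable, Optional

# Pattern names taken from the default Customset definitions.
DEFAULT_PATTERNS = ["WWW", "WWI", "WIW", "WII", "WW", "WI", "W", "I"]


def select_pattern(stack_pattern: str,
                   defined: Iterable[str] = DEFAULT_PATTERNS,
                   allow_fallback: bool = True) -> Optional[str]:
    """Pick the longest defined pattern that matches the discovered pattern."""
    defined_set = set(defined)
    if stack_pattern in defined_set:
        return stack_pattern              # exact match
    if not allow_fallback:
        return None                       # assumed behavior with fallback off
    # Try progressively shorter leading parts of the discovered pattern.
    for length in range(len(stack_pattern) - 1, 0, -1):
        candidate = stack_pattern[:length]
        if candidate in defined_set:
            return candidate
    return None


print(select_pattern("WWWW"))                         # WWW
print(select_pattern("WWWW", allow_fallback=False))   # None
```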

Default Customset Definitions

If the CUSTOMSET=DEFAULT function is passed by an application calling NAMESET, or the CUSTOMSET=PERSON function is used and there is no corresponding CUSTOMSET-DEFINITION=PERSON defined in the Algorithm, the following default Customset patterns are used:
PATTERN=WWW
RULE=W1,W2,I3,RANGE
RULE=W1,W2,PROBE
RULE=W1,I2,W3,RANGE
RULE=W1,I2,I3,RANGE
RULE=W1,I2,PROBE
RULE=W1,W3,PROBE
RULE=W1,I3,PROBE
RULE=W1,PROBE
PATTERN=WWI
RULE=W1,W2,PROBE
RULE=W1,I2,I3,RANGE
RULE=W1,I2,PROBE
RULE=W1,I3,PROBE
RULE=W1,PROBE
PATTERN=WIW
RULE=W1,I2,RANGE
RULE=W1,W3,PROBE
RULE=W1,I3,PROBE
RULE=W1,PROBE
PATTERN=WII
RULE=W1,I2,PROBE
RULE=W1,I3,PROBE
RULE=W1,PROBE
PATTERN=WW
RULE=W1,I2,RANGE
RULE=W1,PROBE
PATTERN=WI
RULE=W1,PROBE
PATTERN=W
RULE=W1,PROBE
PATTERN=I
RULE=I1,PROBE
Using the default rules, the example name KELLY, PAUL EDWARD will use the Customset pattern:
PATTERN=WWW
RULE=W1,W2,I3,RANGE
RULE=W1,W2,PROBE
RULE=W1,I2,W3,RANGE
RULE=W1,I2,I3,RANGE
RULE=W1,I2,PROBE
RULE=W1,W3,PROBE
RULE=W1,I3,PROBE
RULE=W1,PROBE
which will generate Customset ranges:
KELLY, PAUL E *
KELLY, PAUL !
KELLY, P EDWARD *
KELLY, P E *
KELLY, P!
KELLY, EDWARD!
KELLY, E!
KELLY!
The ! character means ’probe’; * means ’range’.
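The expansion of the default WWW rules for KELLY, PAUL EDWARD can be traced with a short sketch. The function and variable names are hypothetical, stabilization is skipped so the output shows the original words as in the listing, and IRANGE is left out because the text does not show its notation. Spacing before the ! marker is uniform here, which differs slightly from the listing.

```python
from typing import List, Tuple

Rule = Tuple[List[str], str]

# The default rules for PATTERN=WWW, transcribed from the default definitions.
WWW_RULES: List[Rule] = [
    (["W1", "W2", "I3"], "RANGE"),
    (["W1", "W2"], "PROBE"),
    (["W1", "I2", "W3"], "RANGE"),
    (["W1", "I2", "I3"], "RANGE"),
    (["W1", "I2"], "PROBE"),
    (["W1", "W3"], "PROBE"),
    (["W1", "I3"], "PROBE"),
    (["W1"], "PROBE"),
]

MARKERS = {"RANGE": " *", "PROBE": " !"}


def expand(rule: Rule, stack: List[str]) -> str:
    """Turn one rule into display text for a major-first word stack."""
    indexes, kind = rule
    words = []
    for index in indexes:
        word = stack[int(index[1:]) - 1]               # indexes start from 1
        words.append(word[0] if index[0] == "I" else word)
    major, minors = words[0], words[1:]
    text = major + (", " + " ".join(minors) if minors else "")
    return text + MARKERS[kind]


stack = ["KELLY", "PAUL", "EDWARD"]                    # major first
for rule in WWW_RULES:
    print(expand(rule, stack))
```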
