Service Group Application Reference

Parameters

The application program calls the NAMESET Service with the following parameters:
No.  Name           Size (bytes)             Filled in by
1    Service name   8/32                     Application
2    Response code  20                       NAMESET
3    Function       32, or 2 – 1024          Application
4    Name in        As defined in Algorithm  Application
5    Cleaned Name   Same as Name in          NAMESET
6    Word Stack     258 (by default)         NAMESET
7    Keys Stack     142 (by default)         NAMESET
8    Search Table   677 (by default)         NAMESET
9    Categories     20                       NAMESET
10   Work-area      100,000 (minimum)        NAMESET
SERVICE-NAME (8 or 32 bytes)
The name of the Service for the NAMESET Service type, as defined in the Algorithm Definition. The name is 8 bytes long if fixed in length, or up to 32 bytes if variable in length. For the name to use, consult the person responsible for defining and customizing the Algorithms.
RESPONSE-CODE (20 bytes)
The Service fills in this parameter to indicate whether the Call succeeded. A value of 00 in the first two positions indicates success; any other value flags a warning or an error. For a description of how to check Response Codes, see the How an Application Should Test the Response Code section.
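The check described above can be sketched in a few lines of Python. This is only an illustration: the function name response_ok is hypothetical, and the sketch assumes the Response Code arrives as 20 ASCII bytes (a mainframe caller might see EBCDIC instead).

```python
def response_ok(response_code: bytes) -> bool:
    """Return True when the first two positions of the Response Code are '00'.

    Assumes a 20-byte, ASCII-encoded field; the helper name is hypothetical.
    """
    if len(response_code) != 20:
        raise ValueError("RESPONSE-CODE must be exactly 20 bytes")
    # Any value other than '00' in the first two positions is a warning or error.
    return response_code[:2] == b"00"
```

An application would typically call such a helper immediately after every Call to the Service, before reading any of the other returned parameters.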
FUNCTION (32, or 2-1024 bytes)
The NAMESET Function controls the type of key building or search strategy that NAMESET returns to the calling program. The Keys are returned in the Keys-stack parameter. The search strategy is returned as an array of name key search ranges in the Search-table parameter.
Different applications within one organization will typically have different key building and search requirements and will therefore normally use different NAMESET Functions. The application designer needs to understand which Function to use, because different Functions have different effects on search performance and quality, and produce search tables which require different processing.
The types of Keys available are Preferred, Positive and Negative.
The three types of Search Strategies supported are Positive, Negative and Custom. Each can be tailored to give different emphasis to performance and quality. In all cases, special search ranges called Probes and Secondary Name lookup can also be requested. Each of these types of search range is described below:
  • Positive search ranges (otherwise referred to as cascade search ranges) - these search ranges use the search name in the preferred key order. They start with a narrow search and progressively widen. Applications normally process one search range at a time, returning to the user with the results, and then allow the user to choose whether or not to progress to the next, wider, search range.
  • Negative search ranges - these search ranges are all built for the same depth of search, but are built by permuting the words from the search name into different orders. It is normal to process all negative search ranges before returning to the user with the results or before any decision on the matches is taken.
  • Secondary name search ranges - these search ranges are used to invoke secondary name lookup processing. They precede either a positive, negative or Customset Search-table.
  • Customset search ranges - allow user-defined search ranges and probes to be specified.
  • Probes - these search ranges are used in different situations to return small sets of records defining candidates which more closely match the search name. They generally precede either a positive or negative Search-table. Probes include Word probes, Word + Initial probes, Customset and Secondary Name probes.
A NAMESET Function consists of one or more NAMESET Function Keywords, initiated and terminated by an asterisk (*). The maximum length of the total Function specification is 1024 bytes. Here is an example of a valid Function:
*NEG,START=WI*
Here WI means Word + Initial range; the trailing asterisk terminates the Function.
A NAMESET Function which contains only the characters ** uses the following default values:
*FINE,CASCADE*
The NAMESET Function can also be defined in the Service Group definition as a NAMESET Function ’Definition’, and given a name. Refer to the Service Group Definition section of the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS for more details. The NAMESET Function Definition ’name’ is passed as a parameter instead of the explicit Function keywords. For example, if the Service Group definition contains:
FUNCTIONS-DEFINITION NEGLGE:NEG,START=WI
then, when calling the NAMESET Service, the predefined function name NEGLGE can be used instead of the explicit keywords. When a Function definition name is used, it must be left justified in a 32 byte field and padded with spaces. Note that no asterisks are used around the Function definition name. The Function parameter can also combine Function keywords and Function definitions by means of the BASE= keyword. For example,
*BASE=NEGLGE,FULLSEARCH,NOKEYS*
would use the keywords specified by the Function definition NEGLGE as well as the keywords FULLSEARCH and NOKEYS.
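The two ways of filling the Function parameter can be illustrated with a short Python sketch. The helper names function_from_keywords and function_from_definition are hypothetical and are not part of SSA-NAME3; the sketch only applies the rules stated above (asterisk delimiters and a 1024 byte limit for keywords, a space-padded 32 byte field without asterisks for a definition name). The comma separator is taken from the examples in this section.

```python
MAX_FUNCTION_LEN = 1024      # maximum length of a keyword Function
DEFINITION_FIELD_LEN = 32    # fixed field length for a Function definition name


def function_from_keywords(*keywords: str) -> str:
    """Build an explicit keyword Function, e.g. '*NEG,START=WI*'.

    With no keywords this yields '**', which selects the defaults.
    """
    function = "*" + ",".join(keywords) + "*"
    if len(function) > MAX_FUNCTION_LEN:
        raise ValueError("Function specification exceeds 1024 bytes")
    return function


def function_from_definition(name: str) -> str:
    """Build the parameter for a predefined Function definition name.

    The name is left justified in a 32 byte field, padded with spaces,
    and carries no asterisks.
    """
    if "*" in name:
        raise ValueError("a Function definition name takes no asterisks")
    if len(name) > DEFINITION_FIELD_LEN:
        raise ValueError("Function definition name exceeds 32 bytes")
    return name.ljust(DEFINITION_FIELD_LEN)
```

For instance, function_from_keywords("BASE=NEGLGE", "FULLSEARCH", "NOKEYS") produces the combined form shown above, while function_from_definition("NEGLGE") produces the padded 32 byte field.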
Function Keywords
NAMESET Function keywords are described in the Service Group Definition section of the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS.
NAME-IN (10-255 bytes)
This is the name for which keys or search ranges are to be built. In most cases it should be the full name; however, refer to the Algorithm Definition/Tips on Customizing an Algorithm section in the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS for a fuller discussion of what constitutes a ’name’.
The length of this parameter must be the same as the NAME-LENGTH parameter defined in this Service’s Algorithm Definition.
CLEANED-NAME (10-255 bytes)
This is the name after physical cleaning by the Cleaning and Character Set rules defined for this Service’s Algorithm. Refer to the Nameset section for more details on the Cleaning process.
CLEANED-NAME must have the same length as NAME-IN.
WORDS-STACK (258 bytes (default))
The Words-stack contains an array of words extracted from the Cleaned Name. It is preceded by a two digit count of the number of words in the stack. The default number of entries in the array is 8. Each entry in the array contains a word, maximum length 24 bytes, and 8 bytes of extra information about that word. The total length of each array entry is therefore 32 bytes. The structure of each stack entry is defined in the following table.
Name                 Offset  Size  Description
Word                 0       24    The cleaned word.
Word type            24      2     The first byte of this two byte field contains the final word type. The second byte contains the original type before processing. See the table below for a list of Word-types.
Category             26      2     The last Edit-list Category processed for this word.
Original initial     28      1     The initial of the word before any Edit-list processing. For example, if the Edit-list contains a nickname rule to change BILL to WILLIAM, this field will contain a B and the Word will contain WILLIAM.
Word not Stabilized  29      1     A Y in this position indicates that the word is defined as a type O in the Edit-list. This means that the word will not be Stabilized.
Filler               30      2     Not used
The following table contains a list of the valid Word-types for the NAMESET Service.
Word-type  Description
B          Suspect Code
           – a word with one code character
           – a word with one or more ambiguous characters
           – a one or two character word preceded or followed by a code word
C          Code
           – a single code character (initial)
           – a word with 2 or more code characters
I          A single character (Initial)
M          A Major Word
N          A Major Code-word
S          A Skip Word
T          A Skip Code-word
Y          Any other word
blank      An unused entry in the Words-stack.
The number of entries to allow in the Words-stack is controlled by the Algorithm Definition parameter:
WORDS-STACK-SIZE=nn
The default value is 8, so the default size of this parameter is: (32 x 8) + 2 = 258
It is sometimes necessary to increase the number of entries in the Words-stack when dealing with long names and addresses; otherwise truncation will occur, affecting the search and matching quality. The maximum number of entries is 99, so the maximum size is: (32 x 99) + 2 = 3170
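As an illustration of the layout above, the following Python sketch unpacks a Words-stack buffer into its entries and reproduces the size calculation. The names WordEntry, words_stack_size and parse_words_stack are hypothetical, and the sketch assumes the buffer is ASCII text; it is not taken from the product.

```python
from dataclasses import dataclass
from typing import List

ENTRY_LEN = 32  # 24-byte word + 8 bytes of extra information


@dataclass
class WordEntry:
    word: str
    final_type: str        # first byte of Word type
    original_type: str     # second byte of Word type
    category: str
    original_initial: str
    not_stabilized: bool


def words_stack_size(entries: int = 8) -> int:
    """Size in bytes of a Words-stack holding the given number of entries."""
    return ENTRY_LEN * entries + 2


def parse_words_stack(stack: bytes) -> List[WordEntry]:
    """Split a Words-stack into entries, using the leading two digit count."""
    text = stack.decode("ascii")  # assumption: ASCII-encoded buffer
    count = int(text[:2])
    if len(text) < 2 + count * ENTRY_LEN:
        raise ValueError("Words-stack is shorter than its count implies")
    entries = []
    for i in range(count):
        e = text[2 + i * ENTRY_LEN: 2 + (i + 1) * ENTRY_LEN]
        entries.append(WordEntry(
            word=e[0:24].rstrip(),
            final_type=e[24],
            original_type=e[25],
            category=e[26:28],
            original_initial=e[28],
            not_stabilized=(e[29] == "Y"),
        ))
    return entries
```

With the default of 8 entries, words_stack_size() gives 258, and words_stack_size(99) gives the maximum of 3170.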
KEYS-STACK (142 bytes (default for 5-byte keys))
(202 bytes (default for 8-byte keys))
The Keys-stack contains an array of Name-keys generated from the input name. It is preceded by a two digit count of the number of keys in the stack.
A key-building application will store the keys from the Keys-stack into a database index.
The default key is 5 bytes long and contains the full range of binary values. An alternate key, 8 bytes long and containing only printable characters, can be returned in case the database or application language cannot handle the binary keys (the hex value ’00’ is usually the problem). Refer to the Algorithm Definition chapter of the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS for details on how to request 8 byte character keys.
The structure of a Keys-stack entry for 5-byte keys is as follows:
Name      Offset  Size  Description
Key       0       5     The binary Name-key.
Key type  5       2     The Key Type
The structure of a Keys-stack entry for 8-byte keys is as follows:
Name      Offset  Size  Description
Key       0       8     The character Name-key.
Key type  8       2     The Key Type
The Key-type defines the particular encoding mechanism used to build the key. This depends on the mix of common and uncommon words in the name. A common word is a word that exists in the Frequency Table. An uncommon word is one that does not exist in the Frequency Table. For a discussion of the Frequency Table, refer to the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS. The following table provides a general description of the Key-type groups.
Key-type        Description
An              Key built from all uncommon words.
Bn, Cn          Key built from a mixture of common and uncommon words where the major word is uncommon.
Dn, En, Fn, Gn  Key built from a mixture of common and uncommon words where the major word is common.
Hn              Key built from all common words.
The number of entries to allow in the Keys-stack is controlled by the Algorithm Definition parameter:
KEYS-STACK-SIZE=nn
The default value is 20, so the default size of this parameter for 5-byte keys is: (7 x 20) + 2 = 142
For 8-byte keys each entry is 10 bytes long, so the default size is: (10 x 20) + 2 = 202
It is sometimes necessary to increase the number of entries in the Keys-stack when dealing with long names and addresses; otherwise truncation will occur, affecting the search quality. The maximum number of entries is 99, so the maximum size for 5-byte keys is: (7 x 99) + 2 = 695
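A key-building application has to split the Keys-stack into individual keys before storing them. The Python sketch below shows one way to do that under the layout described above; keys_stack_size and parse_keys_stack are hypothetical names, and the count and Key type fields are assumed to be ASCII. The keys themselves are kept as raw bytes because 5-byte keys may contain any binary value.

```python
from typing import List, Tuple


def keys_stack_size(entries: int = 20, key_len: int = 5) -> int:
    """Size in bytes of a Keys-stack: (key + 2-byte Key type) per entry, plus count."""
    return (key_len + 2) * entries + 2


def parse_keys_stack(stack: bytes, key_len: int = 5) -> List[Tuple[bytes, str]]:
    """Return (key, key_type) pairs; key_len is 5 for binary keys, 8 for character keys."""
    count = int(stack[:2].decode("ascii"))
    entry_len = key_len + 2
    if len(stack) < 2 + count * entry_len:
        raise ValueError("Keys-stack is shorter than its count implies")
    keys = []
    for i in range(count):
        entry = stack[2 + i * entry_len: 2 + (i + 1) * entry_len]
        # Keep the key as bytes: a binary key can include the hex value 00.
        keys.append((entry[:key_len], entry[key_len:].decode("ascii")))
    return keys
```

Each returned key would then be written to the database index alongside the identifier of the record it was built from.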
SEARCH-TABLE
(677 bytes (default for 5-byte keys))
(806 bytes (default for 8-byte keys))
NAMESET returns in the Search-table an array of key search ranges for the requested Search Strategy.
The Search Strategy is specified in the Function parameter. The Search-table contains either positive search ranges, negative search ranges or probes. Refer to the description of the Function parameter in the NAMESET Parameters section for more information on these.
A search application will use the key ranges in the Search-table to search a database index which has been previously loaded with SSA-NAME3 Keys.
The Search-table is preceded by a single field, called the Preferred Key. For more information on the Preferred Key, refer to the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS.
The Preferred Key is either 5 bytes or 8 bytes long, depending on the key length specified in the Algorithm Definition (SSA-NAME3-OPTIONS #23).
Immediately following the Preferred Key are the Search-table entries. Each Search-table entry is 32 bytes long when 5-byte keys are in effect, or 38 bytes long when 8-byte keys are in effect. A Search-table entry is sometimes referred to as a ’depth of search’, a ’level of search’, a ’search range’ or a ’set’.
The contents of a Search-table entry are as follows:
Name
Description
From Key
The Key from which to start a search for this search range.
To Key
The Key at which to end the search for this search range.
From Key and To Key are used, for example, in an SQL Select statement of the form:
SELECT * FROM CUSTOMER-SSA-NAME3-TABLE WHERE SSA-NAME3-KEY >= SSA-NAME3-FROM-KEY AND SSA-NAME3-KEY <= SSA-NAME3-TO-KEY
Depth
The depth of search for this Search-table entry. This field is no longer used and is only present for upward compatibility from SSA-NAME3 versions prior to V1.6.
Scale
The estimated number of records that this search range will find.
The value is returned as two digits of the form ZN, where Z specifies the coarse size of the set as a power of ten (a Scale of ’10’ is 10, ’20’ is 100, ’30’ is 1000, etc.) and N selects a finer factor (multiplier) from this table:
N  Factor
0  1.00
1  1.26
2  1.58
3  2.00
4  2.51
5  3.16
6  3.98
7  5.01
8  6.31
9  7.94
To calculate the estimated number of records, use the following formula:
10^Z x Factor(N)
For example, for a Scale of 23:
10^2 x 2.00 = 200
The Scale covers the range from 0 to about 8,000,000,000 records. It is based on the Key-type and the number of records in the file. It is only useful if the Algorithm contains a Frequency Table built from the names being searched, and the NAMESET FILESIZE= parameter specifies a file size within 10 per cent of the actual size.
Scale is mainly useful to help the application decide whether a search range is too wide.
Contents
Two digits representing the number of words and initials used to build this search range. The first digit is the number of whole words and the second is the number of characters in the last word (if it was not whole).
For example, a Contents of ’30’ says this search range was built from three whole words in the name. A Contents of ’21’ says this search range was built from two whole words and the initial from the third word.
A special case is when the second digit contains a ’2’. This means that the initial represents only the uncommon words that begin with that initial, and not the common words.
The application can check this field for a value of ’00’, which marks the end of the Search-table.
Key Type
The encoding method used for the keys in this range. Refer to the description of Key-types in the Keys-stack parameter above.
Set Id
Sometimes referred to as Range-type, this identifies the type of a search range.
Possible values are:
ID  Type of range       Generated by
B   Bad                 Empty name after Cleaning/Formatting
C   Cascade             Default; FINE; COARSE; WORDS
N   Negative            NEG
P   Customset           CUSTOMSET=
2   Secondary           SECONDARY, SECMINOR, SECMAJOR, SECALL
W   Word probe          PROBESWORD, PROBESALL
I   Word/Initial probe  PROBESINIT, PROBESALL
S   Code probe          SSA-NAME3-OPTIONS #6 = C or Y
Sequence
The sets are numbered 00, 01, 02, and so on. A break between set numbers occurs when two logically distinct sets of ranges are present. For example, a break occurs between probes and the positive cascade. A break also occurs between the ranges generated for different pieces of a Compound- or Account-name.
Filler
Spare area, reserved for future use.
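The Scale calculation described in the table above can be expressed as a small Python function. The name estimated_records is hypothetical; the factor values are copied from the Scale description, and the function simply evaluates 10^Z x Factor(N) for a two digit Scale.

```python
# Factor(N) for N = 0..9, as listed in the Scale description.
FACTORS = (1.00, 1.26, 1.58, 2.00, 2.51, 3.16, 3.98, 5.01, 6.31, 7.94)


def estimated_records(scale: str) -> float:
    """Convert a two digit Scale 'ZN' into an estimated record count."""
    if len(scale) != 2 or not (scale.isascii() and scale.isdigit()):
        raise ValueError("Scale must be two decimal digits")
    z, n = int(scale[0]), int(scale[1])
    return 10 ** z * FACTORS[n]
```

An application could compare the result with a threshold of its own choosing to decide whether a search range is too wide to process.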
The structure of a Search-table entry for 5-byte and 8-byte keys is as follows:
Structure for 5-byte keys
Name      Offset  Size
From Key  0       5
To Key    5       5
Depth     10      2
Scale     12      2
Contents  14      2
Key Type  16      2
Set Id    18      1
Sequence  19      2
Filler    21      11
Structure for 8-byte keys
Name      Offset  Size
From Key  0       8
To Key    8       8
Depth     16      2
Scale     18      2
Contents  20      2
Key Type  22      2
Set Id    24      1
Sequence  25      2
Filler    27      11
The number of entries in the Search-table is controlled by the Algorithm Definition parameter:
SEARCH-TABLE-SIZE=nn
The default value is 21, so the default size of this parameter for 5-byte keys is: (32 x 21) + 5 = 677
The default size for 8-byte keys is: (38 x 21) + 8 = 806
It is sometimes necessary to increase the number of entries in the Search-table when dealing with long names and addresses and doing negative searches with probes; otherwise truncation will occur, affecting the search quality. The maximum number of entries is 99.
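To show how the layout above fits together, here is a Python sketch that splits a Search-table buffer into the Preferred Key and its search ranges, stopping at the first entry whose Contents is ’00’. The names SearchRange, search_table_size and parse_search_table are hypothetical, the non-key fields are assumed to be ASCII, and the obsolete Depth field is skipped.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SearchRange:
    from_key: bytes
    to_key: bytes
    scale: str
    contents: str
    key_type: str
    set_id: str
    sequence: str


def search_table_size(entries: int = 21, key_len: int = 5) -> int:
    """Entries of (2 keys + 22 bytes) each, preceded by the Preferred Key."""
    return (2 * key_len + 22) * entries + key_len


def parse_search_table(table: bytes, key_len: int = 5) -> Tuple[bytes, List[SearchRange]]:
    """Return the Preferred Key and the ranges up to the '00' Contents marker."""
    entry_len = 2 * key_len + 22
    k = 2 * key_len  # offset of the first field after the two keys (Depth)
    preferred_key = table[:key_len]
    ranges = []
    for start in range(key_len, len(table) - entry_len + 1, entry_len):
        entry = table[start:start + entry_len]
        if entry[k + 4:k + 6] == b"00":  # Contents '00' marks the end of the table
            break
        ranges.append(SearchRange(
            from_key=entry[:key_len],
            to_key=entry[key_len:k],
            scale=entry[k + 2:k + 4].decode("ascii"),
            contents=entry[k + 4:k + 6].decode("ascii"),
            key_type=entry[k + 6:k + 8].decode("ascii"),
            set_id=entry[k + 8:k + 9].decode("ascii"),
            sequence=entry[k + 9:k + 11].decode("ascii"),
        ))
    return preferred_key, ranges
```

A search application would then use each range's from_key and to_key as the bounds of an index range query, in the manner of the SQL Select statement shown earlier.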
CATEGORIES (20 bytes)
When NAMESET is processing a name to build keys or search ranges, each time an Edit-list rule is executed, its category name is added to the Categories list. For more information on Category names, refer to the Edit List Definition chapter of the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS.
For example, Categories may contain:
PTPPPR
if the name contained a personal title (PT), a prefix word (PP) and a prefix replace word (PR).
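The example above suggests that each category name occupies two characters. Assuming that, and assuming the unused part of the 20 byte field is padded with spaces, a minimal Python sketch for splitting the list could look like this (split_categories is a hypothetical name):

```python
from typing import List


def split_categories(categories: str) -> List[str]:
    """Split the Categories field into two-character category names.

    Assumes two-character names and trailing space padding, as in the
    PTPPPR example; neither is confirmed beyond that example.
    """
    used = categories.rstrip()
    return [used[i:i + 2] for i in range(0, len(used), 2)]
```

For the example field, split_categories("PTPPPR".ljust(20)) yields the three names PT, PP and PR.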
WORK-AREA (30,000 bytes)
The Work-area is used by the Service as general purpose scratch-pad memory. For more information on the Work-area, refer to the Work-area section.
