Service Group Application Reference

Operation

The Cleaning Service calls the Cleaning Routine defined in the Algorithm Definition. For example,
CLEANING=N3CN
causes the supplied single-byte Cleaning routine, N3CN, to be called, and
CLEANING=N3CNDB
causes the supplied double-byte Cleaning routine, N3CNDB, to be called.
Both Cleaning routines operate on the name from left to right.

N3CN Operation

This Cleaning routine performs the initial processing of a user-supplied name. Its main purpose is to remove unwanted characters and to replace variations of a character with a single form, a process known as de-shaping (for example, replacing lower-case letters with their upper-case form).
Cleaning is performed in the following phases:
  • Early Cleaning
  • Major Marker Processing
  • Cleaning Editing
  • Final Cleaning
These phases are described in the following sections.

Early Cleaning

The Early Cleaning process uses a Character Set table to translate the incoming name before any other processing takes place. It is used to remove or translate characters that might interfere with the later Cleaning phases.
By default, the Character Set table that drives this phase leaves all display characters unchanged.
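The Early Cleaning table can be pictured as a per-character translation map applied to the whole name before any later phase sees it. The following is a minimal Python sketch under that assumption; the helper names (build_early_table, early_clean) and the tab and NUL overrides are hypothetical illustrations, not part of the supplied SSA-NAME3 table.

```python
# Minimal sketch: Early Cleaning modelled as a per-character translation map.
# The default (empty) map leaves every character unchanged, mirroring the
# default table that leaves display characters as they are.


def build_early_table(overrides=None):
    """Return a str.translate table built from hypothetical overrides.

    `overrides` maps a character to its replacement string, or to None
    to remove it. Characters that are not listed are left unchanged.
    """
    overrides = overrides or {}
    return {ord(ch): repl for ch, repl in overrides.items()}


def early_clean(name, table):
    """Translate the incoming name before any other Cleaning phase."""
    return name.translate(table)


default_table = build_early_table()
# Hypothetical customization: turn tabs into blanks and drop NUL characters.
custom_table = build_early_table({"\t": " ", "\x00": None})

print(early_clean("O'HARA, SEAN", default_table))    # O'HARA, SEAN
print(early_clean("SMITH\tJOHN\x00", custom_table))  # SMITH JOHN
```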

Major Marker Processing

Following Early Cleaning there is a Major Marker processing phase. This process is driven by Edit-list rules that identify special Major word markers. When a marker is identified, the name is sometimes reordered. A Major word is, as the name suggests, the word in the name considered to be the most important (e.g. the surname of a person’s name, or the street name of an address). For more information on Major words, refer to the NAMESET/Tips on Choosing a Search Strategy section in the APPLICATION REFERENCE guide.
Major markers can be one of the following types:
Head of Name Marker
This marker designates all of the name from the beginning up to the marker as the Major. The comma character is an example of this type. For example, the names:
SMITH, JOHN WILLIAM
CASEY JONES, HENRY
are reordered to:
JOHN WILLIAM SMITH
HENRY CASEY JONES
Tail of Name Marker
This marker designates all of the name from the marker to the end as the Major, and causes no reordering. The % character is an example of this type, as in the names:
JOHN WILLIAM %SMITH
HENRY %CASEY JONES
Left Marker
This marker designates the word on the left of the marker as the Major.
Right Marker
This marker designates the word on the right of the marker as the Major.
Delimited Marker
This marker designates the part of the name between the marker and its matching closing marker as the Major. It is defined as a pair of characters (the leading and trailing delimiters). For example, the () characters in the name:
JOHN WILLIAM (SMITH)
These markers are user-defined in the Edit-list. Refer to the Edit List chapter of the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS.
If the name contains more than one Major Marker, only the first one is processed. If a marker designates an empty string, it is ignored and removed from the name.
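The Python sketch below illustrates three of the Major Marker types, assuming the comma is a head-of-name marker, % a tail-of-name marker and () a delimited marker; in SSA-NAME3 these are user-defined Edit-list rules, so these definitions and the function name process_major_marker are hypothetical. It processes only the first marker in the name, removes a marker that designates an empty string, and reorders the name only for a head-of-name marker. Left and right markers are omitted, and removing the delimiter characters from the output is an assumption.

```python
# Hypothetical marker definitions; in SSA-NAME3 these are Edit-list rules.
HEAD_MARKER = ","        # head of name: text before the marker is the Major
TAIL_MARKER = "%"        # tail of name: text after the marker is the Major
DELIMITED = ("(", ")")   # delimited: text between the pair is the Major


def _join(*parts):
    """Join the non-empty parts with single blanks."""
    return " ".join(p.strip() for p in parts if p.strip())


def process_major_marker(name):
    """Process only the first Major Marker; return (name, major or None)."""
    found = {
        kind: pos
        for kind, pos in (
            ("head", name.find(HEAD_MARKER)),
            ("tail", name.find(TAIL_MARKER)),
            ("delim", name.find(DELIMITED[0])),
        )
        if pos >= 0
    }
    if not found:
        return name, None
    kind = min(found, key=found.get)  # the earliest marker in the name wins
    pos = found[kind]

    if kind == "head":
        # Reorder: the Major (text before the marker) moves to the end.
        major, rest = name[:pos], name[pos + 1:]
        return _join(rest, major), (major.strip() or None)

    if kind == "tail":
        # No reordering: only the marker itself is dropped.
        before, major = name[:pos], name[pos + 1:]
        return _join(before, major), (major.strip() or None)

    end = name.find(DELIMITED[1], pos + 1)
    if end < 0:
        return name, None  # no closing marker: leave the name untouched
    before, major, after = name[:pos], name[pos + 1:end], name[end + 1:]
    return _join(before, major, after), (major.strip() or None)


print(process_major_marker("SMITH, JOHN WILLIAM"))   # ('JOHN WILLIAM SMITH', 'SMITH')
print(process_major_marker("HENRY %CASEY JONES"))    # ('HENRY CASEY JONES', 'CASEY JONES')
print(process_major_marker(", JOHN"))                # ('JOHN', None)
```

An empty designation returns no Major, matching the rule that such a marker is ignored and removed.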

Delete Marker Processing

Delete Marker processing takes place in the same phase as Major Marker processing. It is also driven by Edit-list rules, which identify special Delete word markers.
Delete Marker rules are case sensitive.
Delete Markers can be one of the following types:
’Delete Between’ Markers
Delete Between Markers allow two markers, and all of the text between them, to be deleted. For example, if () were defined as ’Delete Between’ markers, the name:
ALPHA PROCESSING CO (DISTRIBUTION)
would become:
ALPHA PROCESSING CO
’Delete Before/After’ Markers
Delete Before/After Markers allow a marker and all of the text before or after it to be deleted and optionally replaced with another word or phrase. For example, if ’SEE DOC’ were defined as a ’Delete After’ marker, the name:
ALPHA PROCESSING CO SEE DOC NO 36541
would become:
ALPHA PROCESSING CO
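Both Delete Marker types can be sketched in Python as case-sensitive substring operations. This is a minimal illustration, not the Edit-list engine: the helper names are hypothetical, delete_before is assumed to mirror delete_after, and runs of blanks are collapsed afterwards for readability.

```python
def _squeeze(text):
    """Collapse runs of blanks and trim the ends."""
    return " ".join(text.split())


def delete_between(name, open_marker, close_marker):
    """Delete each pair of markers together with the text between them."""
    out, i = [], 0
    while True:
        start = name.find(open_marker, i)
        end = name.find(close_marker, start + len(open_marker)) if start >= 0 else -1
        if start < 0 or end < 0:  # no complete marker pair left
            out.append(name[i:])
            break
        out.append(name[i:start])
        i = end + len(close_marker)
    return _squeeze("".join(out))


def delete_after(name, marker, replacement=""):
    """Delete the marker and everything after it (case-sensitive match)."""
    pos = name.find(marker)
    if pos < 0:
        return name
    return _squeeze(name[:pos] + " " + replacement)


def delete_before(name, marker, replacement=""):
    """Delete everything up to and including the marker."""
    pos = name.find(marker)
    if pos < 0:
        return name
    return _squeeze(replacement + " " + name[pos + len(marker):])


print(delete_between("ALPHA PROCESSING CO (DISTRIBUTION)", "(", ")"))  # ALPHA PROCESSING CO
print(delete_after("ALPHA PROCESSING CO SEE DOC NO 36541", "SEE DOC"))  # ALPHA PROCESSING CO
```

Because str.find is case-sensitive, a lower-case "see doc" would not match the ’SEE DOC’ marker, consistent with Delete Marker rules being case sensitive.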

Cleaning Editing

Cleaning editing is the next phase, again driven by Edit-list rules. This phase allows simple user-defined string and character replacements to be applied before the normal cleaning rules are processed.
Cleaning Editing rules are case sensitive.
For example, with no cleaning editing and using the tables as defined in the Fast-start, the string
ME T/A YOU
cleans to
ME T A YOU
because the / character is defined as a delimiter and is removed. Because cleaning editing is invoked before the normal cleaning rules, it can be used to trap such cases. For example, adding the following rule to the Edit-list definition file:
*S >T/A< *W >TRADING AS<
traps T/A and replaces it with TRADING AS.
The lower-case form t/a would not be converted unless the Edit-list also contained the rule in lower case.
Cleaning Editing processes the rules in order from longest to shortest. For example, BV and BVBA are two Dutch-language company legal endings, used in the Netherlands and Belgium (similar to INC. in the USA). Defining the following Cleaning Editing rules in the Edit-list:
*S >B.V.<BA *W >BV<
*S >B.V.B.A.<BA *W >BVBA<
means that the name WORLDGROUP HOLDINGS B.V.B.A. is correctly translated to WORLDGROUP HOLDINGS BVBA. If Cleaning Editing processed the rules from shortest to longest, the name would have become WORLDGROUP HOLDINGS BV B A (which is not what was required), because the B.V. rule would have been processed first and the remaining characters would not have been recognized by the other rule.
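The longest-to-shortest ordering can be sketched in Python by modelling each *S >search< *W >replacement< rule as a dictionary entry. This is an assumption for illustration: the function name cleaning_edit is hypothetical and the Edit-list flags (such as BA) are not modelled. The name is scanned left to right; at each position the rules are tried longest first and the output of a rule is not rescanned, so B.V.B.A. wins over B.V., and matching is case-sensitive.

```python
def cleaning_edit(name, rules):
    """Apply case-sensitive replacements, trying longer search strings first.

    `rules` maps a search string to its replacement. At each position the
    longest matching rule is applied; unmatched characters are copied as-is.
    """
    ordered = sorted((s for s in rules if s), key=len, reverse=True)
    out, i = [], 0
    while i < len(name):
        for search in ordered:
            if name.startswith(search, i):
                out.append(rules[search])
                i += len(search)
                break
        else:  # no rule matched at this position
            out.append(name[i])
            i += 1
    return "".join(out)


TRADING = {"T/A": "TRADING AS"}
DUTCH_ENDINGS = {"B.V.": "BV", "B.V.B.A.": "BVBA"}

print(cleaning_edit("ME T/A YOU", TRADING))  # ME TRADING AS YOU
print(cleaning_edit("ME t/a YOU", TRADING))  # ME t/a YOU (case-sensitive)
print(cleaning_edit("WORLDGROUP HOLDINGS B.V.B.A.", DUTCH_ENDINGS))  # WORLDGROUP HOLDINGS BVBA
```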

Final Cleaning

This phase is driven by a Character Set table that classifies each character as one of the following types:
Quote
Quote characters are removed from the input name. If a quote is embedded in a word then the word is not broken (e.g. O’HARA is cleaned into OHARA). The quote (’) and double-quote (") are examples of this type.
Comma
Comma characters are removed from the input name. If the name format is ’R’ (that is, the major name is at the right of the name and the Algorithm Definition specifies NAME-FORMAT=R) and the name contains a comma, the part of the name before the comma is considered to be the last name and is moved to the end of the name (e.g. SMITH, JOHN is cleaned into JOHN SMITH). If the comma has no words before it, it is ignored. After the first comma has been processed, all other commas are treated as delimiters. The comma (,) is of this type, but some systems may also define the slash (/) as a comma character to handle names such as SMITH/JOHN B. This comma processing can be turned off by setting CLEANING-OPTIONS #2 in the Algorithm Definition.
Delimiter
Delimiters are removed from the input name. If a delimiter is embedded in a word then the word is broken at that position as if there was a blank (e.g. VAN-DAM is cleaned into VAN DAM). Most special symbols are delimiters, as well as the blank character.
Token
A Token is a special character that causes cleaning to add a blank both before and after it. For example, if the character " is defined as a Token and the name "HELLO" is cleaned, the output name is <blank>"<blank>HELLO<blank>"<blank>.
Self
The Self type is not converted. It is a shorthand notation expressing that the character is to be kept as itself.
Other
All other characters are replaced with the value in the table (e.g. ’e’ is replaced with ’E’). Alphabetic characters, numerals and accented characters are of this type.
Any character can be defined as a delimiter, quote or comma, in which case it is removed and treated accordingly. Characters of type ’other’ can be replaced with any character you choose (usually with themselves or with their upper-case form).
The cleaned name is padded with blanks up to the name length defined by the NAME-LENGTH= directive in the Service Group definition file.
In some implementations, comma processing may leave the last name right-justified while the given names are left-justified.
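The character types above can be combined into a small Python sketch of Final Cleaning. Everything in it is an illustrative assumption rather than the N3CN implementation: the contents of default_table (including & as a Self character), treating unlisted alphanumerics as Other with an upper-case replacement and unlisted symbols as delimiters, treating commas as delimiters when comma processing is off, and collapsing runs of blanks to one. The name_length argument stands in for the NAME-LENGTH= directive; longer names are not truncated here.

```python
import re

QUOTE, COMMA, DELIM, TOKEN, SELF, OTHER = (
    "quote", "comma", "delim", "token", "self", "other")


def default_table():
    """Hypothetical Character Set table mapping a character to its type."""
    table = {"'": QUOTE, '"': QUOTE, "’": QUOTE, ",": COMMA, "&": SELF}
    for ch in " -/.;:()!?#*+=_":
        table[ch] = DELIM
    return table


def classify(ch, table):
    """Return (type, replacement) for one character."""
    if ch in table:
        return table[ch], ch
    if ch.isalnum():
        return OTHER, ch.upper()  # 'other': replaced by its upper-case form
    return DELIM, ch              # unlisted symbols: treated as delimiters


def final_clean(name, table, name_format="R", comma_processing=True,
                name_length=None):
    # Comma processing: the part before the first comma is the last name
    # and moves to the end; a comma with no words before it is dropped.
    if name_format == "R" and comma_processing:
        head, sep, tail = name.partition(",")
        if sep:
            name = f"{tail.strip()} {head.strip()}" if head.strip() else tail.strip()

    out = []
    for ch in name:
        kind, value = classify(ch, table)
        if kind == QUOTE:
            continue                # removed without breaking the word
        if kind in (DELIM, COMMA):  # remaining commas act as delimiters
            out.append(" ")
        elif kind == TOKEN:
            out.append(f" {ch} ")   # blank before and after the token
        elif kind == SELF:
            out.append(ch)          # kept as itself
        else:
            out.append(value)       # 'other': the table replacement

    cleaned = re.sub(r" {2,}", " ", "".join(out))  # collapse blank runs
    if name_length is not None:
        cleaned = cleaned.ljust(name_length)        # pad to NAME-LENGTH
    return cleaned


table = default_table()
print(final_clean("SMITH, JOHN", table))  # JOHN SMITH
print(final_clean("O’HARA", table))       # OHARA
print(final_clean("VAN-DAM", table))      # VAN DAM
```

To reproduce the Token example, copy the table and set the double-quote entry to TOKEN; cleaning "HELLO" in quotes then yields a blank, the quote, a blank, HELLO, a blank, the quote and a blank.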
