Service Group Application Reference

Operation

Formatting processing is done in three major phases.
  1. Phrase Editing
  2. Edit-list processing
  3. Post processing
Some of the functions within Formatting are user-controlled via settings in the Algorithm parameter, FORMATTING-OPTIONS, and by the Edit-list rules. For more information on these, see the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS.

Phrase Editing

The name is checked for the presence of phrases (which are defined in the Edit-list), and if any are found the replacement is performed accordingly.
Phrases are processed as follows. The name is broken into words (left-to-right, using BLANK boundaries) and each word is appended to an internal temporary phrase. After each word is added, the temporary phrase is checked to see if it ends in an Edit List Phrase entry; if so then the tail (phrase) part is replaced with the Phrase replacement and the whole phrase is checked again immediately.
When the Edit List Phrase entries are inspected, they are processed from longest to shortest.
For more about Phrases, see the Edit List Definition chapter in the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS.
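The phrase pass can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the `PHRASES` dictionary is a hypothetical stand-in for the Edit List Phrase entries, "longer" is assumed to mean more characters, and the `max_passes` cap is an assumed safeguard against entries that keep reproducing themselves.

```python
def apply_phrases(name: str, phrases: dict[str, str], max_passes: int = 100) -> str:
    """Replace phrase entries found at the tail of the growing temporary phrase."""
    # Inspect entries from the longer to the shorter (length in characters: an assumption).
    entries = sorted(
        ((tuple(p.split()), r.split()) for p, r in phrases.items()),
        key=lambda e: len(" ".join(e[0])),
        reverse=True,
    )
    temp: list[str] = []  # the internal temporary phrase
    for word in name.split():  # left to right, on BLANK boundaries
        temp.append(word)
        # After every replacement the whole phrase is checked again at once;
        # the pass limit is an assumed guard, not a documented feature.
        for _ in range(max_passes):
            for phrase, replacement in entries:
                n = len(phrase)
                if tuple(temp[-n:]) == phrase:
                    temp[-n:] = replacement  # replace the tail with the replacement
                    break
            else:
                break  # no entry matched the tail: take the next word
    return " ".join(temp)


# Hypothetical phrase entries, for illustration only.
PHRASES = {"LIMITED LIABILITY COMPANY": "LLC", "COMPANY": "CO"}
print(apply_phrases("ACME LIMITED LIABILITY COMPANY", PHRASES))  # ACME LLC
```

Inspecting the longest entry first is what lets the three-word phrase win over the single-word COMPANY entry in the example.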

Edit-list Processing

The name is broken into words (’tokens’) and each token is looked up in the Edit-list. The Edit-list rules are examined in the following order:
Phase   Cat            Description
1       Service name   8/32
1       M              Mark
2       C              Prefix join
2       D              Delete
2       G              Major right, delete
2       H              Major right, keep
2       P              Postfix join
2       R              Replace
2       S              Skip
2       X              Major left, delete
2       Y              Major left, keep
3       B              Prefix delete
3       F              Prefix split
3       J              Prefix replace
4       A              Postfix split
4       E              Postfix delete
4       K              Postfix replace
5       N              Nicknames with diminutives
If a token has a rule in the Edit-list, that rule is applied and the result is moved to the Words-stack; otherwise the token is moved straight to the Words-stack.
If an Edit-list rule results in the token being split, each part of the token is then looked up in the Edit-list again.
If the same token is found more than once in the same phase, only the first rule in that phase is processed. If the same token is found in multiple phases, each rule is processed.
Each token is passed to the Formatting User Exit, which can optionally handle special nickname endings or special street-name words. For details on the operation of the supplied English Formatting User Exit, see the Nickname Processing section.
When an Edit-list rule is applied, the Edit-list Category name associated with the rule is added to the Categories list, and the last Category applied to that token is added to the NAMESET Words-stack.
Note that Cleaning Editing and Major Marker Processing Edit-list rules are not invoked when the Formatting Service is called directly, only when it is called via NAMESET.
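A minimal Python sketch of the per-token lookup, modelling only four rule types (Delete, Replace, Prefix split, Postfix split) with the phases given in the table above. The `Rule` class, the matching logic and the recursion limit are hypothetical; the rule letter stands in for the Edit-list Category name, and the Formatting User Exit and the NAMESET-only rules are not modelled.

```python
from dataclasses import dataclass

# Phases of the four rule types modelled here, as listed in the table above.
PHASE_OF = {"D": 2, "R": 2, "F": 3, "A": 4}


@dataclass
class Rule:  # hypothetical in-memory form of one Edit-list entry
    key: str         # token, prefix or postfix the rule applies to
    cat: str         # D (delete), R (replace), F (prefix split), A (postfix split)
    value: str = ""  # replacement text, used by R only


def matches(rule: Rule, token: str) -> bool:
    if rule.cat == "F":  # prefix split needs something left after the prefix
        return token.startswith(rule.key) and len(token) > len(rule.key)
    if rule.cat == "A":  # postfix split needs something in front of the postfix
        return token.endswith(rule.key) and len(token) > len(rule.key)
    return token == rule.key


def edit_token(token: str, rules: list[Rule], categories: list[str], depth: int = 0) -> list[str]:
    if depth > 16:  # hypothetical guard; see Edit Rule Loops
        raise RuntimeError("edit rule loop")
    for phase in sorted(set(PHASE_OF.values())):
        # Only the first matching rule within a phase is applied.
        rule = next((r for r in rules if PHASE_OF[r.cat] == phase and matches(r, token)), None)
        if rule is None:
            continue
        categories.append(rule.cat)  # stands in for the Category name
        if rule.cat == "D":
            return []
        if rule.cat == "R":
            token = rule.value  # later phases see the replaced token
        else:
            cut = len(rule.key) if rule.cat == "F" else len(token) - len(rule.key)
            # Each part of a split token is looked up in the Edit-list again.
            return [w for part in (token[:cut], token[cut:])
                    for w in edit_token(part, rules, categories, depth + 1)]
    return [token]


def edit_list_step(name: str, rules: list[Rule]) -> tuple[list[str], list[str]]:
    words, categories = [], []
    for token in name.split():
        words.extend(edit_token(token, rules, categories))
    return words, categories


RULES = [Rule("SAINT", "R", "ST"), Rule("NEW", "F"), Rule("CO", "D")]
print(edit_list_step("NEWTOWN SAINT CO", RULES))  # (['NEW', 'TOWN', 'ST'], ['F', 'R', 'D'])
```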

Edit Rule Loops

The above description shows that it is possible for Formatting to get into a loop. Replacing one word with another and then replacing it back with the original is an obvious error, but some of the more subtle cases are impossible to avoid.
For example, if the word NEW was specified as Prefix split and TOWN as Postfix join, the word NEWTOWN would first be split into NEW and TOWN (the rule for NEW) and then joined again into NEWTOWN (the rule for TOWN). Each rule makes sense on its own, but when the two meet the behavior is undesirable. The formatting routine guards against such situations and aborts the loop so that the formatting process can complete.
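The product's loop-detection mechanism is not described in this section. One common technique is to remember every intermediate result and stop as soon as one repeats; the Python sketch below applies that technique to the NEWTOWN example. The `step` and `format_with_loop_guard` functions and the rule sets are hypothetical.

```python
def step(words, prefix_split, postfix_join):
    """Apply the first applicable rule once; return the new word list or None."""
    for i, w in enumerate(words):
        for p in prefix_split:  # Prefix split: NEWTOWN -> NEW TOWN
            if w.startswith(p) and len(w) > len(p):
                return words[:i] + [p, w[len(p):]] + words[i + 1:]
        if i > 0 and w in postfix_join:  # Postfix join: NEW TOWN -> NEWTOWN
            return words[:i - 1] + [words[i - 1] + w] + words[i + 1:]
    return None


def format_with_loop_guard(words, prefix_split, postfix_join):
    """Return (words, loop_detected); stop as soon as a word list repeats."""
    seen = {tuple(words)}
    while True:
        nxt = step(words, prefix_split, postfix_join)
        if nxt is None:
            return words, False  # no rule applies: formatting completes normally
        if tuple(nxt) in seen:
            return words, True   # a previous state came back: abort the loop
        seen.add(tuple(nxt))
        words = nxt


print(format_with_loop_guard(["NEWTOWN"], {"NEW"}, {"TOWN"}))  # (['NEW', 'TOWN'], True)
```

Because splitting and joining never change the total characters, the number of reachable word lists is finite, so the guard always terminates.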
When a loop is detected, a response code is returned. Although it is always Formatting that detects the loop, Formatting may have been called by another Service, so the Primary response code varies according to the Service being called, as follows:
  • Formatting 02nn38
  • Cleaning 020042
  • NAME3 070034
  • NAMESET 070046
The Secondary response code is always the Formatting code, i.e. 02nn38.
If you receive many such response codes, produce a report of the names that cause them and then check whether the situation can be rectified by modifying the Edit-list Definition file (do this carefully, because it may invalidate the name keys stored in your database). See the Response Codes chapter for more information about response codes.

Post Processing

In this step, final adjustments are applied to the Words-stack. If this step ends without an error, a post compress is done (empty entries are removed). The words in the Words-stack are marked with a Word-type character as follows:
  <space>  EMPTY
  S        SKIP
  T        SKIPCODE
  I        INITIAL
  Y        SELECT
  C        CODE
  M        MAJOR
  N        MAJCODE
  B        SUSPECT
  D        DELETED (used only by the TRACE service)
Post processing proceeds as follows:
  1. If the Words-stack is empty, end the Service with error response code 04.
  2. Do post clean (every entry gets a final cleaning).
  3. Do post compress (empty entries are removed).
  4. Do code processing (entries are examined to see whether they are words or codes, and special rules are applied to codes).
  5. Do post compress.
  6. Pick the MAJOR word if one is present. Each word in the stack is checked in name-format order:
     • If the word is a MAJOR: if a major has already been picked, convert the word to SELECT; otherwise pick the word as the major.
     • If the word is a MAJOR-CODE: if a major has already been picked, convert the word to CODE; otherwise pick the word as the major.
     If a major was found, end the post-processing.
  7. Pick one of the SELECT words. Each word in the stack is checked in name-format order; the first SELECT word is converted to MAJOR and the post-processing ends.
  8. Pick one of the CODE words. Each word in the stack is checked in name-format order; the first CODE word is converted to MAJOR and the post-processing ends.
  9. Concatenate initials. All the initials in the stack are concatenated. If there were any, convert the result (which could still be a single initial) to a MAJOR and end the post-processing. After concatenation, the generated word may be anywhere in the stack; it is identified by its MAJOR type.
  10. Pick one of the SKIP words. Each word in the stack is checked in name-format order; the first SKIP word is converted to MAJOR and the post-processing ends.
  11. If Formatting option #7 is active and a street word was found during the Edit-list step, do street processing and end this step.
  12. If this point is reached, there is no major: set response code 04 and end the post-processing.
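The major-selection part of post processing (from picking the MAJOR word onward) can be sketched in Python as below. The sketch assumes the stack is a list of `[word_type, text]` pairs using the Word-type characters listed above, and that cleaning, compression and code processing have already run. Returning `None` for success is an assumption, and street processing is represented only by an optional, hypothetical `street_processing` callback.

```python
def pick_major(stack, street_processing=None):
    """Choose the major word in place; return None on success or "04" if none."""
    if not stack:
        return "04"
    # Pick the first MAJOR or MAJOR-CODE; demote any later ones.
    major = None
    for e in stack:
        if e[0] in ("M", "N"):
            if major is None:
                major = e
            else:
                e[0] = "Y" if e[0] == "M" else "C"  # MAJOR -> SELECT, MAJOR-CODE -> CODE
    if major is not None:
        return None
    # Otherwise promote the first SELECT word, then the first CODE word.
    for wanted in ("Y", "C"):
        for e in stack:
            if e[0] == wanted:
                e[0] = "M"
                return None
    # Otherwise concatenate all INITIALs into one MAJOR (kept at the first initial).
    idx = [i for i, e in enumerate(stack) if e[0] == "I"]
    if idx:
        stack[idx[0]] = ["M", "".join(stack[i][1] for i in idx)]
        for i in reversed(idx[1:]):
            del stack[i]
        return None
    # Otherwise promote the first SKIP word.
    for e in stack:
        if e[0] == "S":
            e[0] = "M"
            return None
    # Option #7 street processing: hypothetical hook, not modelled here.
    if street_processing is not None:
        street_processing(stack)
        return None
    return "04"


stack = [["M", "SMITH"], ["M", "JONES"], ["N", "123"]]
print(pick_major(stack), stack)  # None [['M', 'SMITH'], ['Y', 'JONES'], ['C', '123']]
```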

Post Clean

This stage cleans the stack words, one at a time, using Table 8 of the Character-set module:
  • If the entry is EMPTY, leave it alone.
  • Otherwise examine each position in the word:
     • If it is a BLANK and the "break on blank" option (Formatting option #6) was selected, clear the rest of the word.
     • If the replacement value in Table 8 for the character at this position is BLANK, delete the character from the word.
     • Otherwise replace the character at this position with its value from Table 8.
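A minimal sketch of the post clean, assuming Table 8 can be modelled as a dictionary from character to replacement character. The `make_table8` contents below are invented for illustration (upper-case letters and digits kept, lower case folded to upper case, everything else mapped to BLANK); the real table comes from the Character-set module.

```python
import string


def make_table8() -> dict[str, str]:
    """Hypothetical stand-in for Table 8: character -> replacement character."""
    table = {chr(c): " " for c in range(32, 127)}  # default: replace with BLANK
    for ch in string.ascii_uppercase + string.digits:
        table[ch] = ch
    for ch in string.ascii_lowercase:
        table[ch] = ch.upper()
    return table


def post_clean(word: str, table8: dict[str, str], break_on_blank: bool = False) -> str:
    if word == "":  # EMPTY entry: leave it alone
        return word
    out = []
    for ch in word:
        if ch == " " and break_on_blank:  # Formatting option #6
            break  # clear the rest of the word
        repl = table8.get(ch, " ")
        if repl == " ":
            continue  # BLANK replacement: delete the character
        out.append(repl)
    return "".join(out)


T8 = make_table8()
print(post_clean("O'Brien", T8))                         # OBRIEN
print(post_clean("St. Mary", T8, break_on_blank=True))  # ST
```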

Post Compress

This process removes empty entries from the stack. The stack is thus compressed: all empty entries are now at the end, and the words count reflects the number of non-empty entries.
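Post compress amounts to a stable partition of the stack. In this minimal Python sketch, the `(word_type, text)` tuple representation and the use of `" "` for EMPTY are assumptions.

```python
def post_compress(stack):
    """Move EMPTY entries to the end, keeping order; return (stack, words_count)."""
    words = [e for e in stack if e[0] != " "]
    empties = [e for e in stack if e[0] == " "]
    return words + empties, len(words)


print(post_compress([("M", "ACME"), (" ", ""), ("Y", "CO")]))
# ([('M', 'ACME'), ('Y', 'CO'), (' ', '')], 2)
```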

Code Processing

This process handles words that are identified as ’codes’, that is, words that represent values which are not a word of the language or a name. Table 2 of the character-set module is used to identify code-characters. This table categorizes each character as a code (usually the numeric characters), an ambiguous or suspect code (a character that may be a code; for example, you may wish to have the letter O defined as an ambiguous code-character), or a letter.
First, codes and suspect codes are identified in the stack. The words are processed in stack order:
  • If the entry is EMPTY, leave it as is (nothing to do). If the word is marked as SKIP, leave it alone.
  • Count the number of non-blank characters into L.
  • Count the number of code-characters into C.
  • Count the number of ambiguous characters into A.
  • Add A to C (C is now the number of non-alpha characters).
  • If C == 0, leave this entry as is.
  • If C >= 2, mark this entry as CODE (2 or more digits in the word).
  • Otherwise C == 1, that is, exactly ONE non-alpha character was found:
     • If L == 1, mark the word as CODE (the word is one non-alpha character alone).
     • Else if A > 0, leave the entry alone (a non-initial containing one ambiguous character).
     • Else mark the word as SUSPECT (a non-initial containing one code-character).
Next, further suspect codes are identified. Each entry is processed in turn:
  • If the entry is CODE, SUSPECT or EMPTY, leave it alone.
  • If the entry is a short word (1 or 2 letters) and it is preceded or followed by a CODE, mark it as SUSPECT.
  • Else leave the entry as is.
Next, adjacent codes are concatenated (if Formatting option #3 is active). A run of CODE and SUSPECT words is concatenated into one word, which is marked as CODE. An EMPTY entry does not break a run.
Finally, CODE and SUSPECT words are processed according to the defined options. Each entry is processed one at a time:
  • If the word is SUSPECT, convert it according to the "suspect codes" option (Formatting option #2):
       C     mark as CODE
       W     mark as SELECT
       S     mark as SKIP
       M     mark as MAJOR
       D     delete word
       else  mark as SELECT
  • If the word is CODE, convert it according to the "codes" option (Formatting option #1):
       C     mark as CODE
       W     mark as SELECT
       S     mark as SKIP
       M     mark as MAJOR-CODE
       D     delete word
       else  mark as CODE
  • Else leave the entry as is.
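The following Python sketch walks through the code-processing steps in order. Everything specific is assumed: `CODE_CHARS` and `AMBIGUOUS_CHARS` are hypothetical stand-ins for Table 2, the examples give ordinary words the SELECT type `Y`, a "run" is taken to mean two or more adjacent CODE/SUSPECT words, and a deleted word is turned into an EMPTY entry for the following post compress.

```python
# Hypothetical stand-ins for Table 2 of the character-set module.
CODE_CHARS = set("0123456789")
AMBIGUOUS_CHARS = {"O"}

# Option letter -> new Word-type character ("" means delete the word).
SUSPECT_OPTION = {"C": "C", "W": "Y", "S": "S", "M": "M", "D": ""}  # option #2
CODE_OPTION = {"C": "C", "W": "Y", "S": "S", "M": "N", "D": ""}     # option #1


def code_processing(stack, opt1="", opt2="", opt3=False):
    """stack: list of [word_type, text] pairs; updated in place and returned."""
    # Identify codes (C) and suspect codes (B).
    for e in stack:
        if e[0] in (" ", "S"):  # EMPTY or SKIP: leave alone
            continue
        L = sum(ch != " " for ch in e[1])
        A = sum(ch in AMBIGUOUS_CHARS for ch in e[1])
        C = sum(ch in CODE_CHARS for ch in e[1]) + A  # non-alpha count
        if C == 0:
            continue
        if C >= 2 or L == 1:
            e[0] = "C"
        elif A == 0:
            e[0] = "B"  # one code-character in a longer word
    # Short words (1 or 2 letters) next to a CODE become SUSPECT.
    for i, e in enumerate(stack):
        if e[0] in ("C", "B", " ") or len(e[1].strip()) not in (1, 2):
            continue
        neighbours = stack[max(i - 1, 0):i] + stack[i + 1:i + 2]
        if any(n[0] == "C" for n in neighbours):
            e[0] = "B"
    # Option #3: join runs of CODE/SUSPECT words; EMPTY does not break a run.
    if opt3:
        run = []
        for e in stack + [["X", ""]]:  # sentinel entry closes the last run
            if e[0] in ("C", "B"):
                run.append(e)
            elif e[0] != " ":
                if len(run) > 1:  # assumption: a lone word is not a run
                    run[0][:] = ["C", "".join(r[1] for r in run)]
                    for r in run[1:]:
                        r[:] = [" ", ""]  # absorbed entries become EMPTY
                run = []
    # Convert SUSPECT words by option #2 and CODE words by option #1.
    for e in stack:
        if e[0] == "B":
            new = SUSPECT_OPTION.get(opt2, "Y")
        elif e[0] == "C":
            new = CODE_OPTION.get(opt1, "C")
        else:
            continue
        e[:] = [new, e[1]] if new else [" ", ""]
    return stack


print(code_processing([["Y", "SMITH"], ["Y", "12"], ["Y", "A"]], opt1="M", opt3=True))
# [['Y', 'SMITH'], ['N', '12A'], [' ', '']]
```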
