Service Group Application Reference


Operation

Formatting processing is done in three major phases.

Phrase Editing

Edit-list processing

Post processing

Some of the functions within Formatting are user-controlled via settings in the Algorithm parameter FORMATTING-OPTIONS and by the Edit-list rules. For more information on these, see the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS.

Phrase Editing

The name is checked for the presence of phrases (which are defined in the Edit-list), and if any are found the replacement is performed accordingly.

Phrases are processed as follows. The name is broken into words (left-to-right, using BLANK boundaries) and each word is appended to an internal temporary phrase. After each word is added, the temporary phrase is checked to see if it ends in an Edit List Phrase entry; if so then the tail (phrase) part is replaced with the Phrase replacement and the whole phrase is checked again immediately.

When the Edit List Phrase entries are inspected, they are processed from longest to shortest.
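The phrase-matching steps described above can be sketched roughly as follows. This is only an illustration, not the product's implementation: the function names, the dictionary representation of Edit List Phrase entries, the matching on BLANK boundaries only, and the `max_passes` guard are all assumptions made for the example.

```python
def ends_with_phrase(text, phrase):
    # Assumption: a phrase only matches at a BLANK boundary.
    return text == phrase or text.endswith(" " + phrase)


def edit_phrases(name, phrases, max_passes=100):
    """Apply phrase replacements to a name.

    phrases: hypothetical dict mapping a phrase to its replacement.
    max_passes: hypothetical guard so a self-referencing entry cannot spin forever.
    """
    # Longer entries are inspected before shorter ones.
    ordered = sorted(phrases, key=len, reverse=True)
    temp = ""
    for word in name.split():
        # Append the next word to the temporary phrase.
        temp = f"{temp} {word}" if temp else word
        for _ in range(max_passes):
            match = next((p for p in ordered if ends_with_phrase(temp, p)), None)
            if match is None:
                break
            # Replace the tail with the replacement, then check again immediately.
            temp = (temp[: len(temp) - len(match)] + phrases[match]).strip()
    return temp
```

With the hypothetical entry `{"DOING BUSINESS AS": "DBA"}`, the call `edit_phrases("SMITH DOING BUSINESS AS ACME", ...)` yields `SMITH DBA ACME`.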

For more about Phrases, see the Edit List Definition chapter in the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS.

Edit-list Processing

The name is broken into words ('tokens') and each token is looked up in the Edit-list. The Edit-list rules are examined in the following order:

Phase	Cat	Description
1	Service name	8/32
1	M	Mark
2	C	Prefix join
	D	Delete
	G	Major right, delete
	H	Major right, keep
	P	Postfix join
	R	Replace
	S	Skip
	X	Major left, delete
	Y	Major left, keep
3	B	Prefix delete
	F	Prefix split
	J	Prefix replace
4	A	Postfix split
	E	Postfix delete
	K	Postfix replace
5	N	Nicknames with diminutives

If a token is found to have a rule in the Edit-list then that rule is applied and the result is moved to the Words-stack; otherwise the token is moved straight to the Words-stack.

If an Edit-list rule results in the token being split, each part of the token is then looked up in the Edit-list again.

If the same token is found more than once in the same phase, only the first rule in that phase is processed. If the same token is found in multiple phases, each rule is processed.
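The rule-selection behavior above (first rule only within a phase, every phase considered) can be illustrated with a small sketch. The phase numbers are taken from the table of categories; everything else is an assumption for the example, including the `(token, category)` pair representation of the Edit-list and the use of exact token matching for every category.

```python
# Category letter -> phase number, as listed in the table above.
PHASE_OF = {
    "M": 1,
    **dict.fromkeys("CDGHPRSXY", 2),
    **dict.fromkeys("BFJ", 3),
    **dict.fromkeys("AEK", 4),
    "N": 5,
}


def rules_for_token(token, edit_list):
    """Return the categories that would be processed for a token, in phase order.

    edit_list: hypothetical list of (token, category) pairs in file order.
    """
    chosen = {}
    for rule_token, category in edit_list:
        if rule_token != token:
            continue
        # Only the first rule found in a given phase is kept.
        chosen.setdefault(PHASE_OF[category], category)
    # Rules from different phases are all processed, lowest phase first.
    return [chosen[phase] for phase in sorted(chosen)]
```

For example, with the hypothetical entries `[("ST", "R"), ("ST", "S"), ("ST", "N")]`, the token `ST` gets `["R", "N"]`: the second phase 2 rule is ignored, while the phase 5 rule is still processed.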

Each token is passed to the Formatting User Exit, which can optionally handle special nick-name endings or special street name words. For details on the operation of the supplied English Formatting User Exit, see the Nickname Processing section.

When an Edit-list rule is applied, the Edit-list Category name associated with the rule is added to the Categories list and the last Category applied to that token is added to the NAMESET Words-stack.

Note that Cleaning Editing and Major Marker Processing Edit-list rules are not invoked when the Formatting Service is called directly, only when it is called via NAMESET.

Edit Rule Loops

The above description shows that it is possible for Formatting to get into a loop. The simple problem of replacing one word with another and then replacing it again by the original word is an obvious error but some of the more subtle cases are impossible to avoid.

For example, if the word NEW was specified as Prefix-split and TOWN as Postfix-concatenate, the word NEWTOWN would first be split into NEW and TOWN (this is the rule for NEW) and then joined again into NEWTOWN (this is the rule for TOWN). Each of the rules makes sense on its own but when the two meet the behavior is undesirable. The formatting routine guards against such situations and will abort the loop to complete the formatting process.
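The NEW/TOWN situation can be reproduced with a toy model. The source does not describe how the formatting routine actually detects a loop, so the repeated-state check below is an assumption chosen for illustration, as are the function names and the hard-coded pair of rules.

```python
def step(tokens):
    """Apply one hypothetical rule to a tuple of tokens and return the new tuple."""
    # Prefix split on NEW: NEWTOWN -> NEW, TOWN.
    for i, tok in enumerate(tokens):
        if tok.startswith("NEW") and len(tok) > 3:
            return tokens[:i] + ("NEW", tok[3:]) + tokens[i + 1:]
    # Postfix join on TOWN: NEW, TOWN -> NEWTOWN.
    for i, tok in enumerate(tokens):
        if tok == "TOWN" and i > 0:
            return tokens[:i - 1] + (tokens[i - 1] + tok,) + tokens[i + 1:]
    return tokens


def format_with_guard(tokens, step_fn):
    """Apply step_fn until nothing changes; stop early if a state repeats.

    Returns (final_tokens, loop_detected).
    """
    seen = set()
    state = tuple(tokens)
    while state not in seen:
        seen.add(state)
        nxt = step_fn(state)
        if nxt == state:
            return state, False  # stable: no rule changed anything
        state = nxt
    return state, True  # a previously seen state came back: abort the loop
```

Calling `format_with_guard(["NEWTOWN"], step)` returns `(("NEWTOWN",), True)`, the case in which a loop response would be reported, while `format_with_guard(["SMITH"], step)` returns `(("SMITH",), False)`.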

When a loop is detected a response is returned. Although it is always Formatting that detects the loop, it may have been called by another Service. The Primary response code therefore varies according to the Service being called, as follows:

Formatting  02nn38
Cleaning    020042
NAME3       070034
NAMESET     070046

The Secondary response code will always be the Formatting code, i.e. 02nn38.

If you get too many such response codes you should produce a report of names that cause it and then check to see if the situation can be rectified by modifying the Edit-list Definition file (this should be done carefully because it may invalidate the stored name keys in your database). See the Response Codes chapter for more information about response codes.

Post Processing

In this step, final adjustments are applied to the Words-stack. If this step ends without an error then a post compress is done (empty entries are removed). The words in the Words-stack are marked with a Word-type character as follows:


<space> EMPTY
S       SKIP
T       SKIPCODE
I       INITIAL
Y       SELECT
C       CODE
M       MAJOR
N       MAJCODE
B       SUSPECT
D       DELETED (used only by the TRACE service)

If the Words-stack is empty then end the Service with error response code 04.

Do post clean (every entry gets a final cleaning)

Do post compress (empty entries are removed)

Do code processing (entries are examined to see if they are words or codes and special rules are applied to codes).

Do post compress

Pick the MAJOR word if one is present. Each word in the stack is checked in the order of the name-format:

If the word is a MAJOR then

If we already have a major then convert the word to SELECT, else pick the word as the major.

If the word is a MAJOR-CODE then

If we already have a major then convert the word to CODE, else pick the word as the major.

If a major was found then end the post-processing.

Pick one of the SELECT words. Each word in the stack is checked in the order of name-format:

If the word is a SELECT word then it is converted to MAJOR and the post-processing ends.

Pick one of the CODE words. Each word in the stack is checked in the order of name-format:

If the word is a CODE then convert it to MAJOR and end the post-processing.

Concatenate initials. Concatenate all the initials in the stack. If there were any then convert the result (which could still be one initial) to a MAJOR and end the post-processing. After concatenating, the generated word may be put anywhere in the stack; it is identified by the MAJOR type it has.

Pick one of the SKIP words. Each word in the stack is checked in the order of the name-format:

If the word is SKIP then convert it to MAJOR and end the post-processing.

If Formatting option #7 is active and a street word was found during the Edit-list step then do street processing and end this step.

If we got here then we have no major, set response code to 04 and end the post-processing.
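The order in which a major word is chosen can be summarized in a short sketch. It uses the Word-type characters listed earlier (M, N, Y, C, I, S, and a space for EMPTY). The representation of the stack as a list of `[word, type]` pairs already in name-format order is an assumption, as is placing the concatenated initials in the slot of the first initial. Street processing (Formatting option #7) is not described here and is left out.

```python
def pick_major(stack):
    """Pick the major word; return its index, or None (response code 04).

    stack: hypothetical list of [word, type] pairs, modified in place.
    """
    major = None
    for i, entry in enumerate(stack):
        if entry[1] in ("M", "N"):
            if major is None:
                major = i  # first MAJOR or MAJOR-CODE wins
            else:
                # Extra MAJOR becomes SELECT, extra MAJOR-CODE becomes CODE.
                entry[1] = "Y" if entry[1] == "M" else "C"
    if major is not None:
        return major
    # Otherwise the first SELECT word, then the first CODE word.
    for wanted in ("Y", "C"):
        for i, entry in enumerate(stack):
            if entry[1] == wanted:
                entry[1] = "M"
                return i
    # Otherwise concatenate all initials into one MAJOR word.
    idx = [i for i, entry in enumerate(stack) if entry[1] == "I"]
    if idx:
        stack[idx[0]] = ["".join(stack[i][0] for i in idx), "M"]
        for i in idx[1:]:
            stack[i] = ["", " "]  # leave an EMPTY entry behind
        return idx[0]
    # Otherwise the first SKIP word.
    for i, entry in enumerate(stack):
        if entry[1] == "S":
            entry[1] = "M"
            return i
    return None  # no major found
```

For a stack holding `JOHN` (SELECT), `SMITH` (MAJOR) and `BROWN` (MAJOR), the sketch returns index 1 and converts `BROWN` to SELECT.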

Post Clean

This stage cleans the stack words, one at a time, using a Character-set table.

If the entry is EMPTY then leave it alone.

Each position in the word is now examined:

If it is a BLANK, and the "break on blank" option (Formatting option #6) was selected then clear the rest of word.

If the replacement value for this position in Table 8 is BLANK then delete this character from the word.

Otherwise replace this position with the value in Table 8.
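A minimal sketch of the per-character cleaning follows. Table 8 is modeled as a plain dictionary, and leaving unmapped characters unchanged is an assumption of the sketch; the real character-set table and the function name `post_clean` are not specified in this text.

```python
def post_clean(word, table8, break_on_blank=False):
    """Clean one non-EMPTY stack word.

    table8: hypothetical dict standing in for Table 8 (character -> replacement).
    break_on_blank: stands in for Formatting option #6.
    """
    out = []
    for ch in word:
        if ch == " " and break_on_blank:
            break  # clear the rest of the word
        repl = table8.get(ch, ch)  # assumption: unmapped characters are kept
        if repl != " ":
            out.append(repl)  # a BLANK replacement deletes the character
    return "".join(out)


# Hypothetical table: upper-case letters, delete hyphens and apostrophes.
TABLE8 = {c: c.upper() for c in "abcdefghijklmnopqrstuvwxyz"}
TABLE8.update({"-": " ", "'": " "})
```

With this table, `post_clean("o'neil-smith", TABLE8)` gives `ONEILSMITH`, and `post_clean("van dyke", TABLE8, break_on_blank=True)` gives `VAN`.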

Post Compress

This process removes empty entries from the stack. The stack is thus compressed: all empty entries are now at the end, and the words count reflects the number of non-empty entries.
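As a rough illustration, assuming a fixed-size stack of `[word, type]` pairs in which an EMPTY entry has the type character space, the compression could look like this (the names are hypothetical):

```python
def post_compress(stack):
    """Move EMPTY entries to the end of the stack, in place.

    Returns the words count (number of non-empty entries).
    """
    kept = [entry for entry in stack if entry[1] != " "]
    count = len(kept)
    # Pad with fresh EMPTY entries so the stack keeps its size.
    stack[:] = kept + [["", " "] for _ in range(len(stack) - count)]
    return count
```

The relative order of the non-empty entries is preserved.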

Code Processing

This process handles words which are identified as 'codes', that is, words that represent values which are not a word of the language or a name. Table 2 of the character-set module is used to identify code-characters. This table categorizes each character as a code (usually the numeric characters), an ambiguous or suspect code (a character that may be a code; for example, you may wish to have the letter O defined as an ambiguous code-character), or a letter.

First we want to identify codes and suspect codes in the stack. The words are processed in the order of the stack:

If the entry is EMPTY then leave it as is (nothing to do). If the word is marked as SKIP then leave alone.

Count the number of non-blank characters into L
Count the number of code-characters into C
Count the number of ambiguous characters into A
Add A to C (number of non-alpha)
If C == 0 then leave this entry as is
If C >= 2 then mark this entry as CODE (2 or more digits in word)
Now we know C == 1, that is ONE non-alpha was found.
If L == 1 then mark word as CODE (word is one non-alpha alone)
Else if A > 0 then leave the entry alone (non-initial contains one ambiguous character)
Else mark the word as SUSPECT (non-initial contains one code-character)

Now we try to identify further suspect codes. Each entry is processed in turn.

If entry is CODE or SUSPECT or EMPTY leave it alone.
If entry is a short word (1 or 2 letters) and it is preceded or followed by a CODE mark it as SUSPECT
Else leave the entry as is

Next we concatenate adjacent codes (if Formatting option #3 is active). A run of CODE and SUSPECT words is concatenated into one word which is marked as CODE. An EMPTY entry does not break a run.

Now process CODE and SUSPECT words according to the defined options. Each entry is processed one at a time.

If the word is SUSPECT then convert it according to the "suspect codes" option (Formatting option #2):

C       mark as CODE
W       mark as SELECT
S       mark as SKIP
M       mark as MAJOR
D       delete word
else    mark as SELECT

If the word is CODE then convert it according to the "codes" option (Formatting option #1):

C       mark as CODE
W       mark as SELECT
S       mark as SKIP
M       mark as MAJOR-CODE
D       delete word
else    mark as CODE

Else leave the entry as is.
