Service Group Definition and Customization Guide

Edit-list Processing

During the Cleaning and Formatting phases of key building and matching, the components of a name are checked against entries in the Edit-list. If an exact string match is found, the associated rule is run.
A word can appear in more than one Edit-list section if it is processed in different Edit-list processing phases.
The sequence of Edit-list processing is as follows.
Phase   Description
1       Multi-Valued Field Processing
2       Character rule processing of Major and Delete Markers
3       Character rule processing of Cleaning Editing Definitions
4       Phrase Processing
5       Edit Word Processing
Phase 5 (Edit Word Processing) is divided into the following sub-phases.
Edit Word Processing Phase (5) Type   Category Type   Description
5.1                                   M               Mark
5.2                                   C               Prefix join
5.2                                   D               Delete
5.2                                   G               Major right, delete
5.2                                   H               Major right, keep
5.2                                   O               Word not Stabilized
5.2                                   P               Postfix join
5.2                                   R               Replace
5.2                                   S               Skip
5.2                                   X               Major left, delete
5.2                                   Y               Major left, keep
5.3                                   B               Prefix delete
5.3                                   F               Prefix split
5.3                                   J               Prefix replace
5.4                                   A               Postfix split
5.4                                   E               Postfix delete
5.4                                   K               Postfix replace
5.5                                   N               Nicknames with diminutives
5.6                                   I               Secondary (Search and Match Only)
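
The grouping of category types into sub-phases can be sketched as a small lookup table. This is only an illustration of the ordering listed above, not the product's implementation: the names EDIT_WORD_SUBPHASES, subphase_of and order_rules are hypothetical, and the sample words other than SAINT are made up for the example.

```python
# Category types grouped by Edit Word Processing sub-phase (taken from the table).
# All identifiers here are hypothetical; they are not part of the product.
EDIT_WORD_SUBPHASES = {
    "5.1": {"M": "Mark"},
    "5.2": {
        "C": "Prefix join", "D": "Delete", "G": "Major right, delete",
        "H": "Major right, keep", "O": "Word not Stabilized",
        "P": "Postfix join", "R": "Replace", "S": "Skip",
        "X": "Major left, delete", "Y": "Major left, keep",
    },
    "5.3": {"B": "Prefix delete", "F": "Prefix split", "J": "Prefix replace"},
    "5.4": {"A": "Postfix split", "E": "Postfix delete", "K": "Postfix replace"},
    "5.5": {"N": "Nicknames with diminutives"},
    "5.6": {"I": "Secondary (Search and Match Only)"},
}


def subphase_of(category):
    """Return the sub-phase label (for example "5.2") for a category type."""
    for subphase, categories in EDIT_WORD_SUBPHASES.items():
        if category in categories:
            return subphase
    raise KeyError(f"unknown category type: {category}")


def order_rules(rules):
    """Sort (word, category) pairs by sub-phase, keeping input order within one."""
    def key(rule):
        return tuple(int(part) for part in subphase_of(rule[1]).split("."))
    return sorted(rules, key=key)


if __name__ == "__main__":
    # Hypothetical rules; only SAINT as a prefix split comes from this guide.
    sample = [("BILL", "N"), ("SAINT", "F"), ("THE", "D"), ("MR", "M")]
    for word, category in order_rules(sample):
        print(subphase_of(category), category, word)
```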

Reprocessing of Split Words

Edit-list processing is also applied separately to each part of a word that has been split, as follows.
Prefix Split - If SAINT has been defined as a prefix split word and is found at the start of a word in the name string (e.g. SAINTJOHN), the word is split and both parts (SAINT and JOHN) are processed as if they had been separate words in the name string.
Postfix Split - If ROAD has been defined as a postfix split word and is found at the end of a word in the name string (e.g. CROSSROAD), the word is split and both parts (CROSS and ROAD) are processed as if they had been separate words in the name string.
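
A minimal sketch of the two split rules, using the examples above. The function name split_word and its parameters are hypothetical, and details such as which rule wins when several could apply are assumptions, since they are not described here.

```python
def split_word(word, prefix_splits, postfix_splits):
    """Split a word on a defined prefix or postfix split word.

    Assumption: prefix splits are tried before postfix splits, and a word
    that is exactly equal to a split word is left unchanged.
    """
    for prefix in prefix_splits:
        if word.startswith(prefix) and len(word) > len(prefix):
            return [prefix, word[len(prefix):]]
    for postfix in postfix_splits:
        if word.endswith(postfix) and len(word) > len(postfix):
            return [word[:-len(postfix)], postfix]
    return [word]


if __name__ == "__main__":
    print(split_word("SAINTJOHN", ["SAINT"], ["ROAD"]))
    print(split_word("CROSSROAD", ["SAINT"], ["ROAD"]))
```

Each part returned by the split would then be looked up in the Edit-list again, as if it had appeared as a separate word in the name string.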

Edit Rule Loops

Edit rules can end up being applied in an uncontrolled loop, and sometimes this is unavoidable. Replacing one word with another and then replacing the result with the original word is an obvious error, but some of the more subtle cases are impossible to avoid.
For example, with the Fast-start Edit-list the word NEWTOWN causes such a problem: the word NEW is specified as Prefix-split and TOWN is defined as a Postfix-concatenate. As a result, NEWTOWN is first split into NEW and TOWN (the rule for NEW) and then joined again into NEWTOWN (the rule for TOWN). Each rule makes sense on its own, but when the two meet the behavior is undesirable.
The formatting routine guards against such situations and aborts the loop so that the formatting process can complete.
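
One way such a guard could work is to remember every word list already produced and stop as soon as one recurs. The sketch below reproduces the NEWTOWN example under that assumption. It is not the product's formatting routine; the names apply_rules_once and format_words, the max_passes limit, and the choice of which result to keep when the loop is aborted are all hypothetical.

```python
def apply_rules_once(words, prefix_splits, postfix_joins):
    """Apply prefix-split and postfix-join rules to each word, once."""
    out = []
    for word in words:
        prefix = next(
            (p for p in prefix_splits
             if word.startswith(p) and len(word) > len(p)),
            None,
        )
        if prefix is not None:
            # Prefix split: NEWTOWN -> NEW, TOWN
            out.extend([prefix, word[len(prefix):]])
        elif word in postfix_joins and out:
            # Postfix join: NEW, TOWN -> NEWTOWN
            out[-1] = out[-1] + word
        else:
            out.append(word)
    return out


def format_words(words, prefix_splits, postfix_joins, max_passes=10):
    """Repeat the rules until stable; abort if a previous state recurs.

    Returns (words, loop_aborted).
    """
    current = list(words)
    seen = {tuple(current)}
    for _ in range(max_passes):
        nxt = apply_rules_once(current, prefix_splits, postfix_joins)
        if nxt == current:
            return current, False      # stable, nothing left to apply
        if tuple(nxt) in seen:
            return current, True       # rule loop detected, abort
        seen.add(tuple(nxt))
        current = nxt
    return current, True               # safety limit reached


if __name__ == "__main__":
    print(format_words(["NEWTOWN"], ["NEW"], ["TOWN"]))
    print(format_words(["OLD", "TOWN"], ["NEW"], ["TOWN"]))
```

In this sketch NEWTOWN is split, the rejoin is recognized as a repeat of the starting state, and processing stops with the loop flagged, while OLD TOWN is simply joined because no competing rule applies.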
