Identity Resolution
- Identity Resolution 10.2 HotFix 1
This output is more fully described below.
INPUT: mr jim robert gray the 1st
OUTPUT: mr jim robert gray the 1st
STACK: 07 000 000 05
1 MR D D 000 001 PT
2 JIM D N 003 005 NK
3 JAMES Y Y C 10 C 03 003 005
4 ROBERT Y Y U 00 C 17 007 012
5 GRAY M Y C 00 U 00 014 017
6 THE D D 019 021 NW
7 1ST C B U 00 U 00 023 025
INPUT: | mr jim robert gray the 1st |
OUTPUT: | mr jim robert gray the 1st Although the TRACE Service performs some cleaning on the input name, only the "Early Cleaning" is executed (characters are replaced according to Character-set Table 14). Therefore, no upper-casing was performed in this example. |
STACK: | 07 000 000 05 This is the Word-stack header. 07 - A count of the words in the Word-stack that follows. 000 000 - Location of the major marker or markers in the original name. If Major markers are defined in your Edit-list and they occurred in the name, these values tell you where they were in the pre-cleaned name. If the markers were of type left, right, head or tail, there is only one offset. With marker type 'delimiter', the first value is the offset of the opening marker and the second is that of the closing marker. 05 - Index of the major word. This indicates which entry in the Word-stack is the Major word, as selected by the Formatting. |
1 MR | D D 000 001 PT - This is a typical personal title. The fields of interest are as follows. D - The Word-type, as decided by the Formatting after applying any Edit-list rules. In this case the D indicates that the word would have been deleted during the normal course of the Formatting. D - The Word-type, as decided by the Formatting before applying any Edit rules. In this example, the D comes from the following Category definition in the Edit-list:
000 001 - These two numbers are the offsets, within the name, of the first and last characters in this word. Position 1 is at offset 0. PT - The category name used to define this word in the Edit-list. |
2 JIM | D N 003 005 NK - This is similar to the previous line except that the word is defined as a nickname in the Edit-list. In this example the Edit-list probably had a rule like,
which, with normal Formatting, would cause the JIM to be replaced with JAMES. TRACE also does the replacement but keeps the original word and marks it as a deleted word. |
3 JAMES | Y Y C 10 C 03 003 005 - Here we have our first real word to survive the Formatting. Y - The Word-type, as decided by the Formatting after applying any Edit rules. Y - The Word-type, as decided by the Formatting before applying Formatting rules. As this word was neither deleted nor had its Word-type changed by the Formatting, this is simply a duplicate of the first Y. C - Common or Uncommon Major word. If this word is a major word, a C in this column indicates that it was a common major word and a U indicates an uncommon major word. 10 - Scale as a Major word. In this case a scale of 10 indicates that the word JAMES had a count of less than 10 in the Major word table. Note that this seemingly obvious translation of a scale of 10 to a count of 10 is misleading: the scale is logarithmic and merely happens to have a 1:1 ratio at the value 10. For more information on how the scale is calculated, read the NAMESET/ Parameters section earlier in this manual. C - Common or Uncommon Minor word. If this word is a minor word, a C in this column indicates that it was a common minor word and a U indicates an uncommon minor word. 03 - Scale as a Minor word. The word JAMES occurred fewer than 2 times in the common Minor word table. 003 005 - Starting and ending position of the word. |
4 ROBERT | Y Y U 00 C 17 007 012 - A normal word, flagged as a minor word (Y), an uncommon Major word (U) and a common Minor word (C). |
5 GRAY | M Y C 00 U 00 014 017 - The Major word. Before the Formatting rules were applied it was flagged as a Minor or possible Major word (Y). After the Formatting rules were applied, the word was identified as a Major word (M). |
6 THE | D D 019 021 NW - A word defined as a noise word in the Edit-list. |
7 1ST | C B U 00 U 00 023 025 - A Suspect Code-word (B) that was determined to be a Code-word (C) after the Formatting rules were applied. |
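The field descriptions above can be sketched as a small parser. This is a minimal illustration only, not part of the product: the names `StackHeader`, `WordEntry`, `parse_header`, `parse_entry` and `parse_stack` are hypothetical, and the column layout is inferred solely from the example in this section (deleted words carry two offsets and an optional category name; surviving words carry Major and Minor flag/scale pairs followed by two offsets). It also assumes each word is a single whitespace-free token and that the header's index of the major word is 1-based, as the example suggests.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StackHeader:
    word_count: int                  # number of entries in the Word-stack
    marker_offsets: Tuple[int, int]  # major marker offsets in the pre-cleaned name
    major_index: int                 # entry number of the Major word (assumed 1-based)


@dataclass
class WordEntry:
    index: int
    word: str
    type_after: str                  # Word-type after the rules were applied
    type_before: str                 # Word-type before the rules were applied
    start: int                       # offset of the first character in the name
    end: int                         # offset of the last character in the name
    major_flag: Optional[str] = None   # C = common, U = uncommon Major word
    major_scale: Optional[int] = None
    minor_flag: Optional[str] = None   # C = common, U = uncommon Minor word
    minor_scale: Optional[int] = None
    category: Optional[str] = None     # Edit-list category name, e.g. PT, NK, NW


def parse_header(line: str) -> StackHeader:
    count, first, second, major = line.split()
    return StackHeader(int(count), (int(first), int(second)), int(major))


def parse_entry(line: str) -> WordEntry:
    tokens = line.split()
    index, word, after, before = int(tokens[0]), tokens[1], tokens[2], tokens[3]
    rest = tokens[4:]
    if len(rest) == 6:
        # Assumed layout: major flag, major scale, minor flag, minor scale, offsets.
        maj_flag, maj_scale, min_flag, min_scale, start, end = rest
        return WordEntry(index, word, after, before, int(start), int(end),
                         maj_flag, int(maj_scale), min_flag, int(min_scale))
    if len(rest) in (2, 3):
        # Assumed layout: two offsets, optionally followed by a category name.
        category = rest[2] if len(rest) == 3 else None
        return WordEntry(index, word, after, before, int(rest[0]), int(rest[1]),
                         category=category)
    raise ValueError(f"unrecognized Word-stack entry: {line!r}")


def parse_stack(lines: List[str]) -> Tuple[StackHeader, List[WordEntry]]:
    header = parse_header(lines[0])
    entries = [parse_entry(line) for line in lines[1:]]
    if len(entries) != header.word_count:
        raise ValueError("header word count does not match the number of entries")
    return header, entries


INPUT_NAME = "mr jim robert gray the 1st"
SAMPLE_STACK = [
    "07 000 000 05",
    "1 MR D D 000 001 PT",
    "2 JIM D N 003 005 NK",
    "3 JAMES Y Y C 10 C 03 003 005",
    "4 ROBERT Y Y U 00 C 17 007 012",
    "5 GRAY M Y C 00 U 00 014 017",
    "6 THE D D 019 021 NW",
    "7 1ST C B U 00 U 00 023 025",
]

header, entries = parse_stack(SAMPLE_STACK)
major = entries[header.major_index - 1]
# Words whose post-rule Word-type is D would be deleted by normal Formatting.
surviving = [e.word for e in entries if e.type_after != "D"]

for e in entries:
    # Offsets are zero-based and inclusive, so the slice end is end + 1.
    print(e.index, e.word, repr(INPUT_NAME[e.start:e.end + 1]))
```

Slicing the original name with each entry's offsets shows which input text an entry came from. For the replacement word JAMES, the offsets 003 005 still point at the original `jim`, which matches the description of how TRACE keeps the original word alongside its replacement.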