
Application and Database Design Guide


The Telephone Book as Metaphor for Name Search Index Design


In the telephone book, a search for the name Ann Jackson Smith would normally succeed on the "Smith A" page.
Page 321
SMITH A
Smith A J
10 Main St Springvale
9257 5496
When the name being searched for is A J Smith or Ann Jackson Smith, the entry is found relatively easily by browsing through all of the Smith A J entries.
A search for A Smith or Ann Smith is slower because more names must be browsed. If the full name had been indexed, the search for Ann Smith would be faster and the search for Ann Jackson Smith even quicker.
Page 327
SMITH Alan
Smith Ann Jackson
10 Main St Springvale
9257 5496
Though indexing the full name increases the size of each entry and the cost of capturing the information, overall search performance improves when there is more data in the name: given a full name to search with, the searcher finds its entry more quickly.
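The effect of key length on browsing can be sketched with a sorted list and a prefix lookup. This is a minimal illustration under stated assumptions, not a description of any particular product: apart from the Smith entry used above, the entries are invented, and `entries_to_browse` is a hypothetical helper.

```python
from bisect import bisect_left

def entries_to_browse(sorted_keys, prefix):
    """Return the contiguous run of keys that start with `prefix`."""
    start = bisect_left(sorted_keys, prefix)
    # "\uffff" sorts after every character used in these keys, so this
    # locates the end of the run of keys sharing the prefix.
    end = bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:end]

# Hypothetical book indexed on family name plus initials.
initials_book = sorted([
    "smith a", "smith a b", "smith a j", "smith a k", "smith b",
])

# The same hypothetical people indexed on their full names.
full_name_book = sorted([
    "smith alan", "smith andrew brian", "smith ann jackson",
    "smith arthur keith", "smith barbara",
])

# Searching for "Ann Smith": the initials book can only narrow to "smith a",
# while the full-name book can narrow to "smith ann".
ann_smith_initials = entries_to_browse(initials_book, "smith a")
ann_smith_full = entries_to_browse(full_name_book, "smith ann")
print(len(ann_smith_initials), len(ann_smith_full))
```

In this sketch the search for Ann Smith leaves several entries to browse in the initials book but only one in the full-name book, which is the trade-off described above: larger entries, shorter searches.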
However, when the name being searched for has missing or extra words, or words in a different order, the simple telephone book indexing system starts to break down.
Searches for Ann Jackson-Smith, Ann Smith Jackson or Smith Ann will fail unless the searcher, after failing on the "J" and "A" pages, permutes the words and looks on the "S" page.
Regardless, a search for Ann Jackson will never succeed if the entry in the book was Smith, J.A. or Smith, Ann Jackson.
If, however, the name Ann Jackson Smith were indexed on three pages of the telephone book (an "Ann" page, a "Jackson" page and a "Smith" page) by permuting the order of its words, then any of the above searches would succeed by opening one page.
Page 17
ANN Smith B
Ann, Smith Jackson
10 Main St Springvale
9257 5496
Page 119
JACKSON Ann K
Jackson, Ann Smith
10 Main St Springvale
9257 5496
Page 327
SMITH Alan
Smith Ann Jackson
10 Main St Springvale
9257 5496
The size of the telephone book increases, but search cost does not. The extra "index entries" increase the physical size, yet improve overall quality and performance because any of the searches above now succeeds.
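The three-page example can be sketched as a small in-memory index. This is only an illustration of the idea: the function names are hypothetical, and the matching rule (an entry matches when it contains every word of the query, in any order) is an assumption made for the sketch rather than something the telephone book metaphor specifies.

```python
import re
from collections import defaultdict

def words(name):
    """Split a name into lower-case words; hyphens, commas and spaces all
    separate words. Only ASCII letters are handled, for brevity."""
    return re.findall(r"[a-z]+", name.lower())

def build_permuted_index(names):
    """File each name once under every word it contains (done at update time)."""
    index = defaultdict(set)
    for name in names:
        for word in words(name):
            index[word].add(name)
    return index

def search(index, query):
    """Open a single page, chosen by the first word of the query, and keep
    the entries on it that contain every word of the query in any order."""
    query_words = words(query)
    if not query_words:
        return set()
    page = index.get(query_words[0], set())
    return {name for name in page if set(query_words) <= set(words(name))}

index = build_permuted_index(["Ann Jackson Smith"])
for query in ["Ann Jackson-Smith", "Ann Smith Jackson", "Smith Ann", "Ann Jackson"]:
    print(query, "->", search(index, query))
```

Each query opens exactly one page, whichever word it happens to start with, because the entry was filed under all three words when it was added.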
In computer databases, with today’s low data storage costs, the right solution for name indexes, regardless of the volume of the file, is to permute the words in the index entries at update time and to store the resulting records on separate "pages" in the database, just as in the telephone book example above. Permuting the words of a name at search time alone cannot be guaranteed to overcome missing words, extra words or gross single-word errors. This is not a problem that better design can overcome; it is a mathematical constraint.
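The limit of search-time permutation can be seen in a short sketch. It assumes, for illustration only, that the plain index files a name once under its last word (taken here to be the family name); `search_with_permutation` and the other names are hypothetical.

```python
import re

def words(name):
    """Split a name into lower-case words (ASCII letters only, for brevity)."""
    return re.findall(r"[a-z]+", name.lower())

NAME = "Ann Jackson Smith"

# Plain telephone book: the name is filed once, under its last word
# (assumed here to be the family name).
single_key_index = {words(NAME)[-1]: {NAME}}

# Index permuted at update time: the name is filed under every word.
permuted_index = {word: {NAME} for word in words(NAME)}

def search_with_permutation(index, query):
    """Permute at search time: try each word of the query as the page to open."""
    hits = set()
    for word in words(query):
        hits |= index.get(word, set())
    return hits

# Reordered words: search-time permutation eventually tries "smith" and succeeds.
reordered = search_with_permutation(single_key_index, "Ann Smith Jackson")

# Missing word: no permutation of "Ann Jackson" can produce "smith".
missing_word = search_with_permutation(single_key_index, "Ann Jackson")

# The same query against the index permuted at update time.
missing_word_permuted = search_with_permutation(permuted_index, "Ann Jackson")

print(reordered, missing_word, missing_word_permuted)
```

Against the single-key index, the query with the missing word finds nothing however its words are rearranged, because the one word the entry was filed under is absent from the query. Against the index permuted at update time, the same query finds the entry on either of the pages it opens.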
