Application and Database Design Guide

Effect of File Size on Name Search Performance

Because there is an extreme skew in the distribution of words used in people’s names, company names and addresses, some names will cover many candidate records, while other names will have only a few candidates.
If SMITH represented 1% of the population and LEBEDINSKY 0.001%, then:
Population Size    Number of SMITHs    Number of LEBEDINSKYs
1,000              10                  1
100,000            1,000               1
1,000,000          10,000              10
If only the family name were used in the search, a search for SMITH would be slow in a 100,000-record file and prohibitive in a million-record file.
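The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration of the example above: the name shares are the ones quoted in the text, and the function name `expected_candidates` is hypothetical, not part of SSA-NAME3.

```python
from fractions import Fraction

# Name shares taken from the example above, held as exact fractions.
NAME_SHARE = {
    "SMITH": Fraction(1, 100),            # 1% of the population
    "LEBEDINSKY": Fraction(1, 100_000),   # 0.001% of the population
}


def expected_candidates(population: int, share: Fraction) -> Fraction:
    """Expected number of records that carry a name with the given share."""
    return population * share


# Print the expected candidate count per name for each file size.
for population in (1_000, 100_000, 1_000_000):
    counts = {
        name: float(expected_candidates(population, share))
        for name, share in NAME_SHARE.items()
    }
    print(f"{population:>9,} records: {counts}")
```

Note that for a 1,000-record file the expected number of LEBEDINSKYs works out to 0.01, which the table shows as a single record. The candidate count for a given name grows linearly with file size, so a search keyed on a common name alone becomes proportionally slower as the file grows.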
The more data supplied to the search, the better the performance it can potentially achieve. However, even with more data, coping with the skew between common and uncommon names requires careful key design. SSA-NAME3’s key-building algorithms use a proprietary approach that gives the best balance between reliability and performance.
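To illustrate why extra search data narrows the candidate set, here is a minimal sketch that compares a family-name-only key with a key that also includes the first initial of the given name. It is not SSA-NAME3's proprietary key-building algorithm. The records, the key functions and the `build_index` helper are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical toy records (family name, given name), invented for illustration.
RECORDS = [
    ("SMITH", "JOHN"), ("SMITH", "JANE"), ("SMITH", "ALAN"),
    ("SMITH", "MARY"), ("SMITH", "JAMES"), ("LEBEDINSKY", "IVAN"),
]


def build_index(records, key_fn):
    """Group records under the key produced by key_fn."""
    index = defaultdict(list)
    for record in records:
        index[key_fn(record)].append(record)
    return index


def family_key(record):
    """Key on the family name alone."""
    return record[0]


def family_initial_key(record):
    """Key on the family name plus the first letter of the given name."""
    return (record[0], record[1][0])


by_family = build_index(RECORDS, family_key)
by_family_initial = build_index(RECORDS, family_initial_key)

# Candidates for "J. SMITH" under each key design.
print(len(by_family["SMITH"]))                   # all five SMITH records
print(len(by_family_initial[("SMITH", "J")]))    # only JOHN, JANE and JAMES
```

The narrower key returns fewer candidates for the common name, but an exact-initial key like this one would miss a record whose given name was recorded with a different initial. That is the kind of trade-off between reliability and performance that key design has to manage.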
