Because there is an extreme skew in the distribution of words used in people’s names, company names and addresses, some names will cover many candidate records, while other names will have only a few candidates.
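If SMITH represented 1% of the population and LEBEDINSKY 0.001%, the number of candidate records for each name would be roughly:

Population size   Number of SMITHs   Number of LEBEDINSKYs
1,000             10                 1
100,000           1,000              1
1,000,000         10,000             10

(At 0.001%, a population of 1,000 would on average contain only 0.01 LEBEDINSKYs; the table shows 1 on the assumption that the name occurs at least once.)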
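The figures in the table follow from multiplying the population size by each name's share of the population. The short sketch below reproduces that arithmetic. The function name expected_candidates is hypothetical and only illustrates the calculation; it is not part of SSA-NAME3.

```python
# Hypothetical helper: expected number of candidate records for a name
# that makes up share_percent of the population.
def expected_candidates(population: int, share_percent: float) -> float:
    return population * share_percent / 100.0


SMITH_PCT = 1.0          # SMITH: 1% of the population, as in the example
LEBEDINSKY_PCT = 0.001   # LEBEDINSKY: 0.001% of the population

for population in (1_000, 100_000, 1_000_000):
    smiths = expected_candidates(population, SMITH_PCT)
    lebedinskys = expected_candidates(population, LEBEDINSKY_PCT)
    # Expected counts; at 1,000 records the LEBEDINSKY figure is below 1.
    print(f"{population:>9,}  {smiths:>8,.0f}  {lebedinskys:>6g}")
```

The calculation makes the skew explicit: a common name grows its candidate list in direct proportion to file size, so a hundredfold larger file means a hundredfold more SMITHs to examine.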
If the family name alone were used in the search, a search for SMITH would have to examine about 1,000 candidates in a 100,000-record file, which would be slow; in a million-record file, with about 10,000 candidates, it would be prohibitive.
The more data supplied to a search, the better it can potentially perform. However, even when more data is supplied, coping with the skew between common and uncommon names requires careful key design. SSA-NAME3's key-building algorithms use a proprietary approach designed to give the best balance between reliability and performance.
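To illustrate why extra search data helps, the sketch below estimates how many candidates remain when a search must match the family name plus further fields. It is a simplified model, not SSA-NAME3's proprietary key-building method: it assumes the fields are statistically independent (real name and address data often are not), and the given-name and postcode-area frequencies are hypothetical values chosen for illustration. The function name candidates is likewise hypothetical.

```python
# Simplified model (NOT SSA-NAME3's algorithm): expected candidates when
# every supplied field must match, assuming the fields are independent.
def candidates(population: int, *shares: float) -> float:
    """Each share is the fraction (0..1) of records matching one field."""
    expected = float(population)
    for share in shares:
        expected *= share
    return expected


POPULATION = 1_000_000
SURNAME_SMITH = 0.01      # 1%, as in the example above
GIVEN_NAME_JOHN = 0.03    # hypothetical: 3% of records have given name JOHN
POSTCODE_AREA = 0.002     # hypothetical: 0.2% of records in one postcode area

print(f"{candidates(POPULATION, SURNAME_SMITH):,.1f}")
print(f"{candidates(POPULATION, SURNAME_SMITH, GIVEN_NAME_JOHN):,.1f}")
print(f"{candidates(POPULATION, SURNAME_SMITH, GIVEN_NAME_JOHN, POSTCODE_AREA):,.1f}")
```

Under these assumptions the candidate list shrinks from about 10,000 to about 300, and then to under one record. In practice, correlated fields and very common values shrink the list less than this, which is why the skew still has to be handled in the key design itself.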