Table of Contents

  1. Preface
  2. Introduction
  3. The Design Issues
  4. Standard Population Choices
  5. Parsing, Standardization and Cleaning
  6. Customer Identification Systems
  7. Fraud and Intelligence Systems
  8. Marketing Systems
  9. Simple Search
  10. Summary

Application and Database Design Guide

Summary

This section summarizes the design aspects covered in this guide.

Fundamental Characteristics of Strong Name Search

The fields used for keys and matching should be raw data fields such as "name" or "address", rather than a set of separately parsed elements.
Original real-world data should be input without preprocessing. Databases must retain this original data.
The search engine and matching algorithms should be able to search and match as well as the best human experts in the organization can.
SSA-NAME3’s Standard Populations contain rules to:
  • control an editing phase to recognize items that are case and punctuation dependent, such as certain common company name abbreviations, e.g. s/a, c/o;
  • overcome character representation variation, such as casing, accents, delimiters, punctuation, etc.;
  • recognize and ignore "noise" words;
  • recognize and treat as identical the common synonyms, abbreviations, translations, nicknames, ethnic and anglicized forms of words;
  • overcome error and variation in unrecognized words using stabilization algorithms;
  • build multiple keys or signatures from the transformed and stabilized data.
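The rule types listed above can be illustrated with a small toy pipeline. This is a minimal sketch and not SSA-NAME3's actual algorithm: the rule tables (`EDIT_RULES`, `NOISE_WORDS`, `SYNONYMS`), the `stabilize` function and the key layout are all hypothetical and chosen only to show the order of the phases.

```python
import re
import unicodedata

# Hypothetical rule tables; a real Standard Population holds far larger, tuned sets.
EDIT_RULES = {"c/o": "careof", "s/a": "sa"}   # punctuation-dependent items
NOISE_WORDS = {"the", "of", "and", "mr", "mrs", "ms", "dr"}
SYNONYMS = {"bill": "william", "bob": "robert", "corp": "corporation"}


def strip_accents(text):
    """Remove accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stabilize(word):
    """Toy stabilization: keep the first character, drop later vowels,
    and collapse repeated characters."""
    out = word[0]
    for ch in re.sub(r"[aeiouy]", "", word[1:]):
        if ch != out[-1]:
            out += ch
    return out


def build_keys(raw_name):
    """Return a set of keys built from a raw, unparsed name."""
    text = raw_name
    # Editing phase: runs first, while the punctuation is still present.
    for pattern, token in EDIT_RULES.items():
        text = re.sub(r"\b" + re.escape(pattern) + r"\b", f" {token} ",
                      text, flags=re.IGNORECASE)
    # Character representation: casing, accents, delimiters, punctuation.
    text = strip_accents(text.lower()).replace("'", "")
    text = re.sub(r"[^a-z0-9]+", " ", text)
    # Noise words, then synonyms and nicknames.
    words = [w for w in text.split() if w not in NOISE_WORDS]
    words = [SYNONYMS.get(w, w) for w in words]
    # Stabilize whatever is left, recognized or not.
    stable = [stabilize(w) for w in words]
    # Multiple keys: each word takes a turn as the leading word.
    keys = set()
    for i, major in enumerate(stable):
        minors = sorted(stable[:i] + stable[i + 1:])
        keys.add(" ".join([major] + minors))
    return keys
```

In this sketch two names are search candidates for each other when their key sets overlap; for example, `build_keys("Bill O'Brien")` and `build_keys("WILLIAM OBRIEN")` produce the same set, although the inputs differ in casing, punctuation and nickname use.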

Philosophy and Convictions about Name Search and Matching

There is no such thing as an invalid name or address. Search and matching must be possible on data that cannot be understood, parsed or scrubbed.
Systems must be designed to work with whatever data they can get, rather than the mythical data that the designers would like to have. Raw, original real-world data carries more identifying information and quality than enhanced, scrubbed and parsed data.
Data enhancement and scrubbing should be used only for reporting purposes, not for search, matching or identification, because any failure or error during scrubbing or enhancement will reduce the quality of all future search, matching and identification.
The maximum quality that the data can support should be achievable, whatever the performance and cost implications.
Tools should not restrict the quality. The application designer must, however, be able to tune the balance between quality and performance for specific transaction types and purposes.
Just as business risk varies with transaction value, it must be possible to vary the cost/performance ratio of name search transactions to match the risk associated with each transaction.
The quality, uniformity and reliability of name and address data are declining in the era of electronic transactions, global business and personal data entry. While poor quality may limit the value of the data, all systems should be able to process and match data regardless of its quality.
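One way to picture this tuning is a lookup from transaction value to a search profile. The sketch below is purely illustrative: the `SearchProfile` fields, the tier names and the value thresholds are assumptions made for the example, not settings taken from the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchProfile:
    name: str
    max_candidates: int  # upper bound on candidate records fetched per search
    min_score: int       # lowest match score (0-100) still reported


# Hypothetical tiers, ordered from cheapest to most thorough.
# Each entry is (value ceiling, profile used below that ceiling).
TIERS = [
    (1_000, SearchProfile("narrow", 50, 90)),
    (100_000, SearchProfile("typical", 500, 80)),
]
HIGHEST_RISK = SearchProfile("exhaustive", 5_000, 70)


def profile_for(transaction_value):
    """Pick a wider, more expensive search as the transaction value grows."""
    for ceiling, profile in TIERS:
        if transaction_value < ceiling:
            return profile
    return HIGHEST_RISK
```

A low-value transaction then pays for a quick, strict search, while a high-value one accepts more candidates and a lower score threshold at greater cost.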
All customer and marketing databases will contain a percentage of data that is from "foreign" marketplaces.
Tools must work well regardless of the country of origin and language of the data. They must also insulate the application developer from country and language differences in name and address search and matching.
Tools should not demand significant local knowledge or depend on the maintenance of databases of current postal address information. The ongoing daily change in this data creates a continual burden and weakness in the user's business system.
To get good response times in name search, you must denormalize and maintain a copy of the relevant name search and matching data in a file or table optimized solely for name search and matching.
This table will contain one entry per SSA-NAME3 key, together with the secondary identification data used to make the final choice. To optimize access, the table or file will be physically ordered on the SSA-NAME3 key, which will not naturally be unique.
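The table layout described above might look like the following sketch, which uses SQLite through Python's `sqlite3` module. The table name, column names and key values are hypothetical placeholders, not real SSA-NAME3 keys. Because the key is not unique, the primary key combines it with the record identifier, and `WITHOUT ROWID` makes SQLite store the rows in primary-key order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (key, record); rows are stored ordered by name_key.
conn.execute("""
    CREATE TABLE name_search (
        name_key   TEXT    NOT NULL,   -- search key (placeholder values here)
        record_id  INTEGER NOT NULL,   -- points back to the source record
        full_name  TEXT    NOT NULL,   -- original, unscrubbed name
        birth_date TEXT,               -- secondary identification data
        PRIMARY KEY (name_key, record_id)
    ) WITHOUT ROWID
""")

conn.executemany(
    "INSERT INTO name_search VALUES (?, ?, ?, ?)",
    [
        ("wlm obrn", 1, "William O'Brien", "1970-02-03"),
        ("obrn wlm", 1, "William O'Brien", "1970-02-03"),
        ("wlm obrn", 2, "Bill Obrien", "1982-11-30"),
        ("obrn wlm", 2, "Bill Obrien", "1982-11-30"),
        ("js mlr", 3, "José Müller", None),
        ("mlr js", 3, "José Müller", None),
    ],
)


def candidates(connection, search_keys):
    """Fetch the distinct records stored under any of the search keys."""
    keys = list(search_keys)
    if not keys:
        return []
    placeholders = ",".join("?" for _ in keys)
    query = (
        "SELECT DISTINCT record_id, full_name, birth_date "
        f"FROM name_search WHERE name_key IN ({placeholders}) "
        "ORDER BY record_id"
    )
    return connection.execute(query, keys).fetchall()
```

A search then reads only this table: `candidates(conn, ["wlm obrn"])` returns both O'Brien records along with the secondary data needed to choose between them, without touching the normalized source tables.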
