Philosophy and Convictions about Name Search and Matching
There is no such thing as an invalid name or address. Search and matching must be possible on data that cannot be understood, parsed or scrubbed.
Systems must be designed to work with whatever data they can get, rather than with the mythical data their designers would like to have. Raw, original real-world data carries more identifying information, and is of higher quality, than data that has been enhanced, scrubbed and parsed.
Data enhancement and scrubbing should be used only for reporting, not for search, matching or identification, because any failure or error during scrubbing or enhancement reduces the quality of all subsequent search, matching and identification.
The maximum quality the data can support should be achievable, regardless of the performance and cost implications.
Tools should not restrict the quality. The application designer must, however, be able to tune the balance between quality and performance for specific transaction types and purposes.
Just as business risk varies with transaction value, it must be possible to vary the cost/performance ratio of name search transactions to match the risk associated with each transaction.
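One way an application might express this is a small policy that maps transaction value to a search level, where a higher level lets the matcher examine more candidates at greater cost. The sketch below is hypothetical throughout: the level names `NARROW`, `TYPICAL` and `EXHAUSTIVE`, their candidate budgets, the value thresholds and the helpers `choose_level` and `cap_candidates` are illustrative assumptions, not SSA-NAME3 settings.

```python
from enum import Enum


class SearchLevel(Enum):
    """Hypothetical search levels; each value is an illustrative candidate budget."""
    NARROW = 50
    TYPICAL = 500
    EXHAUSTIVE = 5000


# Hypothetical policy, in ascending order: (upper bound on transaction value, level).
DEFAULT_POLICY = ((1_000, SearchLevel.NARROW), (100_000, SearchLevel.TYPICAL))


def choose_level(transaction_value, policy=DEFAULT_POLICY):
    """Return the first (cheapest) level whose bound exceeds the transaction value;
    values above every bound get the most thorough, and most costly, search."""
    for upper_bound, level in policy:
        if transaction_value < upper_bound:
            return level
    return SearchLevel.EXHAUSTIVE


def cap_candidates(candidates, level):
    """Limit how many candidates the matcher will examine at this level."""
    return candidates[: level.value]


print(choose_level(250_000))
```

Keeping the policy as data rather than code lets the application designer retune the quality/performance balance per transaction type without touching the matching logic.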
The quality, uniformity and reliability of name and address data are declining in the era of electronic transactions, global business and personal data entry. While poor-quality data may limit its value, all systems should be able to process and match data however poor its quality.
Every customer and marketing database will contain some proportion of data from "foreign" marketplaces.
Tools must work well regardless of the country of origin and language of the data, and our tools must insulate the application system developer from the differences between countries and languages when it comes to name and address search and matching.
Tools should not demand significant local knowledge or depend on maintaining databases of current postal address information. The continual daily change in this data creates an ongoing burden and a weakness in the user's business system.
To get good response times in name search, you must denormalize the relevant name search and matching data and maintain a copy of it in a file or table optimized solely for name search and matching.
This table will contain one entry per SSA-NAME3 key, together with the secondary identification data used to make the final choice. To optimize access, the table or file will be physically ordered on the SSA-NAME3 key, which is not naturally unique.
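As a rough illustration of this layout, the Python sketch below builds an in-memory table with one entry per generated key, keeps it sorted on that key, fetches every entry sharing a key with a range scan, and then uses secondary data to make the final choice. The real SSA-NAME3 key algorithm is not shown: `toy_key` is a hypothetical stand-in (first letter plus the following consonants), and `SearchTable`, `candidates`, `find` and the use of birth date as the secondary identification data are likewise illustrative assumptions.

```python
import re
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    key: str         # search key; many entries may share the same key
    record_id: int   # points back to the master record
    full_name: str   # denormalized copy of the name
    birth_date: str  # secondary identification data used for the final choice


def toy_key(word: str) -> str:
    """Hypothetical stand-in for an SSA-NAME3 key (NOT its real algorithm):
    the first letter plus the following consonants, truncated to 4 characters."""
    letters = re.sub(r"[^A-Z]", "", word.upper())
    if not letters:
        return ""
    return (letters[0] + re.sub(r"[AEIOUY]", "", letters[1:]))[:4]


class SearchTable:
    """Denormalized search table kept sorted on the (non-unique) key."""

    def __init__(self, records: List[Tuple[int, str, str]]):
        entries = []
        for record_id, full_name, birth_date in records:
            # One entry per distinct key the record generates (here: one per name word).
            for key in {toy_key(w) for w in full_name.split()} - {""}:
                entries.append(Entry(key, record_id, full_name, birth_date))
        entries.sort()  # "physical" order: by key, then by record_id
        self.entries = entries
        self.keys = [e.key for e in entries]

    def candidates(self, name_word: str) -> List[Entry]:
        """Range scan: every entry that shares the search word's key."""
        key = toy_key(name_word)
        lo = bisect_left(self.keys, key)
        hi = bisect_right(self.keys, key)
        return self.entries[lo:hi]

    def find(self, name_word: str, birth_date: str) -> List[int]:
        """Final choice: keep candidates whose secondary data also agrees."""
        return sorted({e.record_id for e in self.candidates(name_word)
                       if e.birth_date == birth_date})


records = [
    (1, "John Smith", "1970-01-01"),
    (2, "Jon Smyth", "1970-01-01"),
    (3, "Mary Smith", "1985-05-05"),
    (4, "Ann Jones", "1970-01-01"),
]
table = SearchTable(records)
print(table.find("Smith", "1970-01-01"))  # [1, 2]
```

Because several records share one key (here "Smith" and "Smyth" both reduce to `SMTH`), retrieval is a range scan over the sorted keys rather than a unique-key fetch; in a database, a non-unique index on the key column that also governs physical order might serve the same purpose.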