Whether the process is an online inquiry, such as Customer Identification or a Criminal Records search, or a batch matching process, such as merging Marketing Lists before a selection for mailing, we must find all the candidate records that could possibly be the same as each other or the same as our "search data".
We must mimic a human expert in finding all the candidate records, and then make the same matching choices as the human expert would make for that specific business purpose.
This means that our searching and matching technology must overcome the natural error and variation that unavoidably occur in all real-world identification data. We must do this even though the process of capturing real-world data into computer systems introduces still more error and variation.
In many systems the objective is also to overcome fraudulent modification of identity data. This "class of error" is more aggressive because it does not occur naturally; it is introduced deliberately to defeat or control aspects of matching systems, while preserving the defense that the change was a mistake rather than fraud.
Any attempt to overcome error and variation increases the work done and therefore the cost. We will also see that compensating for more error always carries the risk of introducing false matches.
The task is a balancing or tuning exercise between:
"Performance" and "Quality",
"Under-matching" versus "Over-matching",
"Missing the Right data" versus "Finding Wrong data".