Identical Names and the Use of Other Identifying Data
Systems will often hold identical names for different people. This can happen:
By chance in a large population.
On purpose for reasons of fraud.
By design where individuals are named after parents, close relatives or famous people.
When identical name data occurs for different individuals or organizations, other data such as address, date of birth and gender (in the case of persons) must be used to distinguish the entities.
Additional identifying data can be incorporated into the search in several ways:
Selecting all records using the SSA-NAME3 Name-keys and displaying the identifying data for all candidate matches. This gives the choice to the user, but can be confusing if there are too many candidates.
Asking the user to input other data and using this to ‘ignore’ records found in the search via the SSA-NAME3 Matching routines. This reduces the display to a few matching candidates while still allowing some room for error in the other identifying data.
Asking the user to input other data and using a composite key of Name-key and other data to access only the matching records. This provides quicker access and also reduces the display to a few matching candidates, but does not allow any room for variation in the other identifying data.
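The trade-off between the second and third approaches above can be sketched as follows. This is only an illustration: `toy_name_key` is a hypothetical stand-in (the upper-cased last word of the name) and is not the SSA-NAME3 Name-key, and the record layout, the birth-year field and the one-year tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    birth_year: int


def toy_name_key(name: str) -> str:
    """Hypothetical stand-in for a name key: the upper-cased last word."""
    return name.upper().split()[-1]


RECORDS = [
    Person("John Smith", 1970),
    Person("John Smith", 1971),
    Person("Jon Smith", 1985),
]

# Index used by approaches 1 and 2: name key only.
by_name = {}
# Index used by approach 3: composite of name key and other data.
by_composite = {}
for rec in RECORDS:
    by_name.setdefault(toy_name_key(rec.name), []).append(rec)
    by_composite.setdefault((toy_name_key(rec.name), rec.birth_year), []).append(rec)


def search_then_filter(name, birth_year, tolerance=1):
    """Approach 2: fetch by name key, then drop candidates whose other data
    is too far from what the user supplied (some room for error)."""
    candidates = by_name.get(toy_name_key(name), [])
    return [r for r in candidates if abs(r.birth_year - birth_year) <= tolerance]


def search_composite(name, birth_year):
    """Approach 3: direct access by composite key (no room for variation)."""
    return by_composite.get((toy_name_key(name), birth_year), [])
```

With these sample records, a search for "J Smith" with a birth year of 1972 still finds the 1971 record through `search_then_filter`, while `search_composite` returns nothing because the composite key must match exactly.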
More confusing are the cases where twins of the same gender have been given the same names, and the rare cases where two unrelated people have the same name and the same other identifying data.
The system solution to these problems has always been to let the confusion continue from real life into the system, but to design files so that, as soon as the situation is detected, the records are marked to say ‘CONFUSION EXISTS’.
Periodic Analyzing of Low Frequency Names
Periodic histograms of low frequency name word occurrences can be of value. In certain systems the file is periodically analyzed for all new low frequency entries, and these entries are then manually checked for evidence of error. In high volume systems a low frequency name word has a very high probability of being an error.
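A minimal sketch of such a histogram is shown below. The function name, the whitespace tokenization and the threshold of one occurrence are assumptions chosen for illustration; a real system would pick a threshold suited to its volume and would restrict the check to entries added since the last run.

```python
from collections import Counter


def low_frequency_words(names, threshold=1):
    """Return name words that occur at most `threshold` times in the file.

    Words are taken by splitting on whitespace after upper-casing; this
    simple tokenization is an assumption made for the example.
    """
    counts = Counter(word for name in names for word in name.upper().split())
    return sorted(word for word, count in counts.items() if count <= threshold)


names = ["John Smith", "Jane Smith", "John Smiht", "Jane Brown", "John Brown"]
suspects = low_frequency_words(names)
```

Here `suspects` contains only the word SMIHT, which is the kind of entry that would be passed on for manual checking.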
Recognizing the Language of the Name
Techniques that recognize the language of the name generally turn out to be of little value. The reason is that the names of second generation migrant populations, and names resulting from ‘Anglicization’, often contain words from several languages as well as hybrid forms. Such techniques may work with meaningful text but have little benefit with names.
Keeping Name Variations
If the reliability of your name search is critical, there is one golden rule that is independent of the Name search method being used:
Do Not Throw Name Variations Away, Always Keep All Reported Variations For A Person And Have Them As Access Paths Using The Algorithm.
This is expensive and not obvious. If name variations (A, B, C) have been grouped as a set or person, and name variations (D, E, F) have also been grouped, it is always possible that a name variation (X) can be encountered that will group a set (A, X, F). In this case, it is logically true that (A, B, C, D, E, F, X) are all one set according to the system. If, at this or prior points in time, only the current or latest name variation is kept as accessible by the Name search method, then reliability is compromised.
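The grouping argument above can be illustrated with a small disjoint-set (union-find) structure. This is a sketch of the logic only, not a description of how any particular system stores its name variations; the class and method names are hypothetical.

```python
class NameSets:
    """Tracks which name variations belong to the same person."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        """Return the representative of the set containing `name`."""
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            # Path halving keeps later lookups short.
            self.parent[name] = self.parent[self.parent[name]]
            name = self.parent[name]
        return name

    def link(self, a, b):
        """Record that variations `a` and `b` were grouped together."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def members(self, name):
        """Return every variation in the same set as `name`."""
        root = self.find(name)
        return sorted(v for v in list(self.parent) if self.find(v) == root)


sets = NameSets()
for a, b in [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F")]:
    sets.link(a, b)
before = sets.members("A")  # A, B and C only

# A new variation X is grouped with both A and F.
sets.link("A", "X")
sets.link("X", "F")
after = sets.members("A")  # all seven variations are now one set
```

The merge is only discoverable because A and F were both still held as access paths. Had the system kept only the latest variation of each person, X would have had nothing to match against.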
Aliases
In many applications a person may be known by many aliases. The system should allow for retrieval by any of the known aliases.
This can be achieved in two main ways:
By creating a record in the system for each name and connecting the records together by an identifier, such as customer number, and maintaining one set of Name-keys per record.
By maintaining many names in one record and maintaining multiple Name-keys in that record.
Which approach is taken will depend on the ability of your file structure to handle each of these options, on how often all the other names are needed once a person has been found by one name, and on the cost of maintaining the repeating group versus the cost of maintaining separate records.
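The two layouts can be compared with a small sketch. Everything here is illustrative: `toy_name_key` is a hypothetical stand-in for a Name-key (the upper-cased last word), and the dictionary-based records and customer numbers are invented for the example.

```python
from collections import defaultdict


def toy_name_key(name: str) -> str:
    """Hypothetical stand-in for a name key: the upper-cased last word."""
    return name.upper().split()[-1]


# Option 1: one record per name, linked by a shared customer number.
linked_records = [
    {"customer_no": 17, "name": "Margaret Jones"},
    {"customer_no": 17, "name": "Peggy Smith"},
    {"customer_no": 42, "name": "Alan Smith"},
]
linked_index = defaultdict(list)
for rec in linked_records:
    linked_index[toy_name_key(rec["name"])].append(rec)


def names_via_linked_records(search_name):
    """Find hits by key, then follow the customer number to the other names."""
    hits = linked_index.get(toy_name_key(search_name), [])
    customers = {rec["customer_no"] for rec in hits}
    return {
        c: sorted(r["name"] for r in linked_records if r["customer_no"] == c)
        for c in customers
    }


# Option 2: one record per person holding a repeating group of names,
# with one index entry per name pointing at that single record.
multi_name_records = {
    17: ["Margaret Jones", "Peggy Smith"],
    42: ["Alan Smith"],
}
multi_index = defaultdict(set)
for customer_no, names in multi_name_records.items():
    for name in names:
        multi_index[toy_name_key(name)].add(customer_no)


def names_via_multi_name_record(search_name):
    """Find the record by key; all of its names arrive with it."""
    customers = multi_index.get(toy_name_key(search_name), set())
    return {c: sorted(multi_name_records[c]) for c in customers}
```

Both functions return the same answer for the same search. The difference is the extra lookup by customer number in the first layout, against the repeating group that must be maintained in the second.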
Avoid Exact Name-keys
While exact Name-keys may succeed some 80% of the time, the SSA-NAME3 key can succeed approximately 98% of the time for a very small increase in I/O and little overall change in response time.
It is also true that any system that shows exact matches in preference to first depth fuzzy matches increases the number of duplicate records on its system. Operator errors that introduce duplicates have only a very small chance of being detected if only exact match is supported.
The use of exact match facilities in the past has been a necessary evil arising out of performance problems. It should no longer be required. In our experience adding records to a screen has not slowed operators down if the first depth records are displayed first, and especially if they are ranked by other identity data.
Mixed People and Company Names
Many organizations have classified names as personal or commercial. This leads to separate key building algorithms and files.
The incidence of search situations where J.A.Jones Inc. should be used to match John Jones or Jones & Sons is very much higher than is usually anticipated. If both businesses and people can be the subject of one system, or the same logical entity in a database, use one common SSA-NAME3 Algorithm. Such matching will then be achieved automatically.
It may still be valuable to classify records but do not use different files or keys unless they are truly different entities. In fact the MARK feature of the Edit-list may help with automatic classifications.
When people and companies do not mix (e.g. Workers Compensation Insurance: Employers and Employees, where employees are always people), it may be best to generate one Algorithm for people and one for companies (or, as in the example, one for people/employees and one for employers).
When people and company names are mixed then the choice of "left" or "right" dominance and the choice to use alias or multiple keys is very important.
For more information on handling mixed names, refer to the Algorithm Definition/Tips on Customizing an Algorithm section in the DEFINITION and CUSTOMIZATION GUIDE FOR SSA-NAME3 SERVICE GROUPS.
Treating Special Cases
Great care should be taken with treatment of special cases that are annoying to users. In many cases the records displayed that annoy the user are not in fact getting in the way of their job while their elimination can do damage elsewhere. In many other cases there is an immediately obvious dilemma when both problems are shown to the user.
OFTEN THERE IS NO SOLUTION TO A SPECIAL CASE.
If the case is really significant in volume then special treatment should be considered even if there is a penalty elsewhere. Doing it because it is possible or because the user wants it can have a real long term quality and performance impact.
The following paragraphs discuss some examples, but we recommend that you talk to Informatica Corporation about such concerns before addressing them.
John King and Johnny Chang
These would match, and in most populations this will simply be a small penalty case that is annoyingly obvious to people but not to the Algorithm. Fix it and Kentish and Chentish will probably no longer match, nor even Kant and Cant.
J O’Grady and J Ogrady and J O Grady
This cannot be solved consistently (even with dictionaries of Irish names), but it would usually be obvious to an operator, so don’t try. Without exhaustive study and testing, names like Oarr, Oberbeck and Oberg, which could easily be keyed as O’Berg, would never be tested against your improvement.
David Desmond-Brown
This is displayed with the David Browns. The treatment of hyphens is again a defensive compromise: they are treated as a delimiter. If you change the rule to make the family name either Desmondbrown or Desmond (as is done in many systems), then the frequently occurring cases David Desmond Brown and D D Brown will not match. It is also true that operators can always see the opportunity to split words to repeat a search, but seldom the opportunity to combine them.
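The effect of the two hyphen rules can be seen in a small sketch. The functions below are hypothetical and deliberately crude: they compare only the last word of each name, which is an assumption made for the example and not how an SSA-NAME3 Algorithm builds or compares keys.

```python
import re


def words_hyphen_as_delimiter(name):
    """Split on whitespace and hyphens, so Desmond-Brown gives two words."""
    return [w for w in re.split(r"[\s\-]+", name.upper()) if w]


def words_hyphen_joined(name):
    """Alternative rule: remove hyphens, so Desmond-Brown gives one word."""
    return name.upper().replace("-", "").split()


def share_family_word(a_words, b_words):
    """Crude comparison for illustration: do the last words agree?"""
    return a_words[-1] == b_words[-1]


hyphenated = "David Desmond-Brown"
initials = "D D Brown"

delimiter_match = share_family_word(
    words_hyphen_as_delimiter(hyphenated), words_hyphen_as_delimiter(initials)
)
joined_match = share_family_word(
    words_hyphen_joined(hyphenated), words_hyphen_joined(initials)
)
```

Under the delimiter rule the family words agree (BROWN and BROWN), so `delimiter_match` is true. Under the joining rule the family word becomes DESMONDBROWN, so `joined_match` is false and D D Brown is lost.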