Populations and Controls

A Primer on Keys and Search Strategies

The safest way of finding a name match in a database is to first perform a search on an index built from name alone, thus building a candidate list of possible matches, and then to refine, rank or select the matches in that candidate list based on other identification data.
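The two-stage approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `name_key` rule (last word plus first initial) and the use of date of birth as the "other identification data" are hypothetical choices made for the example, not the product's actual key-building or matching algorithm.

```python
from collections import defaultdict

# Hypothetical records: (record id, name, date of birth).
RECORDS = [
    (1, "ANN SMITH", "1980-02-01"),
    (2, "ANNE SMITH", "1975-06-30"),
    (3, "BOB JONES", "1980-02-01"),
]


def name_key(name):
    """Illustrative name-only key: last word of the name plus first initial."""
    words = name.upper().split()
    return f"{words[-1]} {words[0][0]}"


def build_index(records):
    """Index every record under a key built from the name alone."""
    index = defaultdict(list)
    for record in records:
        index[name_key(record[1])].append(record)
    return index


def search(index, name, date_of_birth):
    # Stage 1: the name-only key produces the candidate list.
    candidates = index.get(name_key(name), [])
    # Stage 2: other identification data ranks the candidates;
    # records whose date of birth agrees sort first.
    return sorted(candidates, key=lambda record: record[2] != date_of_birth)


INDEX = build_index(RECORDS)
ranked = search(INDEX, "Ann Smith", "1975-06-30")
```

Note that the same `name_key` function builds both the database keys and the search key, and that date of birth is never part of the index; it is used only to order the candidates.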
Name-only keys are built from one or more parts of the name field (for example, words and words, or words and initials). The method used to construct the database keys must, of course, match the method used to construct the search keys.
The more name parts used in the key, and the greater the number of keys built per name, the greater the variety of search strategies which can be supported.
A name key for "ANN JACKSON-SMITH" built from family name plus first initial, "SMITH A", can support search strategies that use the family name word and initial, and also strategies that use only the single family word. A name key built from family name and first name, "SMITH ANN", can support search strategies that use two words from the name (at the "two word level" or wider). The fewer words used in the key, the wider the set of responses will be.
An extra name key, say "JACKSON ANN", supports a search where the search name is missing a part (for example, "ANN JACKSON") or has its parts in a different order.
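The keys discussed for "ANN JACKSON-SMITH" can be generated with a short sketch. The splitting rule (treat the hyphen as a word break, take the first word as the first name and the rest as family words) and the function names are assumptions made for illustration; a real key-building algorithm handles far more cases.

```python
def name_parts(name):
    """Split a name into upper-case words, treating hyphens as word breaks."""
    return name.upper().replace("-", " ").split()


def build_keys(name):
    """Build family-word + initial and family-word + first-name keys.

    Assumes the first word is the first name and all later words are
    family words (a simplification for this example).
    """
    parts = name_parts(name)
    if not parts:
        return []
    first, family_words = parts[0], parts[1:]
    if not family_words:
        return [first]
    keys = []
    for family in reversed(family_words):  # last family word first
        keys.append(f"{family} {first[0]}")  # e.g. "SMITH A"
        keys.append(f"{family} {first}")     # e.g. "SMITH ANN"
    return keys


database_keys = build_keys("ANN JACKSON-SMITH")
# A search name with a missing part still shares a key with the database name.
shared = set(build_keys("ANN JACKSON")) & set(database_keys)
```

Because the database name is stored under several keys, the search name "ANN JACKSON" still finds it through the "JACKSON ANN" key even though "SMITH" is missing.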
The choice of keys and search strategies together defines the width or depth of the search (by the number of name parts used in the search keys) and the degree of sequence variations and missing parts overcome (by the number of different keys).
The greater the number of name parts used in a search key, the fewer candidates on average will be returned, and the quicker the search. A search strategy which uses the full name makes sense when the name is expected to be generally reliable, when the match is expected to be in the database, or when the search will be stopped, or at least interrupted, at the first match. This type of search strategy is thought of as a Typical search and is used to find data that is expected to be on file.
As confidence in the quality of the search or database names declines, or as the risk of missing a match increases, the need for a different search strategy grows. A high-risk search, or a search using poor-quality data, should use a wider search strategy to compensate for severe spelling errors, more sequence variations, and missing or extra words in the names. This type of search strategy is thought of as an Exhaustive search and is frequently used to prove that data is not on file.
In large-scale systems, the choice and sophistication of the search strategy depend on performance demands, the risk of missing critical data, the need to avoid duplication of data, and the volume of data under indexing.
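The contrast between the two strategies can be sketched as follows. The three key levels, their labels ("narrow", "wide", "widest") and the function names are hypothetical, and the sketch widens the search only by using fewer name parts; it does not attempt the spelling-error compensation that a real Exhaustive search would also provide.

```python
from collections import defaultdict

NAMES = ["ANN SMITH", "ANNE SMITH", "ALAN SMITH", "BOB SMITH", "ANN JONES"]


def keys_by_level(name):
    """Build one key per (hypothetical) level, using fewer name parts each time."""
    words = name.upper().split()
    first, family = words[0], words[-1]
    return {
        "narrow": f"{family} {first}",    # family word + first name
        "wide": f"{family} {first[0]}",   # family word + initial
        "widest": family,                 # family word only
    }


def build_index(names):
    index = defaultdict(list)
    for name in names:
        for level, key in keys_by_level(name).items():
            index[(level, key)].append(name)
    return index


def typical_search(index, name):
    """Use the full name and stop at the first match."""
    hits = index.get(("narrow", keys_by_level(name)["narrow"]), [])
    return hits[0] if hits else None


def exhaustive_search(index, name):
    """Try progressively wider keys and collect every distinct candidate."""
    found = []
    keys = keys_by_level(name)
    for level in ("narrow", "wide", "widest"):
        for candidate in index.get((level, keys[level]), []):
            if candidate not in found:
                found.append(candidate)
    return found


INDEX = build_index(NAMES)
quick = typical_search(INDEX, "ANNA SMITH")
thorough = exhaustive_search(INDEX, "ANNA SMITH")
```

With the search name "ANNA SMITH", the typical search finds nothing, while the exhaustive search returns every "SMITH" on file as a candidate for later matching. That larger candidate list is the price of greater confidence that a match has not been missed.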
The choice of search strategy should match the business needs of the search. The search strategy used for one set of data or one system may be very different from that used in another.
A search strategy is affected by decisions on the following Standard Population components:
  • Key Field - the field to use for indexing and search
  • Key Level - the type of keys built
  • Search Level - the breadth of search performed
Matching, filtering and ranking of the candidates returned from a search are affected by decisions on the following Standard Population components:
  • Match Purpose - the fields used in Matching and the business purpose of the Match
  • Match Level - the degree of Match chosen
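One way to keep these decisions together in application code is a pair of small settings objects, sketched below. The class names, field names and all values except the "Typical" search level are hypothetical placeholders invented for this example; they are not the product's API or its permitted values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchSettings:
    """Decisions that shape the search (hypothetical container)."""
    key_field: str     # the field to use for indexing and search
    key_level: str     # the type of keys built
    search_level: str  # the breadth of search performed


@dataclass(frozen=True)
class MatchSettings:
    """Decisions that shape matching, filtering and ranking (hypothetical)."""
    match_purpose: str  # fields used in matching and the business purpose
    match_level: str    # the degree of match chosen


# Placeholder values for illustration only.
lookup_search = SearchSettings(
    key_field="name",
    key_level="family word + first name",
    search_level="Typical",
)
lookup_match = MatchSettings(
    match_purpose="identify the same person",
    match_level="strict",
)
```

Separating the two groups mirrors the text above: the first three components determine which candidates are returned, and the last two determine how those candidates are scored and filtered.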
