Use the Bigram algorithm to compare long text strings, such as postal addresses entered in a single field.
The Bigram algorithm calculates a match score for two data strings based on the pairs of consecutive characters that occur in both strings. It counts the pairs that match in both strings and divides that count by the total number of character pairs in the two strings.
Bigram Example
Consider the following strings:
larder
lerder
These strings yield the following Bigram groups:
l a, a r, r d, d e, e r
l e, e r, r d, d e, e r
Note that the second occurrence of the string "e r" within the string "lerder" is not matched, as there is no corresponding second occurrence of "e r" in the string "larder".
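Three pairs (r d, d e, and e r) are common to both strings. Each is counted once per string, which gives 6 matching pairs. To calculate the Bigram match score, the transformation divides the number of matching pairs (6) by the total number of pairs in both strings (10). In this example, the strings are 60% similar and the match score is 0.60.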
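The calculation in this example can be reproduced with a short Python sketch. This is an illustration only, not the transformation's actual implementation. The function names bigrams and bigram_score are hypothetical, and the handling of case, spaces, and strings shorter than two characters is an assumption.

```python
from collections import Counter


def bigrams(text):
    """Return the pairs of consecutive characters in text, in order."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_score(a, b):
    """Return the share of character pairs that match across both strings.

    Assumption: the comparison is case-sensitive and treats spaces like
    any other character. The real transformation may normalize its input.
    """
    pairs_a = Counter(bigrams(a))
    pairs_b = Counter(bigrams(b))
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Assumption: strings with no pairs at all score 0.
        return 0.0
    # The Counter intersection keeps the smaller count of each pair, so a
    # repeated pair matches only as often as it occurs in both strings.
    shared = sum((pairs_a & pairs_b).values())
    # Each shared pair is counted once per string, hence the factor of 2.
    return 2 * shared / total


print(bigram_score("larder", "lerder"))  # 0.6
```

The Counter intersection is what leaves the second "e r" in "lerder" unmatched: "larder" contains that pair only once, so only one occurrence counts as shared.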