Bigram

Use the Bigram algorithm to compare long text strings, such as postal addresses entered in a single field.
The Bigram algorithm calculates a match score for two data strings based on the occurrence of consecutive characters in both strings. The algorithm looks for pairs of consecutive characters that are common to both strings. It divides the number of pairs that match in both strings by the total number of character pairs.

Bigram Example

Consider the following strings:
• larder
• lerder
These strings yield the following Bigram groups:
`l a, a r, r d, d e, e r`
`l e, e r, r d, d e, e r`
Note that the second occurrence of the string "e r" within the string "lerder" is not matched, as there is no corresponding second occurrence of "e r" in the string "larder".
To calculate the Bigram match score, the transformation divides the number of matching pairs (6) by the total number of pairs in both strings (10). The count of 6 comes from the three matched pairs "r d", "d e", and "e r", each counted once in each string. In this example, the strings are 60% similar and the match score is 0.60.
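The calculation in this example can be sketched in a few lines of Python. This is an illustration of the arithmetic described above, not the transformation's actual implementation. The function names `bigrams` and `bigram_score` are hypothetical, and the sketch assumes case-sensitive comparison with no special handling of spaces or punctuation.

```python
from collections import Counter


def bigrams(text):
    """Return the pairs of consecutive characters in text, in order."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_score(a, b):
    """Illustrative Bigram score: matched pairs in both strings / total pairs."""
    pairs_a = Counter(bigrams(a))
    pairs_b = Counter(bigrams(b))
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Assumption: strings too short to form any pair score 0.
        return 0.0
    # Multiset intersection: a repeated pair only matches as many times
    # as it occurs in both strings, so the second "er" in "lerder" is ignored.
    shared = sum((pairs_a & pairs_b).values())
    # Each shared pair is counted once per string, hence the factor of 2.
    return 2 * shared / total


print(bigram_score("larder", "lerder"))  # 0.6
```

For "larder" and "lerder", the sketch finds three shared pairs, counts them in both strings to get 6, and divides by the 10 pairs in total.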