Bigram

Use the Bigram algorithm to compare long text strings, such as postal addresses entered in a single field.
The Bigram algorithm calculates a match score for two data strings based on the occurrence of consecutive characters in both strings. The algorithm looks for pairs of consecutive characters that are common to both strings. It divides the number of pairs that match in both strings by the total number of character pairs.

Bigram Example

Consider the following strings:
• larder
• lerder
These strings yield the following Bigram groups:
`l a, a r, r d, d e, e r`
`l e, e r, r d, d e, e r`
Note that the second occurrence of the string "e r" within the string "lerder" is not matched, as there is no corresponding second occurrence of "e r" in the string "larder".
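To calculate the Bigram match score, the transformation divides the number of matching pairs (6) by the total number of pairs in both strings (10). Three pairs are common to both strings (r d, d e, and e r), and each is counted once per string, which gives 6. In this example, the strings are 60% similar and the match score is 0.60.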
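The calculation in this example can be sketched in a few lines of Python. This is an illustration of the arithmetic described above, not the transformation's actual implementation. The function names `bigrams` and `bigram_score` are hypothetical, and the sketch assumes that characters are compared exactly as entered, with no case or whitespace normalization.

```python
from collections import Counter


def bigrams(text):
    """Return the pairs of consecutive characters in text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_score(first, second):
    """Return matching pairs divided by total pairs in both strings."""
    pairs_first = Counter(bigrams(first))
    pairs_second = Counter(bigrams(second))
    total = sum(pairs_first.values()) + sum(pairs_second.values())
    if total == 0:
        # Strings shorter than two characters contain no pairs (assumed score).
        return 0.0
    # Multiset intersection: a repeated pair such as "er" matches only
    # as many times as it occurs in both strings.
    shared = sum((pairs_first & pairs_second).values())
    # Each shared pair is counted once in each string.
    return 2 * shared / total


print(bigram_score("larder", "lerder"))  # prints 0.6
```

Counting pairs with a multiset, rather than a plain set, is what prevents the second "e r" in "lerder" from being matched against the single "e r" in "larder".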