Tokenize Process

The tokenize process generates match tokens that are used subsequently by the match process to identify candidate base object records for matching.
Match tokens are strings that represent both encoded (match key) and unencoded (raw) values in the match columns of the base object.
Match keys are fixed-length, compressed, and encoded values, built from a combination of the words and numbers in a name or address, such that relevant variations have the same match key value.
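The idea that relevant variations share one match key can be illustrated with a small sketch. This is not the key-generation algorithm that the product uses, which is not described here. The function name toy_match_key, the NICKNAMES table, and the choice of a truncated SHA-1 digest are all assumptions made only for illustration.

```python
import hashlib
import re

# Hypothetical nickname table: maps common variations to one canonical word.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert", "rob": "robert"}


def toy_match_key(name: str, length: int = 8) -> str:
    """Return a fixed-length, encoded key for a name (illustrative only)."""
    # Keep only the words and numbers; ignore case and punctuation.
    words = re.findall(r"[a-z0-9]+", name.lower())
    # Canonicalize variations and ignore word order.
    canonical = sorted(NICKNAMES.get(word, word) for word in words)
    # Compress and encode into a fixed-length value.
    digest = hashlib.sha1(" ".join(canonical).encode("utf-8")).hexdigest()
    return digest[:length].upper()


# Variations of the same name produce the same key.
same = toy_match_key("Bill Smith") == toy_match_key("SMITH, William")
```

In this sketch, "Bill Smith" and "SMITH, William" reduce to the same canonical words, so they receive the same key, while every key has the same length regardless of the length of the input.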
The generated match tokens are stored in a match key table associated with the base object. For each record in the base object, the tokenize process stores one or more records containing generated match tokens in the match key table. The match process depends on current data in the match key table, and will run the tokenize process automatically if match tokens have not been generated for any of the records in the base object. The tokenize process can be run before the match process, automatically at the end of the load process, or manually, as a batch job or stored procedure.
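The relationship between base object records and the match key table can be sketched as follows. This is a simplified in-memory model, not the product's schema or job logic. The names MatchToken, toy_keys, and tokenize, the field names, and the rule of one key per word are hypothetical. The sketch also assumes that only records without existing tokens need to be tokenized.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MatchToken:
    rowid: int       # hypothetical identifier of the base object record
    match_key: str   # encoded value
    raw_value: str   # unencoded value from the match column


def toy_keys(name: str) -> List[str]:
    # Hypothetical rule: one fixed-length (4 character) key per word.
    return [word[:4].upper().ljust(4, "_") for word in name.split()]


def tokenize(base_object: Dict[int, str], match_key_table: List[MatchToken]) -> int:
    """Add tokens for records that have none; return the number of rows added."""
    already_tokenized = {token.rowid for token in match_key_table}
    added = 0
    for rowid, name in base_object.items():
        if rowid in already_tokenized:
            continue
        # One base object record can yield several match key table rows.
        for key in toy_keys(name):
            match_key_table.append(MatchToken(rowid, key, name))
            added += 1
    return added


base_object = {1: "Ann Lee", 2: "Bo"}
match_key_table: List[MatchToken] = []
first_run = tokenize(base_object, match_key_table)   # tokenizes both records
second_run = tokenize(base_object, match_key_table)  # nothing left to tokenize
```

A match step built on this model would call tokenize first, which mirrors the automatic behavior described above: records that already have tokens are skipped, and only the missing ones are generated.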
The Hub Console allows users to investigate the distribution of match keys in the match key table. Users can identify potential hot spots in their data, which are high concentrations of match keys that could result in overmatching. Overmatching occurs when the match process generates too many matches, including matches that are not relevant.
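The kind of distribution analysis described above can be approximated outside the Hub Console with a simple count of records per match key. The sketch below is not a Hub Console feature or API. The function name find_hot_spots and the threshold parameter are hypothetical, and the pair count assumes that every record sharing a key would be compared with every other record that has the same key.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_hot_spots(match_keys: Iterable[str], threshold: int) -> List[Tuple[str, int, int]]:
    """Return (key, record_count, candidate_pairs) for keys at or above threshold."""
    counts = Counter(match_keys)
    hot_spots = []
    for key, count in counts.most_common():
        if count >= threshold:
            # n records sharing one key give n * (n - 1) / 2 candidate pairs.
            hot_spots.append((key, count, count * (count - 1) // 2))
    return hot_spots


keys = ["A1"] * 5 + ["B2"] * 2 + ["C3"]
report = find_hot_spots(keys, threshold=3)
```

Because the number of candidate pairs grows quadratically with the number of records that share a key, a few very common keys can account for most of the comparisons, which is why high concentrations are worth reviewing before the match process runs.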
