Table of Contents

  1. Preface
  2. Introduction to Informatica MDM - Relate 360
  3. Linking Batch Data
  4. Tokenizing Batch Data
  5. Processing Streaming Data
  6. Creating Relationship Graph
  7. Loading Linked and Consolidated Data into Hive
  8. Searching Data
  9. Monitoring the Batch Jobs
  10. Troubleshooting
  11. Glossary

User Guide

Tokenizing Data and Persisting the Tokenized Data in HDFS

You can create fuzzy tokens for the input data based on the matching rules and persist the tokenized data in HDFS so that you can perform searches on the tokenized data.
You can run the following batch jobs to tokenize data and persist the tokenized data in HDFS:
Run the HDFS tokenization job to create fuzzy tokens in HDFS. If you want to search for matching records, run the HDFS batch search job. If you want to delete data from the tokenized data, run the HDFS data deletion job. If you want to add incremental data to the tokenized data, run the HDFS tokenization job in incremental mode.
To persist the tokenized data in HDFS, perform the following tasks:
  1. Run the HDFS tokenization job.
    The job creates fuzzy tokens for the input data in HDFS.
  2. To search for records, run the HDFS batch search job.
  3. To add incremental data to the tokenized data in HDFS, run the HDFS tokenization job in incremental mode.
  4. To delete records from the tokenized data in HDFS, run the HDFS data deletion job.
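The sequence above can be sketched as a shell wrapper. This is only an illustration of the ordering of the four jobs, not the actual Relate 360 command line: the jar name, job names, and flags below are hypothetical assumptions, and the wrapper prints each command (a dry run) instead of submitting it to the cluster.

```shell
#!/bin/sh
# Dry-run sketch of the HDFS tokenization workflow.
# NOTE: "relate360.jar" and all job names/flags are illustrative
# placeholders, not the documented Relate 360 CLI.
set -e

run_job() {
  # Print the command that would be submitted instead of executing it.
  echo "hadoop jar $*"
}

# 1. Run the HDFS tokenization job to create fuzzy tokens in HDFS.
run_job relate360.jar tokenize --input /data/input --tokens /data/tokens

# 2. Run the HDFS batch search job to search the tokenized data.
run_job relate360.jar search --tokens /data/tokens --queries /data/queries

# 3. Rerun the tokenization job in incremental mode to add new data.
run_job relate360.jar tokenize --incremental --input /data/delta --tokens /data/tokens

# 4. Run the HDFS data deletion job to delete records from the tokenized data.
run_job relate360.jar delete --tokens /data/tokens --records /data/deletes
```

Keeping the four invocations in one script makes the required ordering explicit: tokenization must complete before search, incremental load, or deletion can operate on the persisted tokens.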
