Table of Contents

  1. Preface
  2. Introduction to Informatica MDM - Relate 360
  3. Linking Batch Data
  4. Tokenizing Batch Data
  5. Processing Streaming Data
  6. Creating Relationship Graph
  7. Loading Linked and Consolidated Data into Hive
  8. Searching Data
  9. Monitoring the Batch Jobs
  10. Troubleshooting
  11. Glossary

User Guide

Tokenizing Batch Data

The following image shows how you can tokenize data in HDFS and persist the tokenized data in HDFS or in a repository:
Run the MDM - Relate 360 batch jobs to read the input data, add fuzzy tokens to the input data, and then persist the tokenized data in HDFS or in a repository. You can use the tokenized data to search for matching records.
To tokenize the input data, perform the following tasks:
  1. Run the Relate 360 batch jobs to read the input data in HDFS, add tokens to the input data, and persist the tokenized data in HDFS or in a repository.
  2. To search the tokenized data for matching records, use the RESTful web services and the batch job.
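
To make the idea of fuzzy tokens concrete, the following Python sketch shows the general tokenize-then-search pattern on a tiny in-memory data set. It is only an illustration under stated assumptions: the key function `fuzzy_token`, the record layout, and the in-memory index are hypothetical and are not the algorithm, file format, or interface that MDM - Relate 360 uses. The dictionary index stands in for tokenized data persisted in HDFS or in a repository.

```python
import re
from collections import defaultdict


def fuzzy_token(word: str) -> str:
    """Toy fuzzy key (an assumption, not the product's algorithm):
    keep the first letter, drop later vowels and H/W/Y, collapse
    repeated letters, and truncate to four characters."""
    letters = re.sub(r"[^A-Z]", "", word.upper())
    if not letters:
        return ""
    key = letters[0]
    for ch in letters[1:]:
        if ch in "AEIOUYHW":
            continue
        if ch != key[-1]:
            key += ch
    return key[:4]


def tokenize_record(record: dict, field: str) -> dict:
    """Return a copy of the record with a sorted list of fuzzy tokens added."""
    tokens = sorted({fuzzy_token(w) for w in record[field].split()} - {""})
    return {**record, "tokens": tokens}


def build_index(tokenized_records: list) -> dict:
    """Map each token to the set of record IDs that carry it."""
    index = defaultdict(set)
    for rec in tokenized_records:
        for tok in rec["tokens"]:
            index[tok].add(rec["id"])
    return index


def search(index: dict, query: str) -> set:
    """Tokenize the query the same way and collect candidate record IDs."""
    query_tokens = {fuzzy_token(w) for w in query.split()} - {""}
    hits = set()
    for tok in query_tokens:
        hits |= index.get(tok, set())
    return hits


# Hypothetical input records, standing in for input data read from HDFS.
records = [
    {"id": 1, "name": "Jon Smith"},
    {"id": 2, "name": "John Smyth"},
    {"id": 3, "name": "Mary Jones"},
]

tokenized = [tokenize_record(r, "name") for r in records]
index = build_index(tokenized)

print(sorted(search(index, "Jhon Smithe")))  # [1, 2]
```

Because spelling variants such as "Jon" and "John" produce the same token, a search has to compare the query only against records that share at least one token, rather than against every record.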
