User Guide

HDFS Batch Search Job

The HDFS batch search job identifies the records that match the input data in the output files of an HDFS tokenization job. The job reads the input data from HDFS and creates output files in HDFS that contain the matching records for the input data.
The following image shows how the HDFS batch search job searches for the matching records:
The HDFS batch search job identifies the matching records for the input data in the tokenized data and writes the matching records to the output files in HDFS.
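The comparison against tokenized data can be pictured as a token lookup followed by scoring. The following Python sketch is illustrative only and does not use any Relate 360 API: the word-based tokens, the `build_token_index` and `search` helpers, the Jaccard score, and the 0.5 threshold are all hypothetical stand-ins for the match rules that the product applies, and in-memory dictionaries stand in for the tokenized data and the input files in HDFS.

```python
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    # Hypothetical match token: every lowercase word in the value.
    return {word for word in text.lower().split()}


def build_token_index(records: dict[str, str]) -> dict[str, set[str]]:
    # Hypothetical stand-in for the tokenized data that a tokenization job
    # produces: each token maps to the IDs of the records that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, text in records.items():
        for token in tokenize(text):
            index[token].add(record_id)
    return dict(index)


def search(query, records, index, threshold=0.5):
    """Return (record_id, score) pairs for records that match the query."""
    query_tokens = tokenize(query)
    # Candidate selection: only records that share at least one token are scored.
    candidates: set[str] = set()
    for token in query_tokens:
        candidates |= index.get(token, set())
    matches = []
    for record_id in candidates:
        record_tokens = tokenize(records[record_id])
        # Hypothetical score: Jaccard similarity of the two token sets.
        score = len(query_tokens & record_tokens) / len(query_tokens | record_tokens)
        if score >= threshold:
            matches.append((record_id, score))
    # Highest score first; ties are ordered by record ID.
    return sorted(matches, key=lambda match: (-match[1], match[0]))


# Hypothetical sample data.
RECORDS = {
    "r1": "John Smith Boston",
    "r2": "Jon Smith Chicago",
    "r3": "Mary Jones Boston",
}
INDEX = build_token_index(RECORDS)

if __name__ == "__main__":
    print(search("John Smith Boston", RECORDS, INDEX))
```

With these sample records, the query `John Smith Boston` returns `[('r1', 1.0)]`. Records `r2` and `r3` each share one of five distinct tokens with the query, so they score 0.2 and fall below the threshold. In this sketch, the index limits scoring to records that share at least one token with the input.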
When you run the HDFS batch search job, the job performs the following tasks:
  1. Reads the input files from HDFS.
  2. Compares the input data against the tokenized data that an HDFS tokenization job creates.
  3. Writes the matching records for the input data to the output files in HDFS.
    The number of output files depends on the number of reducers that you run.
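The link between reducers and output files follows the general MapReduce model, in which each reducer writes its own output file. The following Python sketch assumes that the job partitions matches by hashing, as Hadoop's default `HashPartitioner` does, and that it uses the `part-r-NNNNN` file naming of Hadoop MapReduce. The `partition` and `write_outputs` helpers are hypothetical, and CRC32 replaces Java's `hashCode()` because Python salts string hashes per process.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    # Hash partitioning, as in Hadoop's default HashPartitioner, which computes
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. CRC32 is used here
    # because it is stable across Python processes and always non-negative.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def write_outputs(matches, num_reducers: int) -> dict[str, list[str]]:
    """Group (input_id, match_id) pairs into one simulated file per reducer."""
    # Every reducer produces a file, even when no match is routed to it.
    files: dict[str, list[str]] = {
        f"part-r-{i:05d}": [] for i in range(num_reducers)
    }
    for input_id, match_id in matches:
        # Partitioning on the input record ID keeps all matches for one
        # input record in the same output file.
        reducer = partition(input_id, num_reducers)
        files[f"part-r-{reducer:05d}"].append(f"{input_id},{match_id}")
    return files


if __name__ == "__main__":
    sample = [("in1", "r1"), ("in1", "r7"), ("in2", "r3"), ("in3", "r5")]
    for name, lines in write_outputs(sample, num_reducers=3).items():
        print(name, lines)
```

In this sketch, `num_reducers=3` always yields three files, `part-r-00000` through `part-r-00002`, regardless of how many matches there are; increasing the reducer count increases the file count.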
