Table of Contents

  1. Preface
  2. Introduction to Informatica MDM - Relate 360
  3. Linking Batch Data
  4. Tokenizing Batch Data
  5. Processing Streaming Data
  6. Creating Relationship Graph
  7. Loading Linked and Consolidated Data into Hive
  8. Searching Data
  9. Monitoring the Batch Jobs
  10. Troubleshooting
  11. Glossary

User Guide

Repository Batch Search Job

The repository batch search job uses the match tokens to identify the records in the repository that match the input data. The job reads the input data from HDFS and writes output files that contain the matching records to HDFS.
The repository batch search job requires the repository to contain all the columns along with the match tokens. To include all the columns, set the StoreAllFields parameter to true in the configuration file when you tokenize the input data.
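The following Python sketch illustrates why the StoreAllFields setting matters. It assumes that a tokenized repository row holds the record ID and its match tokens, plus every input column only when StoreAllFields is true. The functions make_tokens() and tokenize(), the row layout, and the toy token rule are all hypothetical and are not the product's API or token algorithm.

```python
"""Hypothetical illustration of the StoreAllFields setting.

Assumption: a tokenized repository row holds the record ID and its match
tokens and, only when StoreAllFields is true, every input column. All
names here are invented for this sketch.
"""
from typing import Any, Dict, List

Row = Dict[str, Any]


def make_tokens(record: Row) -> List[str]:
    # Toy match token (hypothetical): first three letters of the upper-cased name.
    return [record["name"].upper()[:3]]


def tokenize(records: List[Row], store_all_fields: bool) -> List[Row]:
    """Build repository rows; keep all columns only if store_all_fields is True."""
    rows: List[Row] = []
    for rec in records:
        row: Row = {"id": rec["id"], "tokens": make_tokens(rec)}
        if store_all_fields:
            # Keep every input column next to the match tokens.
            row.update(rec)
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = [{"id": "1", "name": "Smith", "city": "Austin"}]
    print(tokenize(data, store_all_fields=False))
    print(tokenize(data, store_all_fields=True))
```

With store_all_fields=False, a search against such a repository could find which record IDs match but could not return the other columns of those records, which is the situation the StoreAllFields requirement avoids.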
The following image shows how the repository batch search job searches for the matching records in the repository:
[Image: The repository batch search job identifies the matching records for the input data in the repository and writes the matching records to the output files in HDFS.]
When you run the repository batch search job, the job performs the following tasks:
  1. Reads the input files in HDFS.
  2. Compares the input data against the tokenized data in the repository based on the match tokens.
  3. Writes the matching records for the input data to the output files in HDFS.
    The number of output files depends on the number of reducers that you run.
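The three tasks above can be sketched in memory as follows. This is a simplified illustration under stated assumptions, not the product's implementation: plain Python lists stand in for the HDFS input files, the tokenized repository, and the output files; make_tokens() is a hypothetical toy token rule; and each match is routed to one of num_reducers output partitions by a deterministic hash of its match token. The sketch only shows how the output-file count follows the reducer count.

```python
"""Hypothetical in-memory sketch of the repository batch search job.

Assumptions: lists stand in for HDFS input files, the repository and the
output files; make_tokens() is a toy token rule; matches are routed to a
reducer by a CRC32 hash of the match token. No names come from the product.
"""
import zlib
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def make_tokens(record: Row) -> List[str]:
    # Toy match token (hypothetical): first three letters of the upper-cased name.
    return [record["name"].upper()[:3]]


def batch_search(inputs: List[Row], repository: List[Row],
                 num_reducers: int) -> List[List[Row]]:
    # Index the tokenized repository rows by match token.
    index: Dict[str, List[Row]] = defaultdict(list)
    for row in repository:
        for token in row["tokens"]:
            index[token].append(row)

    # One output "file" per reducer.
    outputs: List[List[Row]] = [[] for _ in range(num_reducers)]
    for rec in inputs:                      # Task 1: read the input records.
        seen = set()                        # Avoid duplicate matches per input record.
        for token in make_tokens(rec):      # Task 2: compare on match tokens.
            part = zlib.crc32(token.encode("utf-8")) % num_reducers
            for match in index.get(token, []):
                if match["id"] not in seen:
                    seen.add(match["id"])
                    # Task 3: write the matching record to that reducer's output.
                    outputs[part].append({"input_id": rec["id"], "match": match})
    return outputs


if __name__ == "__main__":
    repo = [{"id": "R1", "tokens": ["SMI"], "name": "Smith"},
            {"id": "R2", "tokens": ["JON"], "name": "Jones"}]
    batch = [{"id": "I1", "name": "Smithe"}, {"id": "I2", "name": "Brown"}]
    for i, part in enumerate(batch_search(batch, repo, num_reducers=3)):
        print(f"output file {i}: {part}")
```

CRC32 is used instead of Python's built-in hash() because string hashing in Python is randomized per process, and a stable partition assignment is easier to reason about.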
