User Guide

Back Next

Repository Batch Search Job

The repository batch search job identifies the matching records for the input data in the repository based on the match tokens. The repository batch search job reads the input data in HDFS and creates the output files that contain the matching records for the input data in HDFS.

The repository batch search job requires the repository to contain all the columns with the match tokens. You must set the

StoreAllFields

parameter to true in the configuration file when you tokenize the input data to include all the columns.

The following image shows how the repository batch search job searches for the matching records in the repository: