User Guide

Running the Repository Update Job
The repository update job updates the repository data with the input data and creates match tokens for the input data in the repository.
To run the repository update job, run the run_updatesync.sh script located in the following directory:
/usr/local/mdmbdrm-<Version Number>
Use the following command to run the run_updatesync.sh script:
run_updatesync.sh --config=configuration_file_name --input=input_file_in_HDFS --hdfsdir=working_directory_in_HDFS --rule=matching_rules_file_name [--outputpath=directory_for_output_files] [--reducer=number_of_reducer_jobs]
The run_updatesync.sh script supports the following options and arguments:

--config=configuration_file_name
    Absolute path and file name of the configuration file that you create.

--input=input_file_in_HDFS
    Absolute path to the input files in HDFS.

--hdfsdir=working_directory_in_HDFS
    Absolute path to a working directory in HDFS. The repository update job uses the working directory to store the library files.

--rule=matching_rules_file_name
    Absolute path and file name of the matching rules file that you create. The values in the matching rules file override the values in the configuration file.

--outputpath=directory_for_output_files
    Optional. Absolute path to a directory in HDFS to which the batch job loads the output files. Use a different directory when you rerun the batch job. If you want to reuse the same directory, delete all the files in it before you rerun the job (see the cleanup sketch after the example below). By default, the batch job loads the output files to the working directory in HDFS.

--reducer=number_of_reducer_jobs
    Optional. Number of reducer jobs that you want to run to update the repository. Default is 1.
For example, the following command runs the repository update job:
run_updatesync.sh --config=/usr/local/conf/config_big.xml --input=/usr/hdfs/IncrementalData --reducer=16 --hdfsdir=/usr/hdfs/workingdir --rule=/usr/local/conf/matching_rules.xml
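If you rerun the job with the same output directory, you must first delete its contents, as noted in the --outputpath description. The following shell sketch, which is illustrative and not part of the product, wraps the example command above in a rerun-safe script. The version number in MDM_HOME and the OUTPUT_DIR path are assumptions; substitute the values for your installation. The sketch uses the standard hadoop fs -rm command to clear any earlier output before invoking the script:

#!/bin/bash
# Illustrative rerun-safe wrapper for run_updatesync.sh; not part of the product.
# The version number and the output directory below are assumptions.
MDM_HOME=/usr/local/mdmbdrm-10.4
OUTPUT_DIR=/usr/hdfs/updatesync_output

# Reusing an output directory requires deleting its previous contents first.
# The -f flag suppresses the error when the directory does not exist yet.
hadoop fs -rm -r -f "${OUTPUT_DIR}"

"${MDM_HOME}/run_updatesync.sh" \
  --config=/usr/local/conf/config_big.xml \
  --input=/usr/hdfs/IncrementalData \
  --hdfsdir=/usr/hdfs/workingdir \
  --rule=/usr/local/conf/matching_rules.xml \
  --outputpath="${OUTPUT_DIR}" \
  --reducer=16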
