Reference Data Guide

Reference Data for Address Validation

When you run an address validation mapping in a Hadoop environment, the address reference data files must reside on each DataNode on which the mapping runs. Informatica Big Data Management installs with a shell script that you can use to install the files on the DataNodes.
Use the shell script to install the address reference data files on the DataNodes in a single operation. The script reads a file that contains the names or IP addresses of the nodes. The script copies the address reference data files to each node that the file identifies.
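As an illustration of the node list that the script reads, the following minimal sketch writes a list file with one entry per line, as the -n option requires. The file location and the host names and IP address are placeholders chosen for this example, not values from the product.

```shell
#!/bin/sh
# Sketch only: create a node list file with one DataNode per line.
# The temporary file and the host names below are placeholder values.
NODE_LIST=$(mktemp)

cat > "$NODE_LIST" <<'EOF'
datanode01.example.com
datanode02.example.com
192.0.2.13
EOF

# Count the non-empty lines to confirm how many nodes the file identifies.
node_count=$(grep -c -v '^[[:space:]]*$' "$NODE_LIST")
echo "Nodes listed in $NODE_LIST: $node_count"
```

Keeping the list in a file of its own lets you review or version the set of target nodes before any files are copied.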
The script name is copyRefDataToComputeNodes.sh.
Find the script in the following directory in the Informatica Big Data Management installation:
<Informatica installation directory>/tools/dq/av
The following table describes the options that the script uses:
Option  Description

-n      The file that contains the list of names or IP addresses of the DataNodes in the Hadoop cluster. Enter each node name or IP address on a separate line in the file.
        By default, the script reads the file from the $BASEDIR/HadoopDataNodes directory, where $BASEDIR is the location of the shell script.

-p      Determines whether the script prompts you to confirm that you want to install the address reference data files.
        By default, the script displays a prompt to confirm that you want to copy the files from the source directory to the target directories on the DataNodes. If you run the shell script on a schedule, you can disable the prompt.
        The default option value is Y. To disable the prompt, set the value to N.

-s      The source directory for the address reference data files that the script copies to the nodes.
        By default, the script reads the files from the /reference_data directory on the local machine.
        Address reference data files use the file name extension .MD. The source directory must contain the address reference data files and no other files.

-t      The directory on each node to which the script copies the address reference data files.
        By default, the script copies the files to the /reference_data directory on each node.

-u      The user name of the user who runs the script. The user must have passwordless secure shell access to the nodes.
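The options can be combined in a single command. The following sketch first checks that the source directory contains only .MD files, and then prints the command instead of running it. It rests on several assumptions that this guide does not confirm: that each option is followed by its value separated by a space, that the .MD extension is matched case-sensitively, and that the installation path, node list path, and user name shown are placeholders. INFA_HOME and check_source_dir are names invented for this example.

```shell
#!/bin/sh
# Pre-flight sketch: verify the source directory, then print (not run) the copy command.

# Succeeds only if "$1" is a directory that holds at least one file and
# every entry found in it is a regular file whose name ends in .MD.
check_source_dir() {
    dir=$1
    [ -d "$dir" ] || return 1
    found=0
    for entry in "$dir"/* "$dir"/.[!.]*; do
        [ -e "$entry" ] || continue        # unmatched glob pattern, nothing to check
        case $entry in
            *.MD) [ -f "$entry" ] || return 1
                  found=1 ;;
            *)    return 1 ;;
        esac
    done
    [ "$found" -eq 1 ]
}

# Placeholder values; replace them with the paths and user for your cluster.
SCRIPT_DIR="${INFA_HOME:-/opt/informatica}/tools/dq/av"   # INFA_HOME is hypothetical
NODE_FILE="/tmp/datanodes.txt"                            # hypothetical node list file
SOURCE_DIR="/reference_data"
TARGET_DIR="/reference_data"
RUN_USER="hadoopuser"                                     # hypothetical user name

if check_source_dir "$SOURCE_DIR"; then
    # Assumed syntax: each option is followed by its value. -p N disables the prompt.
    echo "$SCRIPT_DIR/copyRefDataToComputeNodes.sh" \
        -n "$NODE_FILE" -s "$SOURCE_DIR" -t "$TARGET_DIR" -u "$RUN_USER" -p N
else
    echo "Source directory $SOURCE_DIR must contain only .MD files" >&2
fi
```

Printing the command first gives you a chance to review the paths before anything is copied. Setting -p to N suits scheduled runs, where no one is present to answer the confirmation prompt.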
