Relate 360
run_tokenloader.sh --config=configuration_file_name --input=input_file_in_HDFS --hdfsdir=working_directory_in_HDFS --rule=matching_rules_file_name --tmpdir=temporary_working_directory [--outputpath=directory_for_output_files] [--reducer=number_of_reducer_jobs]
| Option | Argument | Description |
|---|---|---|
| --config | configuration_file_name | Absolute path and file name of the configuration file that you create. In the configuration file, if you set the StoreAllFields parameter to false, the repository does not persist all the columns but persists only the columns that you use to index the input data. If you want to persist all the columns in the repository, ensure that you set the StoreAllFields parameter to true in the configuration file before you tokenize the input data. |
| --input | input_file_in_HDFS | Absolute path to the input files in HDFS. |
| --reducer | number_of_reducer_jobs | Optional. Number of reducer jobs that you want to run. Default is 1. |
| --hdfsdir | working_directory_in_HDFS | Absolute path to a working directory in HDFS. The repository tokenization process uses the working directory to store the library files. |
| --rule | matching_rules_file_name | Absolute path and file name of the matching rules file that you create. The values in the matching rules file override the values in the configuration file. |
| --tmpdir | temporary_working_directory | Absolute path to a temporary directory to which you have write permission in the local file system. The repository tokenization job uses the directory to store the intermediate files. |
| --outputpath | directory_for_output_files | Optional. Absolute path to a directory in HDFS to which the batch job loads the output files. Use a different directory when you rerun the batch job. If you want to use the same directory, delete all the files in the directory and rerun the job. By default, the batch job loads the output files to the working directory in HDFS. |
run_tokenloader.sh --config=/usr/local/conf/config_big.xml --input=/usr/hdfs/GenerateTokens --reducer=16 --hdfsdir=/usr/hdfs/workingdir --rule=/usr/local/conf/matching_rules.xml --tmpdir=/tmp
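The --outputpath option expects a different directory each time you rerun the batch job. The following is a minimal sketch of one way to assemble the command with a run-specific output directory. The function name build_tokenloader_cmd, the parent directory /usr/hdfs/output, and the run_ naming scheme are hypothetical and not part of the product. The sketch only prints the command for review and does not run it.

```shell
#!/bin/sh
# Sketch: assemble a run_tokenloader.sh command line with a fresh
# --outputpath for each run. The option values are the sample values
# from this page; the output base directory and the "run_<id>" naming
# scheme are assumptions made for illustration.

build_tokenloader_cmd() {
    run_id=$1          # caller-supplied unique suffix, for example a timestamp
    output_base=$2     # hypothetical parent directory in HDFS

    printf '%s' "run_tokenloader.sh"
    # printf reuses the format string for every remaining argument.
    printf ' %s' \
        "--config=/usr/local/conf/config_big.xml" \
        "--input=/usr/hdfs/GenerateTokens" \
        "--reducer=16" \
        "--hdfsdir=/usr/hdfs/workingdir" \
        "--rule=/usr/local/conf/matching_rules.xml" \
        "--tmpdir=/tmp" \
        "--outputpath=${output_base}/run_${run_id}"
    printf '\n'
}

# Print the command instead of executing it.
build_tokenloader_cmd "$(date +%Y%m%d%H%M%S)" "/usr/hdfs/output"
```

Because the suffix changes on every invocation, each rerun writes to a new directory, so the output directory of an earlier run does not have to be emptied first.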
ROW                               COLUMN+CELL
 00KCKSHX$$ SALESFORCE0000000066  column=aml_link_columns:CLUSTERNUMBER, timestamp=1454406691384, value=f9febe82-8f55-4b7e-98de-a11290ae2807
 00KCKSHX$$ SALESFORCE0000000066  column=aml_link_columns:LMT_MATCHED_PK, timestamp=1454406691384, value=
 00KCKSHX$$ SALESFORCE0000000066  column=aml_link_columns:LMT_MATCHED_RECORD_SOURCE, timestamp=1454406691384, value=
 00KCKSHX$$ SALESFORCE0000000066  column=aml_link_columns:LMT_MATCHED_SCORE, timestamp=1454406691384, value=0
 00KCKSHX$$ SALESFORCE0000000066  column=aml_link_columns:LMT_SOURCE_NAME, timestamp=1454406691384, value=
 00KCKSHX$$ SALESFORCE0000000066  column=aml_link_columns:NAME, timestamp=1454406691384, value=Abbott Laboratories
 00KCKSHX$$ SALESFORCE0000000066  column=aml_link_columns:ROWID, timestamp=1454406691384, value=0000000066
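To check the stored columns quickly, the cells in output like this can be reduced to name=value pairs. The following is a rough sketch that assumes the output has been saved with one cell per line in the column=family:QUALIFIER, timestamp=..., value=... layout shown in the sample. The function name parse_cells is hypothetical, and the two input lines are copied from the sample with the row key written on a single line.

```shell
#!/bin/sh
# Sketch: reduce each cell line to "QUALIFIER=value".
# Assumes one cell per line; lines without a cell (such as the
# ROW / COLUMN+CELL header) are dropped.

parse_cells() {
    # \1 captures the column qualifier after the family name,
    # \2 captures everything after "value=" (may be empty).
    sed -n 's/.*column=[^:]*:\([^,]*\), timestamp=[0-9]*, value=\(.*\)$/\1=\2/p'
}

parse_cells <<'EOF'
ROW COLUMN+CELL
 00KCKSHX$$ SALESFORCE0000000066 column=aml_link_columns:NAME, timestamp=1454406691384, value=Abbott Laboratories
 00KCKSHX$$ SALESFORCE0000000066 column=aml_link_columns:ROWID, timestamp=1454406691384, value=0000000066
EOF
```

For the two sample lines, the sketch prints NAME=Abbott Laboratories and ROWID=0000000066. Empty values, such as LMT_MATCHED_PK in the sample, come out as a name followed by a bare equals sign.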