Relate 360
run_hbase_region_analysis.sh --config=configuration_file_name --input=input_file_in_HDFS --hdfsdir=working_directory_in_HDFS --rule=matching_rules_file_name --regions=number_of_regions [--reducer=number_of_reducer_jobs] [--outputpath=directory_for_output_files]
| Option | Argument | Description |
|---|---|---|
| --config | configuration_file_name | Absolute path and file name of the configuration file that you create. |
| --input | input_file_in_HDFS | Absolute path to the directory that contains tokenized data. If you run the HDFS tokenization job without the --outputpath parameter, you can find the tokenized data in the <Working Directory in HDFS>/batch-tokenize/<Job ID>/tokenize directory. If you run the HDFS tokenization job with the --outputpath parameter, you can find the tokenized data in the <Output Directory in HDFS>/batch-tokenize/tokenize directory. |
| --hdfsdir | working_directory_in_HDFS | Absolute path to a working directory in HDFS. The region splitter job uses the working directory to store the library files. |
| --rule | matching_rules_file_name | Absolute path and file name of the matching rules file that you create. The values in the matching rules file override the values in the configuration file. |
| --regions | number_of_regions | Number of regions that you want to use for the input data. The optimal number of regions depends on your environment and resources. For more information about regions and split points, see the repository documentation. |
| --reducer | number_of_reducer_jobs | Optional. Number of reducer jobs that you want to run. Default is 1. |
| --outputpath | directory_for_output_files | Optional. Absolute path to a directory in HDFS to which the batch job loads the output files. Use a different directory each time you rerun the batch job. To reuse the same directory, delete all the files in the directory before you rerun the job. By default, the batch job loads the output files to the working directory in HDFS. |
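The two locations described for the --input option can be derived with a small helper. The following is a minimal sketch, not part of the product: the function name tokenized_input_dir and its mode keywords (workdir, outputpath) are hypothetical, and the job ID in the usage note is a placeholder.

```shell
#!/bin/sh
# Sketch: print the HDFS directory that holds the tokenized data.
# Hypothetical helper; the name and the mode keywords are assumptions.
#   tokenized_input_dir workdir    <working_dir_in_HDFS> <job_id>
#   tokenized_input_dir outputpath <output_dir_in_HDFS>
tokenized_input_dir() {
    case "$1" in
        workdir)
            # The tokenization job ran without --outputpath.
            printf '%s/batch-tokenize/%s/tokenize\n' "$2" "$3"
            ;;
        outputpath)
            # The tokenization job ran with --outputpath.
            printf '%s/batch-tokenize/tokenize\n' "$2"
            ;;
        *)
            echo "unknown mode: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `tokenized_input_dir workdir /usr/hdfs/workingdir "$JOB_ID"` prints the path to pass as the --input value, where JOB_ID is assumed to hold the ID of the tokenization job.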
run_hbase_region_analysis.sh --config=/usr/local/conf/config_big.xml --input=/usr/hdfs/workingdir/MDMBDRMInitialBatch/MDMBDE0063_1602999447744334391/output/dir/pass-join --hdfsdir=/usr/hdfs/workingdir --rule=/usr/local/conf/matching_rules.xml --regions=14
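When the job runs from a scheduler or a larger script, it can help to assemble the command line in one place and append the optional parameters only when they are set. The following is a minimal sketch under that assumption: the function name build_region_analysis_cmd and its positional argument order are hypothetical, and only the options listed in the table above are used.

```shell
#!/bin/sh
# Sketch: assemble the run_hbase_region_analysis.sh command line.
# Hypothetical helper; the name and the argument order are assumptions.
#   build_region_analysis_cmd config input hdfsdir rule regions [reducer] [outputpath]
build_region_analysis_cmd() {
    if [ "$#" -lt 5 ]; then
        echo "usage: build_region_analysis_cmd config input hdfsdir rule regions [reducer] [outputpath]" >&2
        return 1
    fi
    # Reject an empty or non-numeric region count.
    case "$5" in
        ''|*[!0-9]*)
            echo "regions must be a whole number" >&2
            return 1
            ;;
    esac
    cmd="run_hbase_region_analysis.sh --config=$1 --input=$2 --hdfsdir=$3 --rule=$4 --regions=$5"
    # Append the optional parameters only when a value is supplied.
    if [ -n "${6:-}" ]; then
        cmd="$cmd --reducer=$6"
    fi
    if [ -n "${7:-}" ]; then
        cmd="$cmd --outputpath=$7"
    fi
    printf '%s\n' "$cmd"
}
```

The function only prints the command so that you can review or log it before you run it. Because it builds a plain string, it does not handle paths that contain spaces.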