Relate 360
- Relate 360 10.1
- All Products
run_hiveEnabler.sh --config=configuration_file_name --input=input_file_in_HDFS --hdfsdir=working_directory_in_HDFS --outputtable=output_table_name --hiveserver=Hive_server_host_name:port --skipHBase --forceCopy [--reducer=number_of_reducer_jobs] [--hiveuser=user_name] [--hivepassword=password] [--hivedb=database_name] [--outputpath=directory_for_output_files]
Option | Argument | Description
---|---|---
--config | configuration_file_name | Absolute path and file name of the configuration file that you create.
--input | input_file_in_HDFS | Absolute path to the directory that contains the linked data. If you ran the initial clustering job without the --outputpath parameter, you can find the processed data in the following directory: <Working Directory in HDFS>/batch-cluster/<Job ID>/output/dir. If you ran the initial clustering job with the --outputpath parameter, you can find the processed data in the following directory: <Output Directory in HDFS>/batch-cluster/output/dir.
--reducer | number_of_reducer_jobs | Optional. Number of reducer jobs that you want to run.
--hdfsdir | working_directory_in_HDFS | Absolute path to a working directory in HDFS. The Hive enabler job uses the working directory to store the library files. In a high availability-enabled cluster, prefix the absolute path with the logical URI of the cluster in the following format: hdfs://<nameservice ID>.
--outputtable | output_table_name | Unique name for the output table in Hive to which you want to load the linked data.
--hiveserver | Hive_server_host_name:port | Host name of the Hive server and the port number on which the Hive server listens, in the following format: <Hive Server Host Name>:<Port Number>.
--hivedb | database_name | Optional. Name of the Hive database in which you want to create the output table. If you do not specify a database name, the Hive enabler job creates the output table in the default database.
--hiveuser | user_name | Optional. Name of the user that accesses the Hive database. Ensure that the user, or the role to which the user belongs, is granted the ALL privilege for the Hive database.
--hivepassword | password | Optional. Password for the user that accesses the Hive database.
--skipHBase | | Indicates that the linked data is in HDFS.
--forceCopy | | Copies the dependent library files to HDFS. Use this option only when you run the Hive enabler job for the first time.
--outputpath | directory_for_output_files | Optional. Absolute path to a directory in HDFS to which the batch job loads the output files. By default, the batch job loads the output files to the working directory in HDFS. Use a different directory when you rerun the batch job. If you want to use the same directory, delete all the files in the directory and rerun the job. In a high availability-enabled cluster, prefix the absolute path with the logical URI of the cluster in the following format: hdfs://<nameservice ID>.
run_hiveEnabler.sh --config=/usr/local/config/Configuration.xml --input=/usr/hdfs/workingdir/batch-cluster/MDMBDRM_931211654144593570/output/dir/pass-join --hdfsdir=hdfs://r360nameservice/usr/hdfs/workingdir --outputtable=HiveOutput --hiveserver=Analytics1:20000 --skipHBase --forceCopy
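The first-run invocation above can be parameterized in a small wrapper script so that the same settings are easy to audit and reuse. This is a sketch only: every path, the nameservice ID, the table name, and the Hive server host are assumptions copied from the example, and must be adjusted for your cluster.

```shell
#!/bin/sh
# Hypothetical wrapper for the first Hive enabler run.
# All values below are assumptions taken from the example above.
CONFIG=/usr/local/config/Configuration.xml
INPUT=/usr/hdfs/workingdir/batch-cluster/MDMBDRM_931211654144593570/output/dir/pass-join
HDFS_DIR=hdfs://r360nameservice/usr/hdfs/workingdir   # HA cluster: logical URI prefix
OUTPUT_TABLE=HiveOutput
HIVE_SERVER=Analytics1:20000

# --forceCopy is required only on the first run, when the dependent
# library files are not yet in the HDFS working directory.
CMD="run_hiveEnabler.sh --config=$CONFIG --input=$INPUT --hdfsdir=$HDFS_DIR --outputtable=$OUTPUT_TABLE --hiveserver=$HIVE_SERVER --skipHBase --forceCopy"
echo "$CMD"
```

On later runs of the same job you would drop --forceCopy, because the library files are already in HDFS.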
run_hiveEnabler.sh --config=configuration_file_name --input=input_file_in_HDFS --hdfsdir=working_directory_in_HDFS --outputtable=output_table_name --hiveserver=Hive_server_host_name:port --skipHBase --incremental [--hiveuser=user_name] [--hivepassword=password] [--hivedb=database_name] [--reducer=number_of_reducer_jobs] [--outputpath=directory_for_output_files]
Option | Argument | Description
---|---|---
--config | configuration_file_name | Absolute path and file name of the configuration file that you create.
--input | input_file_in_HDFS | Absolute path to the directory that contains the linked data. If you ran the initial clustering job without the --outputpath parameter, you can find the processed data in the following directory: <Working Directory in HDFS>/batch-cluster/<Job ID>/output/dir. If you ran the initial clustering job with the --outputpath parameter, you can find the processed data in the following directory: <Output Directory in HDFS>/batch-cluster/output/dir.
--reducer | number_of_reducer_jobs | Optional. Number of reducer jobs that you want to run.
--hdfsdir | working_directory_in_HDFS | Absolute path to a working directory in HDFS. The Hive enabler job uses the working directory to store the library files. In a high availability-enabled cluster, prefix the absolute path with the logical URI of the cluster in the following format: hdfs://<nameservice ID>.
--outputtable | output_table_name | Name of the table in Hive that contains the linked data.
--hiveserver | Hive_server_host_name:port | Host name of the Hive server and the port number on which the Hive server listens, in the following format: <Hive Server Host Name>:<Port Number>.
--hivedb | database_name | Optional. Name of the Hive database that contains the output table. If you do not specify a database name, the Hive enabler job uses the default database.
--hiveuser | user_name | Optional. Name of the user that accesses the Hive database. Ensure that the user, or the role to which the user belongs, is granted the ALL privilege for the Hive database.
--hivepassword | password | Optional. Password for the user that accesses the Hive database.
--skipHBase | | Indicates that the linked data is in HDFS.
--incremental | | Runs the Hive enabler job in incremental mode. The Hive enabler job updates the output table with the incremental data.
--outputpath | directory_for_output_files | Optional. Absolute path to a directory in HDFS to which the batch job loads the output files. By default, the batch job loads the output files to the working directory in HDFS. Use a different directory when you rerun the batch job. If you want to use the same directory, delete all the files in the directory and rerun the job. In a high availability-enabled cluster, prefix the absolute path with the logical URI of the cluster in the following format: hdfs://<nameservice ID>.
run_hiveEnabler.sh --config=/usr/local/config/Configuration.xml --input=/usr/hdfs/workingdir/batch-cluster/MDMBDRM_931211654144593970/output/dir/pass-join --hdfsdir=/usr/hdfs/workingdir --outputtable=HiveOutput --hiveserver=Analytics1:20000 --skipHBase --incremental
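The incremental run above can be sketched the same way. Again, the paths, job ID, table name, and host are assumptions copied from the example; the output table must already exist from the initial run.

```shell
#!/bin/sh
# Hypothetical wrapper for an incremental Hive enabler run.
# All values below are assumptions taken from the example above.
CONFIG=/usr/local/config/Configuration.xml
INPUT=/usr/hdfs/workingdir/batch-cluster/MDMBDRM_931211654144593970/output/dir/pass-join
HDFS_DIR=/usr/hdfs/workingdir
OUTPUT_TABLE=HiveOutput          # must already exist from the initial run
HIVE_SERVER=Analytics1:20000

# --incremental updates the existing output table with the new linked data;
# --forceCopy is omitted because the library files are already in HDFS.
CMD="run_hiveEnabler.sh --config=$CONFIG --input=$INPUT --hdfsdir=$HDFS_DIR --outputtable=$OUTPUT_TABLE --hiveserver=$HIVE_SERVER --skipHBase --incremental"
echo "$CMD"
```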