Table of Contents

  1. Preface
  2. Introduction to Informatica MDM - Relate 360
  3. Linking Batch Data
  4. Tokenizing Batch Data
  5. Processing Streaming Data
  6. Creating Relationship Graph
  7. Loading Linked and Consolidated Data into Hive
  8. Searching Data
  9. Monitoring the Batch Jobs
  10. Troubleshooting
  11. Glossary

User Guide


Running the Hive Enabler Job


Run the Hive enabler job to load the linked data into Hive or to link a Hive table to a repository table.
To run the Hive enabler job, use the run_hiveEnabler.sh script located in the following directory:
/usr/local/mdmbdrm-<Version Number>

Loading Linked Data into Hive from the Repository

Use the following command to run the run_hiveEnabler.sh script without the link option:
run_hiveEnabler.sh --config=configuration_file_name --input=input_file_in_HDFS --hdfsdir=working_directory_in_HDFS --outputtable=output_table_name --hiveserver=Hive_server_host_name:port --forceCopy [--reducer=number_of_reducer_jobs] [--hiveuser=user_name] [--hivepassword=password] [--hivedb=database_name] [--outputpath=directory_for_output_files]
The following table describes the options and arguments that you can specify to run the run_hiveEnabler.sh script:

--config configuration_file_name
    Absolute path and file name of the configuration file that you create.
--input input_file_in_HDFS
    Absolute path to the input files in HDFS.
--reducer number_of_reducer_jobs
    Optional. Number of reducer jobs that you want to run.
--hdfsdir working_directory_in_HDFS
    Absolute path to a working directory in HDFS. The Hive enabler job uses the working directory to store the library files.
    In a high availability-enabled cluster, prefix the absolute path with the logical URI of the cluster, in the following format: hdfs://<nameservice ID>
--outputtable output_table_name
    Unique name for the output table in Hive to which you want to load the linked data.
--hiveserver Hive_server_host_name:port
    Host name of the Hive server and the port number on which the Hive server listens, in the following format: <Hive Server Host Name>:<Port Number>
--hivedb database_name
    Optional. Name of the Hive database in which you want to create the output table. If you do not specify a database name, the Hive enabler job creates the output table in the default database.
--hiveuser user_name
    Optional. Name of the user that accesses the Hive database. Ensure that the user, or the role to which the user belongs, is granted the ALL privilege for the Hive database.
--hivepassword password
    Optional. Password for the user to access the Hive database.
--forceCopy
    Copies the dependent library files to HDFS. Use this option only when you run the Hive enabler job for the first time.
--outputpath directory_for_output_files
    Optional. Absolute path to a directory in HDFS to which the batch job loads the output files. By default, the batch job loads the output files to the working directory in HDFS. Use a different directory when you rerun the batch job. If you want to reuse the same directory, delete all the files in the directory before you rerun the job.
    In a high availability-enabled cluster, prefix the absolute path with the logical URI of the cluster, in the following format: hdfs://<nameservice ID>
For example, the following command loads the linked data into Hive:
run_hiveEnabler.sh --config=/usr/local/config/Configuration.xml --input=/usr/hdfs/Source --hdfsdir=hdfs://R360nameservice/usr/hdfs/Hive --outputtable=HiveOutput --hiveserver=Analytics1:20000 --forceCopy
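When the command line grows long, it can help to assemble it in a small wrapper script and review it before running. The sketch below is a dry run only: it builds the command from variables and prints it. The paths, host name, port, and table name mirror the example above and are illustrative, not product defaults; the final execution line is commented out.

```shell
#!/bin/sh
# Dry-run builder for the Hive enabler load command.
# All values below are examples; substitute your own paths and hosts.
CONFIG=/usr/local/config/Configuration.xml
INPUT=/usr/hdfs/Source
HDFSDIR=hdfs://R360nameservice/usr/hdfs/Hive   # logical URI prefix for an HA cluster
OUTPUTTABLE=HiveOutput
HIVESERVER=Analytics1:20000

CMD="run_hiveEnabler.sh --config=$CONFIG --input=$INPUT --hdfsdir=$HDFSDIR --outputtable=$OUTPUTTABLE --hiveserver=$HIVESERVER"

# Pass --forceCopy only on the first run, when the dependent
# library files are not yet copied to HDFS.
FIRST_RUN=yes
if [ "$FIRST_RUN" = "yes" ]; then
  CMD="$CMD --forceCopy"
fi

# Print the assembled command for review.
echo "$CMD"
# Uncomment to execute from /usr/local/mdmbdrm-<Version Number>:
# $CMD
```

On a rerun, set FIRST_RUN to no so that --forceCopy is omitted, and point --outputpath at a fresh directory as described in the option table above.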

Linking a Hive Table to the Repository Table

Use the following command to run the run_hiveEnabler.sh script with the link option:
run_hiveEnabler.sh --config=configuration_file_name --linkHBase --outputtable=output_table_name --hiveserver=Hive_server_host_name:port --forceCopy [--reducer=number_of_reducer_jobs] [--hiveuser=user_name] [--hivepassword=password] [--hivedb=database_name]
The following table describes the options and arguments that you can specify to run the run_hiveEnabler.sh script:

--config configuration_file_name
    Absolute path and file name of the configuration file that you create.
--reducer number_of_reducer_jobs
    Optional. Number of reducer jobs that you want to run.
--outputtable output_table_name
    Unique name for the output table in Hive that you want to link to the repository table.
--hiveserver Hive_server_host_name:port
    Host name of the Hive server and the port number on which the Hive server listens, in the following format: <Hive Server Host Name>:<Port Number>
--hivedb database_name
    Optional. Name of the Hive database in which you want to create the output table. If you do not specify a database name, the Hive enabler job creates the output table in the default database.
--hiveuser user_name
    Optional. Name of the user that accesses the Hive database. Ensure that the user, or the role to which the user belongs, is granted the ALL privilege for the Hive database.
--hivepassword password
    Optional. Password for the user to access the Hive database.
--linkHBase
    Links the output table in Hive to the repository table that contains the linked data.
--forceCopy
    Copies the dependent library files to HDFS. Use this option only when you run the Hive enabler job for the first time.
For example, the following command links the Hive table to the repository table:
run_hiveEnabler.sh --config=/usr/local/config/Configuration.xml --linkHBase --outputtable=HiveOutput --hiveserver=Analytics1:20000 --forceCopy
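After the link job completes, you can confirm that the output table is visible in Hive, for example with the Beeline client. The sketch below is a hedged helper, not part of the product: it only prints the Beeline command to run, and the host, port, database, and table name are assumptions carried over from the example command above.

```shell
#!/bin/sh
# Hypothetical verification helper: prints a Beeline command that
# describes the linked table. Host, port, database, and table name
# mirror the example above and are assumptions, not product defaults.
HIVESERVER=Analytics1:20000
HIVEDB=default
TABLE=HiveOutput

VERIFY="beeline -u jdbc:hive2://$HIVESERVER/$HIVEDB -e \"DESCRIBE $TABLE;\""
echo "$VERIFY"
# Run the printed command manually once the link job has completed.
```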
