Hadoop Connection Options

Use connection options to define a Hadoop connection.
Enter connection options in the following format:
... -o option_name='value' option_name='value' ...
To enter multiple options, separate them with a space.
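For example, a CreateConnection command that creates a Hadoop connection might look like the following. The domain name, credentials, connection name, and option values are placeholders:
infacmd isp CreateConnection -dn MyDomain -un Administrator -pd password -cn MyHadoopConn -cid MyHadoopConn -ct Hadoop -o RMAddress='myhostname:8032' defaultFSURI='hdfs://myhostname:8020/'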
The following table describes the Hadoop connection options for the infacmd isp CreateConnection and UpdateConnection commands:
Option
Description
connectionId
String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
connectionType
Required. Type of connection is Hadoop.
name
The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
RMAddress
The service within Hadoop that submits requests for resources or spawns YARN applications.
Use the following format:
<hostname>:<port>
Where
  • <hostname> is the host name or IP address of the YARN resource manager.
  • <port> is the port on which the YARN resource manager listens for remote procedure calls (RPC).
For example, enter:
myhostname:8032
You can also get the Resource Manager Address property from yarn-site.xml located in the following directory on the Hadoop cluster:
/etc/hadoop/conf/
The Resource Manager Address appears as the following property in yarn-site.xml:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hostname:port</value>
  <description>The address of the applications manager interface in the Resource Manager.</description>
</property>
If the yarn.resourcemanager.address property is not configured in yarn-site.xml, you can find the host name from the yarn.resourcemanager.hostname or yarn.resourcemanager.scheduler.address properties in yarn-site.xml. You can then configure the Resource Manager Address in the Hadoop connection with the following value:
hostname:8032
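One way to look up this value on the cluster, assuming the standard configuration directory:
grep -A 1 'yarn.resourcemanager.address' /etc/hadoop/conf/yarn-site.xml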
cadiAppYarnQueueName
The name of the YARN scheduler queue that the Blaze engine uses. The queue specifies the available resources on the cluster. The name is case sensitive.
cadiExecutionParameterList
Custom properties that are unique to the Blaze engine.
You can specify multiple properties.
Use the following format:
<property1>=<value>
Where
  • <property1> is a Blaze engine optimization property.
  • <value> is the value of the Blaze engine optimization property.
To specify multiple properties, use &: as the property separator.
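For example, the following value passes two custom properties. The property names here are placeholders, not documented Blaze engine properties:
exampleProperty1=value1&:exampleProperty2=value2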
Use custom properties only at the request of Informatica Global Customer Support.
cadiMaxPort
The maximum value for the port number range for the Blaze engine.
cadiMinPort
The minimum value for the port number range for the Blaze engine.
cadiUserName
The operating system profile user name for the Blaze engine.
cadiWorkingDirectory
The HDFS file path of the directory that the Blaze engine uses to store temporary files. Verify that the directory exists. The YARN user, Blaze engine user, and mapping impersonation user must have write permission on this directory.
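A minimal sketch of preparing such a directory, assuming /blaze/workdir as the path and deliberately permissive permissions that you should tighten for production:
hadoop fs -mkdir -p /blaze/workdir
hadoop fs -chmod -R 777 /blaze/workdir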
databaseName
Namespace for tables. Use the name default for tables that do not have a specified database name.
defaultFSURI
The URI to access the default Hadoop Distributed File System.
Use the following connection URI:
hdfs://<node name>:<port>
Where
  • <node name> is the host name or IP address of the NameNode.
  • <port> is the port on which the NameNode listens for remote procedure calls (RPC).
For example, enter:
hdfs://myhostname:8020/
You can also get the Default File System URI property from core-site.xml located in the following directory on the Hadoop cluster:
/etc/hadoop/conf/
Use the value from the fs.defaultFS property found in core-site.xml.
For example, use the following value:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>
If the Hadoop cluster runs MapR, use the following URI to access the MapR file system:
maprfs:///
engineType
The engine that the Hadoop environment uses to run a mapping on the Hadoop cluster. Select a value from the drop-down list.
For example, select:
MRv2
To set the engine type in the Hadoop connection, you must get the value for the mapreduce.framework.name property from mapred-site.xml located in the following directory on the Hadoop cluster:
/etc/hadoop/conf/
If the value for mapreduce.framework.name is classic, select mrv1 as the engine type in the Hadoop connection.
If the value for mapreduce.framework.name is yarn, you can select mrv2 or tez as the engine type in the Hadoop connection. Do not select Tez if Tez is not configured for the Hadoop cluster.
You can also set the value for the engine type in hive-site.xml. The engine type appears as the following property in hive-site.xml:
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
  <description>Chooses execution engine. Options are: mr (MapReduce, default) or tez (Hadoop 2 only)</description>
</property>
environmentSQL
SQL commands to set the Hadoop environment. The Data Integration Service executes the environment SQL at the beginning of each Hive script generated in a Hive execution plan.
The following rules and guidelines apply to the usage of environment SQL:
  • Use the environment SQL to specify Hive queries.
  • Use the environment SQL to set the classpath for Hive user-defined functions, and then use environment SQL or PreSQL to specify the functions (see the example after this list). You cannot use PreSQL in the data object properties to specify the classpath. The path must be the fully qualified path to the JAR files used for user-defined functions. Set the parameter hive.aux.jars.path with all the entries in infapdo.aux.jars.path and the path to the JAR files for user-defined functions.
  • You can use environment SQL to define Hadoop or Hive parameters that you want to use in the PreSQL commands or in custom queries.
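For example, the following environment SQL sets the classpath and registers a user-defined function. The JAR paths and the class name are placeholders; list every entry from infapdo.aux.jars.path before the UDF JAR:
set hive.aux.jars.path=file:///path/to/infapdo_entry.jar,file:///path/to/custom_udfs.jar;
create temporary function my_custom_udf as 'com.example.hive.MyCustomUDF';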
hadoopExecEnvExecutionParameterList
Custom properties that are unique to the Hadoop environment.
You can specify multiple properties.
Use the following format:
<property1>=<value>
Where
  • <property1> is a Hadoop environment optimization property.
  • <value> is the value of the Hadoop environment optimization property.
To specify multiple properties, use &: as the property separator.
Use custom properties only at the request of Informatica Global Customer Support.
Hiveserver2Enabled
Optional. Runs the mapping in the HiveServer2 mode. If you enable HiveServer2 execution mode, use the JDBCConnectString option to provide the JDBC URL to access HiveServer2.
The default mode of execution is HiveCLI.
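For example, the following options enable HiveServer2 mode and supply the JDBC URL. The host name and the true value are assumptions based on the option descriptions, and 10000 is the default HiveServer2 port:
... -o Hiveserver2Enabled='true' JDBCConnectString='jdbc:hive2://myhostname:10000/default' ...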
Hiveusername
User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster.
If the Hadoop cluster uses Kerberos authentication, the principal name for the JDBC connection string and the user name must be the same.
You must use user impersonation for the Hadoop connection if the Hadoop cluster uses Kerberos authentication.
If the Hadoop cluster does not use Kerberos authentication, the user name depends on the behavior of the JDBC driver.
If you do not specify a user name, the Hadoop cluster authenticates jobs based on the operating system profile user name of the machine that runs the Data Integration Service.
hiveWarehouseDirectoryOnHDFS
The absolute HDFS file path of the default database for the warehouse that is local to the cluster. For example, the following file path specifies a local warehouse:
/user/hive/warehouse
For Cloudera CDH, if the Metastore Execution Mode is remote, then the file path must match the file path specified by the Hive Metastore Service on the Hadoop cluster.
You can get the value for the Hive Warehouse Directory on HDFS from the
hive.metastore.warehouse.dir
property in hive-site.xml located in the following directory on the Hadoop cluster:
/etc/hadoop/conf/
For example, use the following value:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of the warehouse directory</description>
</property>
For MapR, hive-site.xml is located in the following directory:
/opt/mapr/hive/<hive version>/conf
JDBCConnectString
The JDBC connection URI used to access the data and metadata from the Hadoop server.
You can use PowerExchange for Hive to communicate with a HiveServer service or HiveServer2 service.
To connect to HiveServer, specify the connection string in the following format:
jdbc:hive2://<hostname>:<port>/<db>
Where
  • <hostname> is the name or IP address of the machine on which HiveServer2 runs.
  • <port> is the port number on which HiveServer2 listens.
  • <db> is the database name to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.
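For example, the following string connects to the default database, assuming HiveServer2 listens on its default port of 10000:
jdbc:hive2://myhostname:10000/default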
To connect to HiveServer2, use the connection string format that Apache Hive implements for the specific Hadoop distribution. For more information about Apache Hive connection string formats, see the Apache Hive documentation.
jobMonitoringURL
The URL for the MapReduce JobHistory server. If you use MapReduce version 1, you can use the JobTracker URI instead.
Use the following format:
<hostname>:<port>
Where
  • <hostname> is the host name or IP address of the JobHistory server.
  • <port> is the port on which the JobHistory server listens for remote procedure calls (RPC).
For example, enter:
myhostname:8021
You can get the value for the Job Monitoring URL from mapred-site.xml. The Job Monitoring URL appears as the following property in mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>myhostname:8021</value>
  <description>The host and port that the MapReduce job tracker runs at.</description>
</property>
metastoreDatabaseDriver
Driver class name for the JDBC data store. For example, the following class name specifies a MySQL driver:
com.mysql.jdbc.Driver
You can get the value for the Metastore Database Driver from hive-site.xml. The Metastore Database Driver appears as the following property in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
metastoreDatabasePassword
The password for the metastore user name.
You can get the value for the Metastore Database Password from hive-site.xml. The Metastore Database Password appears as the following property in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
metastoreDatabaseURI
The JDBC connection URI used to access the data store in a local metastore setup. Use the following connection URI:
jdbc:<data store type>://<node name>:<port>/<database name>
Where
  • <data store type> is the type of the data store.
  • <node name> is the host name or IP address of the data store.
  • <port> is the port on which the data store listens for remote procedure calls (RPC).
  • <database name> is the name of the database.
For example, the following URI specifies a local metastore that uses MySQL as a data store:
jdbc:mysql://hostname23:3306/metastore
You can get the value for the Metastore Database URI from hive-site.xml. The Metastore Database URI appears as the following property in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://MYHOST/metastore</value>
</property>
metastoreDatabaseUserName
The metastore database user name.
You can get the value for the Metastore Database User Name from hive-site.xml. The Metastore Database User Name appears as the following property in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
metastoreMode
Controls whether to connect to a remote metastore or a local metastore. By default, local is selected. For a local metastore, you must specify the Metastore Database URI, Metastore Database Driver, Username, and Password. For a remote metastore, you must specify only the Remote Metastore URI.
You can get the value for the Metastore Execution Mode from hive-site.xml. The Metastore Execution Mode appears as the following property in hive-site.xml:
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>
The hive.metastore.local property is deprecated in hive-site.xml for Hive server versions 0.9 and above. If the hive.metastore.local property does not exist but the hive.metastore.uris property exists, and you know that the Hive server has started, you can set the connection to a remote metastore.
remoteMetastoreURI
The metastore URI used to access metadata in a remote metastore setup. For a remote metastore, you must specify the Thrift server details.
Use the following connection URI:
thrift://<hostname>:<port>
Where
  • <hostname> is the name or IP address of the Thrift metastore server.
  • <port> is the port on which the Thrift server is listening.
For example, enter:
thrift://myhostname:9083/
You can get the value for the Remote Metastore URI from hive-site.xml. The Remote Metastore URI appears as the following property in hive-site.xml:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<n.n.n.n>:9083</value>
  <description>IP address or fully-qualified domain name and port of the metastore host</description>
</property>
SparkHDFSStagingDir
The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs. The YARN user, Spark engine user, and mapping impersonation user must have write permission on this directory.
SparkExecutionParameterList
An optional list of configuration parameters to apply to the Spark engine. You can change the default Spark configuration property values, such as spark.executor.memory or spark.driver.cores.
Use the following format:
<property1>=<value>
Where
  • <property1> is a Spark configuration property.
  • <value> is the value of the property.
For example, you can configure a YARN scheduler queue name that specifies available resources on a cluster:
spark.yarn.queue=TestQ
To enter multiple properties, separate each name-value pair with the following text:
&:
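For example, the following value raises the executor memory and sets a YARN queue. The values are illustrative; choose values appropriate for the cluster:
spark.executor.memory=4G&:spark.yarn.queue=TestQ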
stgDataCompressionCodecClass
Codec class name that enables data compression and improves performance on temporary staging tables.
stgDataCompressionCodecType
Hadoop compression library for a compression codec class name.
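For example, the following options stage data with Snappy compression. The class name is the standard Hadoop Snappy codec; the Snappy type value is an assumption, and the cluster must have the Snappy libraries installed:
... -o stgDataCompressionCodecClass='org.apache.hadoop.io.compress.SnappyCodec' stgDataCompressionCodecType='Snappy' ...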

