Hadoop Connection Options

Use connection options to define a Hadoop connection.
Enter connection options in the following format:
... -o option_name='value' option_name='value' ...
To enter multiple options, separate them with a space.
To enter advanced properties, use the following format:
... -o engine_nameAdvancedProperties="'advanced.property.name=value'"
For example:
... -o blazeAdvancedProperties="'infrgrid.orchestrator.svc.sunset.time=3'"
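For example, a minimal sketch of a CreateConnection command that creates a Hadoop connection and sets a few of the options described below. The domain, user, password, connection name, and option values are placeholders, and the connection ID defaults to the connection name:
infacmd isp CreateConnection -dn MyDomain -un Administrator -pd MyPassword -cn MyHadoopConnection -ct Hadoop -o clusterConfigId='my_cluster_config' blazeStagingDirectory='/blaze/workdir' sparkStagingDirectory='/tmp/spark_staging'
See the infacmd isp CreateConnection syntax for the full list of command arguments.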
The following Hadoop connection options are available for the infacmd isp CreateConnection and UpdateConnection commands when you configure a Hadoop connection. Each option is listed with its description.
connectionId
String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
connectionType
Required. Type of connection is Hadoop.
name
The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
blazeJobMonitorURL
The host name and port number for the Blaze Job Monitor.
Use the following format:
<hostname>:<port>
Where
  • <hostname> is the host name or IP address of the Blaze Job Monitor server.
  • <port> is the port on which the Blaze Job Monitor listens for remote procedure calls (RPC).
For example, enter:
myhostname:9080
blazeYarnQueueName
The YARN scheduler queue name used by the Blaze engine that specifies available resources on a cluster. The name is case sensitive.
blazeAdvancedProperties
Advanced properties that are unique to the Blaze engine.
To enter multiple properties, separate each name-value pair with the text &:.
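For example, to set two hypothetical Blaze properties:
blazeAdvancedProperties="'some.blaze.property=value1&:another.blaze.property=value2'"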
Use Informatica custom properties only at the request of Informatica Global Customer Support.
blazeMaxPort
The maximum value for the port number range for the Blaze engine.
Default value is 12600.
blazeMinPort
The minimum value for the port number range for the Blaze engine.
Default value is 12300.
blazeUserName
The owner of the Blaze service and Blaze service logs.
When the Hadoop cluster uses Kerberos authentication, the default user is the Data Integration Service SPN user. When the Hadoop cluster does not use Kerberos authentication and the Blaze user is not configured, the default user is the Data Integration Service user.
blazeStagingDirectory
The HDFS file path of the directory that the Blaze engine uses to store temporary files. Verify that the directory exists. The YARN user, Blaze engine user, and mapping impersonation user must have write permission on this directory.
Default is /blaze/workdir. If you clear this property, the staging files are written to the Hadoop staging directory /tmp/blaze_<user name>.
clusterConfigId
The cluster configuration ID associated with the Hadoop cluster. You must enter a configuration ID to set up a Hadoop connection.
hiveStagingDatabaseName
Namespace for Hive staging tables. Use the name default for tables that do not have a specified database name.
engineType
Execution engine to run HiveServer2 tasks on the Spark engine. Default is MRv2.
You can choose MRv2 or Tez according to the engine type that the Hadoop distribution uses:
  • Amazon EMR. Tez
  • Azure HDI. Tez
  • Cloudera CDH. MRv2
  • Cloudera CDP. Tez
  • Dataproc. MRv2
  • Hortonworks HDP. Tez
  • MapR. MRv2
environmentSQL
SQL commands to set the Hadoop environment. The Data Integration Service executes the environment SQL at the beginning of each Hive script generated in a Hive execution plan.
The following rules and guidelines apply to the usage of environment SQL:
  • Use the environment SQL to specify Hive queries.
  • Use the environment SQL to set the classpath for Hive user-defined functions and then use environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. If you use Hive user-defined functions, you must copy the .jar files to the following directory:
    <Informatica installation directory>/services/shared/hadoop/<Hadoop distribution name>/extras/hive-auxjars
  • You can use environment SQL to define Hadoop or Hive parameters that you want to use in the PreSQL commands or in custom queries.
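For example, a sketch of environment SQL that follows the guidelines above. The .jar file name, function name, and class name are hypothetical; the auxjars directory is the path listed above:
ADD JAR <Informatica installation directory>/services/shared/hadoop/<Hadoop distribution name>/extras/hive-auxjars/my_udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.MyUdf';
SET hivevar:target_region=EMEA;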
hadoopExecEnvExecutionParameterList
Custom properties that are unique to the Hadoop connection.
You can specify multiple properties.
Use the following format:
<property1>=<value>
To specify multiple properties, use &: as the property separator.
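For example, to set two hypothetical custom properties:
some.custom.property=value1&:another.custom.property=value2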
If more than one Hadoop connection is associated with the same cluster configuration, you can override configuration set property values.
Use Informatica custom properties only at the request of Informatica Global Customer Support.
hadoopRejDir
The remote directory where the Data Integration Service moves reject files when you run mappings.
Enable the reject directory using rejDirOnHadoop.
impersonationUserName
Required if the Hadoop cluster uses Kerberos authentication. Hadoop impersonation user. The user name that the Data Integration Service impersonates to run mappings in the Hadoop environment.
The Data Integration Service runs mappings based on the user that is configured. Refer to the following order to determine which user the Data Integration Service uses to run mappings:
  1. Operating system profile user. The mapping runs with the operating system profile user if the profile user is configured. If there is no operating system profile user, the mapping runs with the Hadoop impersonation user.
  2. Hadoop impersonation user. The mapping runs with the Hadoop impersonation user if the operating system profile user is not configured. If the Hadoop impersonation user is not configured, the Data Integration Service runs mappings with the Data Integration Service user.
  3. Data Integration Service user. The mapping runs with the Data Integration Service user if the operating system profile user and the Hadoop impersonation user are not configured.
hiveWarehouseDirectoryOnHDFS
Optional. The absolute HDFS file path of the default database for the warehouse that is local to the cluster.
If you do not configure the Hive warehouse directory, the Hive engine first tries to write to the directory specified in the cluster configuration property hive.metastore.warehouse.dir. If the cluster configuration does not have the property, the Hive engine writes to the default directory /user/hive/warehouse.
metastoreDatabaseDriver
Driver class name for the JDBC data store. For example, the following class name specifies a MySQL driver:
com.mysql.jdbc.Driver
You can get the value for the Metastore Database Driver from hive-site.xml. The Metastore Database Driver appears as the following property in hive-site.xml:
<property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> </property>
metastoreDatabasePassword
The password for the metastore user name.
You can get the value for the Metastore Database Password from hive-site.xml. The Metastore Database Password appears as the following property in hive-site.xml:
<property> <name>javax.jdo.option.ConnectionPassword</name> <value>password</value> </property>
metastoreDatabaseURI
The JDBC connection URI used to access the data store in a local metastore setup. Use the following connection URI:
jdbc:<data store type>://<node name>:<port>/<database name>
where
  • <data store type> is the type of the data store.
  • <node name> is the host name or IP address of the data store.
  • <port> is the port on which the data store listens for remote procedure calls (RPC).
  • <database name> is the name of the database.
For example, the following URI specifies a local metastore that uses MySQL as a data store:
jdbc:mysql://hostname23:3306/metastore
You can get the value for the Metastore Database URI from hive-site.xml. The Metastore Database URI appears as the following property in hive-site.xml:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://MYHOST/metastore</value> </property>
metastoreDatabaseUserName
The metastore database user name.
You can get the value for the Metastore Database User Name from hive-site.xml. The Metastore Database User Name appears as the following property in hive-site.xml:
<property> <name>javax.jdo.option.ConnectionUserName</name> <value>hiveuser</value> </property>
metastoreMode
Controls whether to connect to a remote metastore or a local metastore. By default, local is selected. For a local metastore, you must specify the Metastore Database URI, Metastore Database Driver, Username, and Password. For a remote metastore, you must specify only the Remote Metastore URI.
You can get the value for the Metastore Execution Mode from hive-site.xml. The Metastore Execution Mode appears as the following property in hive-site.xml:
<property> <name>hive.metastore.local</name> <value>true</value> </property>
The hive.metastore.local property is deprecated in hive-site.xml for Hive server versions 0.9 and above. If the hive.metastore.local property does not exist but the hive.metastore.uris property exists, and you know that the Hive server has started, you can set the connection to a remote metastore.
remoteMetastoreURI
The metastore URI used to access metadata in a remote metastore setup. For a remote metastore, you must specify the Thrift server details.
Use the following connection URI:
thrift://<hostname>:<port>
Where
  • <hostname> is the host name or IP address of the Thrift metastore server.
  • <port> is the port on which the Thrift server is listening.
For example, enter:
thrift://myhostname:9083/
You can get the value for the Remote Metastore URI from hive-site.xml. The Remote Metastore URI appears as the following property in hive-site.xml:
<property> <name>hive.metastore.uris</name> <value>thrift://<n.n.n.n>:9083</value> <description> IP address or fully-qualified domain name and port of the metastore host</description> </property>
rejDirOnHadoop
Enables hadoopRejDir. Used to specify a location to move reject files when you run mappings.
If enabled, the Data Integration Service moves the reject files to the HDFS location listed in hadoopRejDir.
By default, the Data Integration Service stores the reject files based on the RejectDir system parameter.
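For example, a sketch that enables the reject directory and sets it to a hypothetical HDFS path, assuming the option takes a true/false value:
... -o rejDirOnHadoop='true' hadoopRejDir='/user/infa/reject_files' ...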
sparkEventLogDir
Optional. The HDFS file path of the directory that the Spark engine uses to log events.
sparkAdvancedProperties
Advanced properties that are unique to the Spark engine.
To enter multiple properties, separate each name-value pair with the text &:.
Use Informatica custom properties only at the request of Informatica Global Customer Support.
sparkStagingDirectory
The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs. The YARN user, Data Integration Service user, and mapping impersonation user must have write permission on this directory.
By default, the temporary files are written to the Hadoop staging directory /tmp/spark_<user name>.
sparkYarnQueueName
The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster. The name is case sensitive.
stgDataCompressionCodecClass
Codec class name that enables data compression and improves performance on temporary staging tables. The codec class name corresponds to the codec type.
stgDataCompressionCodecType
Hadoop compression library for a compression codec class name.
You can choose None, Zlib, Gzip, Snappy, Bz2, LZO, or Custom.
Default is None.
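For example, a sketch that pairs the Snappy codec type with the matching Hadoop codec class. Verify the class name against the compression libraries available on your cluster:
... -o stgDataCompressionCodecType='Snappy' stgDataCompressionCodecClass='org.apache.hadoop.io.compress.SnappyCodec' ...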
