Common Content for Data Engineering
To enter multiple options, separate them with a space:

-o option_name='value' option_name='value' ...

For example:

-o engine_nameAdvancedProperties="'advanced.property.name=value'"
-o blazeAdvancedProperties="'infagrid.orchestrator.svc.sunset.time=3'"
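A complete command might look like the following sketch. It assumes the infacmd isp createConnection subcommand; the domain, user, password, connection names, and option values are placeholders, not required values:

# Sketch only: all names and values below are placeholders.
infacmd.sh isp createConnection -dn MyDomain -un Administrator -pd MyPassword \
 -cn my_hadoop_connection -cid my_hadoop_connection -ct HADOOP \
 -o clusterConfigId='my_cluster_config' \
 blazeStagingDirectory='/blaze/workdir' \
 blazeYarnQueueName='root.default'
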
connectionId
String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.

connectionType
Required. Type of connection is Hadoop.

name
The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /

blazeJobMonitorURL
The host name and port number for the Blaze Job Monitor.
Use the following format:
<hostname>:<port>
where <hostname> is the host name or IP address of the Blaze Job Monitor server and <port> is the port on which the Blaze Job Monitor listens.
For example, enter:
myhostname:9080

blazeYarnQueueName
The YARN scheduler queue name used by the Blaze engine that specifies available resources on a cluster. The name is case sensitive.

blazeAdvancedProperties
Advanced properties that are unique to the Blaze engine.
To enter multiple properties, separate each name-value pair with the following text: &:
Use Informatica custom properties only at the request of Informatica Global Customer Support.

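For example, the following sketch passes two name-value pairs in a single option value. The property names are placeholders, not documented Blaze properties:

# Placeholder property names; use values provided by Informatica Global Customer Support.
... -o blazeAdvancedProperties="'infagrid.property.one=value1&:infagrid.property.two=value2'"
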
blazeMaxPort
The maximum value for the port number range for the Blaze engine.
Default value is 12600.

blazeMinPort
The minimum value for the port number range for the Blaze engine.
Default value is 12300.

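For example, the following sketch narrows the Blaze port range; the values are illustrative:

... -o blazeMinPort='12300' blazeMaxPort='12800'
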
blazeUserName
The owner of the Blaze service and Blaze service logs.
When the Hadoop cluster uses Kerberos authentication, the default user is the Data Integration Service SPN user. When the Hadoop cluster does not use Kerberos authentication and the Blaze user is not configured, the default user is the Data Integration Service user.

blazeStagingDirectory
The HDFS file path of the directory that the Blaze engine uses to store temporary files. Verify that the directory exists. The YARN user, Blaze engine user, and mapping impersonation user must have write permission on this directory.
Default is /blaze/workdir. If you clear this property, the staging files are written to the Hadoop staging directory /tmp/blaze_<user name>.

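Before you run mappings, you can create the staging directory and grant write permission with standard HDFS shell commands. The path and mode below are examples, not required values:

# Example only: create the Blaze staging directory and open write access.
hdfs dfs -mkdir -p /blaze/workdir
hdfs dfs -chmod 777 /blaze/workdir
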
clusterConfigId
The cluster configuration ID associated with the Hadoop cluster. You must enter a configuration ID to set up a Hadoop connection.

hiveStagingDatabaseName
Namespace for Hive staging tables. Use the name default for tables that do not have a specified database name.

engineType
Execution engine to run HiveServer2 tasks on the Spark engine. Default is MRv2.
You can choose MRv2 or Tez, according to the engine type that the Hadoop distribution uses.

environmentSQL
SQL commands to set the Hadoop environment. The Data Integration Service executes the environment SQL at the beginning of each Hive script generated in a Hive execution plan.
Specific rules and guidelines apply to the usage of environment SQL.

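For example, the following sketch sets a standard Hive property at the start of each generated Hive script; the property choice is illustrative:

... -o environmentSQL="'SET hive.exec.parallel=true;'"
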
hadoopExecEnvExecutionParameterList
Custom properties that are unique to the Hadoop connection.
Use the following format:
<property1>=<value>
To specify multiple properties, use &: as the property separator.
If more than one Hadoop connection is associated with the same cluster configuration, you can override configuration set property values.
Use Informatica custom properties only at the request of Informatica Global Customer Support.

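For example, the following sketch sets two standard MapReduce properties, separated by &:. Verify the property names against your cluster configuration:

... -o hadoopExecEnvExecutionParameterList="'mapreduce.job.queuename=root.default&:mapreduce.map.memory.mb=2048'"
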
hadoopRejDir
The remote directory where the Data Integration Service moves reject files when you run mappings.
Enable the reject directory using rejDirOnHadoop.

impersonationUserName
Required if the Hadoop cluster uses Kerberos authentication. Hadoop impersonation user. The user name that the Data Integration Service impersonates to run mappings in the Hadoop environment.
The Data Integration Service runs mappings based on the user that is configured. An order of precedence determines which user the Data Integration Service uses to run mappings.

hiveWarehouseDirectoryOnHDFS
Optional. The absolute HDFS file path of the default database for the warehouse that is local to the cluster.
If you do not configure the Hive warehouse directory, the Hive engine first tries to write to the directory specified in the cluster configuration property hive.metastore.warehouse.dir. If the cluster configuration does not have the property, the Hive engine writes to the default directory /user/hive/warehouse.

metastoreDatabaseDriver
Driver class name for the JDBC data store. For example, the following class name specifies a MySQL driver:
com.mysql.jdbc.Driver
You can get the value for the Metastore Database Driver from hive-site.xml. The Metastore Database Driver appears as the following property in hive-site.xml:
javax.jdo.option.ConnectionDriverName

metastoreDatabasePassword
The password for the metastore user name.
You can get the value for the Metastore Database Password from hive-site.xml. The Metastore Database Password appears as the following property in hive-site.xml:
javax.jdo.option.ConnectionPassword

metastoreDatabaseURI
The JDBC connection URI used to access the data store in a local metastore setup. Use the following connection URI:
jdbc:<datastore type>://<node name>:<port>/<database name>
where <datastore type> is the type of the data store, <node name> is the host name or IP address of the data store, <port> is the port on which the data store listens, and <database name> is the name of the database.
For example, the following URI specifies a local metastore that uses MySQL as a data store:
jdbc:mysql://hostname23:3306/metastore
You can get the value for the Metastore Database URI from hive-site.xml. The Metastore Database URI appears as the following property in hive-site.xml:
javax.jdo.option.ConnectionURL

metastoreDatabaseUserName
The metastore database user name.
You can get the value for the Metastore Database User Name from hive-site.xml. The Metastore Database User Name appears as the following property in hive-site.xml:
javax.jdo.option.ConnectionUserName

metastoreMode
Controls whether to connect to a remote metastore or a local metastore. By default, local is selected. For a local metastore, you must specify the Metastore Database URI, Metastore Database Driver, Username, and Password. For a remote metastore, you must specify only the Remote Metastore URI.
You can get the value for the Metastore Execution Mode from hive-site.xml. The Metastore Execution Mode appears as the following property in hive-site.xml:
hive.metastore.local
The hive.metastore.local property is deprecated in hive-site.xml for Hive server versions 0.9 and above. If the hive.metastore.local property does not exist but the hive.metastore.uris property exists, and you know that the Hive server has started, you can set the connection to a remote metastore.

remoteMetastoreURI
The metastore URI used to access metadata in a remote metastore setup. For a remote metastore, you must specify the Thrift server details.
Use the following connection URI:
thrift://<hostname>:<port>
where <hostname> is the name or IP address of the Thrift metastore server and <port> is the port on which the Thrift server listens.
For example, enter:
thrift://myhostname:9083/
You can get the value for the Remote Metastore URI from hive-site.xml. The Remote Metastore URI appears as the following property in hive-site.xml:
hive.metastore.uris

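For example, the following sketch points a connection at a remote metastore; the metastoreMode value is inferred from the property description above:

# 'remote' is assumed from the metastoreMode description.
... -o metastoreMode='remote' remoteMetastoreURI='thrift://myhostname:9083'
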
rejDirOnHadoop
Enables hadoopRejDir. Use this option to specify a location to move reject files when you run mappings.
If enabled, the Data Integration Service moves mapping files to the HDFS location listed in hadoopRejDir.
By default, the Data Integration Service stores the mapping files based on the RejectDir system parameter.

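For example, the following sketch enables the reject directory and sets its location; the value and the HDFS path are assumptions:

# 'true' and the path are assumed values.
... -o rejDirOnHadoop='true' hadoopRejDir='/user/infauser/reject'
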
sparkEventLogDir
Optional. The HDFS file path of the directory that the Spark engine uses to log events.

sparkAdvancedProperties
Advanced properties that are unique to the Spark engine.
To enter multiple properties, separate each name-value pair with the following text: &:
Use Informatica custom properties only at the request of Informatica Global Customer Support.

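For example, the following sketch passes two standard Spark properties, separated by &:; the values are illustrative:

... -o sparkAdvancedProperties="'spark.executor.memory=4g&:spark.executor.cores=2'"
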
sparkStagingDirectory
The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs. The YARN user, Data Integration Service user, and mapping impersonation user must have write permission on this directory.
By default, the temporary files are written to the Hadoop staging directory /tmp/spark_<user name>.

sparkYarnQueueName
The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster. The name is case sensitive.

stgDataCompressionCodecClass
Codec class name that enables data compression and improves performance on temporary staging tables. The codec class name corresponds to the codec type.

stgDataCompressionCodecType
Hadoop compression library for a compression codec class name.
You can choose None, Zlib, Gzip, Snappy, Bz2, LZO, or Custom.
Default is None.

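For example, the following sketch pairs the Snappy codec type with the standard Hadoop Snappy codec class; pairing them this way is an assumption based on the two property descriptions:

# Pairing of type and class is assumed, not taken from this document.
... -o stgDataCompressionCodecType='Snappy' stgDataCompressionCodecClass='org.apache.hadoop.io.compress.SnappyCodec'
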