Test Data Management
The following table describes general properties for the connection:

| Property | Description |
|---|---|
| Name | Required. Name of the connection. The name is not case sensitive and must be unique within the domain. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] \| \ : ; " ' < , > . ? / A sketch of this naming rule follows the table. |
| ID | String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name. |
| Connection Type | Required. The connection type. Select Hadoop. |
| Description | The description of the connection. The description cannot exceed 4000 characters. |
| Use Kerberos Authentication | Enables Kerberos authentication for Hadoop connections. |
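As an illustration of the naming rule above, the following is a minimal sketch of a validity check. It is hypothetical helper code, not part of any Informatica API; the class and method names are invented for this example, and uniqueness within the domain cannot be checked locally.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper that mirrors the documented naming rule:
 * at most 128 characters, no spaces, and none of the listed
 * special characters. Not part of any Informatica API.
 */
public final class ConnectionNameCheck {

    // Characters the documentation lists as invalid in a connection name,
    // plus whitespace.
    private static final Pattern INVALID =
            Pattern.compile("[~`!$%^&*()\\-+={\\[}\\]|\\\\:;\"'<,>.?/\\s]");

    public static boolean isValidName(String name) {
        return name != null
                && !name.isEmpty()
                && name.length() <= 128
                && !INVALID.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("HadoopConn_01")); // true
        System.out.println(isValidName("bad name!"));     // false: space and '!'
    }
}
```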
The following table describes cluster and common properties for the Hadoop connection:

| Property | Description |
|---|---|
| Cluster Configuration | The name of the cluster configuration object associated with the Hadoop environment. |
| Cloud Provisioning Connection | Name of the cloud provisioning configuration associated with a cloud platform such as Amazon AWS or Microsoft Azure. Required if you do not configure the Cluster Configuration. |
| Cluster Environment Variable | Environment variables used in the cluster. Specify any custom environment variables in the Hadoop connection. At run time, the specified environment variables are combined with the default environment variables based on the cluster configuration associated with the Hadoop connection. For example, you can specify ORACLE_HOME, ODBCHOME, or DB2_HOME. |
| Cluster Library Path | The path for shared libraries on the cluster. The $DEFAULT_CLUSTER_LIBRARY_PATH variable contains a list of default directories. |
| Cluster Class Path | The classpath to access the Hadoop jar files and the required libraries. The $DEFAULT_CLUSTER_CLASSPATH variable contains a list of paths to the default jar files and libraries. |
| Cluster Executable Path | The path for executable files on the cluster. The $DEFAULT_CLUSTER_EXEC_PATH variable contains a list of default directories. |
| Impersonation User Name | Required if the Hadoop cluster uses Kerberos authentication. Hadoop impersonation user: the user name that the Data Integration Service impersonates to run mappings in the Hadoop environment. The Data Integration Service runs mappings based on the configured user, selected according to a precedence order of the configured users. |
| Temporary Table Compression Codec | Hadoop compression library for a compression codec class name. The Spark engine does not support compression settings for temporary tables. When you run mappings on the Spark engine, the Spark engine stores temporary tables in an uncompressed file format. |
| Codec Class Name | Codec class name that enables data compression and optimizes performance on temporary staging tables. A usage sketch follows this table. |
| Hive Staging Database Name | Namespace for Hive staging tables. Use the name default for tables that do not have a specified database name. If you do not configure a namespace, the Data Integration Service uses the Hive database name in the Hive target connection to create staging tables. |
| Hadoop Engine Custom Properties | Custom properties that are unique to the Hadoop connection. You can specify multiple properties. Click the Add button to add the required number of rows, then enter the property name in the Name field and the value in the Value field. If more than one Hadoop connection is associated with the same cluster configuration, you can override configuration set property values. Use Informatica custom properties only at the request of Informatica Global Customer Support. |
| Write Reject Files to Hadoop | If you use the Blaze engine to run mappings, select this check box to specify a location to move reject files. If selected, the Data Integration Service moves the reject files to the HDFS location listed in the Reject File Directory property. By default, the Data Integration Service stores the reject files based on the RejectDir system parameter. |
| Reject File Directory | The directory for Hadoop mapping files on HDFS when you run mappings. |
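To make the codec class name concrete, the following is a minimal sketch of how Hadoop resolves a codec class name and uses it to compress a stream. It illustrates the Hadoop side only, not how the Data Integration Service applies the property; SnappyCodec is one common example value, is an assumption here rather than a documented default, and requires the native Snappy libraries on the cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

import java.io.FileOutputStream;
import java.io.OutputStream;

public class CodecExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // A codec is addressed by its fully qualified class name,
        // e.g. org.apache.hadoop.io.compress.SnappyCodec (example value).
        Class<?> codecClass =
                Class.forName("org.apache.hadoop.io.compress.SnappyCodec");
        CompressionCodec codec =
                (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

        // Wrap a raw stream so everything written to it is compressed.
        try (OutputStream raw = new FileOutputStream("staging.snappy");
             CompressionOutputStream out = codec.createOutputStream(raw)) {
            out.write("example staging data".getBytes("UTF-8"));
        }
    }
}
```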
The following table describes Hive pushdown configuration properties for the Hadoop connection:

| Property | Description |
|---|---|
| Environment SQL | SQL commands to set the Hadoop environment. The Data Integration Service executes the environment SQL at the beginning of each Hive script generated in a Hive execution plan. Specific rules and guidelines apply to the use of environment SQL. |
| Hive Warehouse Directory on HDFS | Required. The absolute HDFS file path of the default database for the warehouse that is local to the cluster. If you do not configure the Hive warehouse directory, the Hive engine first tries to write to the directory specified in the cluster configuration property hive.metastore.warehouse.dir. If the cluster configuration does not have the property, the Hive engine writes to the default directory /user/hive/warehouse. |
| Hive JDBC Connection String | The JDBC URI to connect to the Hive server. To connect to HiveServer2, specify the connection string in the format jdbc:hive2://<hostname>:<port>/<db>, where <hostname> is the name or IP address of the machine that hosts HiveServer2, <port> is the port on which HiveServer2 listens, and <db> is the database to which you want to connect. A connection sketch follows this table. |
| Engine Type | The engine that the Hadoop environment uses to run a mapping on the Hadoop cluster. You can choose MRv2 or Tez. You can select Tez if it is configured for the Hadoop cluster. Default is MRv2. |
| Hive Engine Custom Properties | Custom properties that are unique to the Hive connection. You can specify multiple properties. Click the Add button to add the required number of rows, then enter the property name in the Name field and the value in the Value field. If more than one Hive connection is associated with the same cluster configuration, you can override configuration set property values. Use Informatica custom properties only at the request of Informatica Global Customer Support. |
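The following is a minimal sketch of connecting to HiveServer2 with a JDBC URI in the format described above. The host name, database, and credentials are placeholders; 10000 is the common HiveServer2 default port, and the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Same format as the connection property:
        // jdbc:hive2://<hostname>:<port>/<db> (placeholder values below)
        String url = "jdbc:hive2://hive-host.example.com:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hiveuser", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```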
The following table describes Blaze configuration properties for the Hadoop connection:

| Property | Description |
|---|---|
| Blaze Staging Directory | The HDFS file path of the directory that the Blaze engine uses to store temporary files. Verify that the directory exists. The YARN user, Blaze engine user, and mapping impersonation user must have write permission on this directory. Default is /blaze/workdir. If you clear this property, the staging files are written to the Hadoop staging directory /tmp/blaze_<user name>. |
| Blaze Service User Name | The operating system profile user name for the Blaze engine. |
| Minimum Port | The minimum value for the port number range for the Blaze engine. Default is 12300. |
| Maximum Port | The maximum value for the port number range for the Blaze engine. Default is 12600. |
| YARN Queue Name | The YARN scheduler queue name used by the Blaze engine that specifies available resources on a cluster. |
| Blaze Job Monitor Address | The host name and port number for the Blaze Job Monitor, in the format <hostname>:<port>, where <hostname> is the host name or IP address of the Blaze Job Monitor server and <port> is the port on which the Blaze Job Monitor listens. For example, enter: myhostname:9080 |
| Blaze YARN Node Label Expression | Node label that determines the node on the Hadoop cluster where the Blaze engine runs. If you do not specify a node label, the Blaze engine runs on the nodes in the default partition. If the Hadoop cluster supports logical operators for node labels, you can specify a list of node labels with the operators && (AND), \|\| (OR), and ! (NOT). |
| Blaze Service Custom Properties | Custom properties that are unique to the Blaze engine. Click the Add button to add the required number of rows, then enter the property name in the Name field and the value in the Value field. |
The following table describes Spark configuration properties for the Hadoop connection:

| Property | Description |
|---|---|
| Spark Staging Directory | The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs. The YARN user, Data Integration Service user, and mapping impersonation user must have write permission on this directory. By default, the temporary files are written to the Hadoop staging directory /tmp/spark_<user name>. |
| Spark Event Log Directory | Optional. The HDFS file path of the directory that the Spark engine uses to log events. |
| YARN Queue Name | The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster. The name is case sensitive. |
| Spark Execution Parameters | An optional list of configuration parameters to apply to the Spark engine. You can change the default values of Spark configuration properties such as spark.executor.memory or spark.driver.cores. Click the Add button to add the required number of rows, then enter the property name in the Name field and the value in the Value field. If you notice a decrease in performance on the Spark engine, you can tune these properties to optimize performance. A sketch of the corresponding Spark settings follows this table. |
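For reference, the following sketch shows what the example parameters correspond to in Spark itself. In the Hadoop connection you enter them as name/value rows rather than in code; the values here are illustrative placeholders, not tuning recommendations.

```java
import org.apache.spark.SparkConf;

public class SparkParamsExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("parameter-illustration")
                // Memory allocated to each executor process (placeholder value).
                .set("spark.executor.memory", "4g")
                // Number of cores used by the driver (placeholder value).
                .set("spark.driver.cores", "2");

        // Print the effective configuration for inspection.
        System.out.println(conf.toDebugString());
    }
}
```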