Hadoop Connections

A Hadoop connection is a cluster type connection. In the Administrator tool, you must create a cluster configuration object for the Hadoop cluster. Create and manage Hadoop connections from Test Data Manager. When you select a Hadoop connection in a Hadoop plan, TDM uses the Data Integration Service to run mappings on the Hadoop cluster.
The following table describes Hadoop connection properties:
Name
Required. Name of the connection. The name is not case sensitive and must be unique within the domain. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
Connection Type
Required. The connection type. Select Hadoop.
Description
The description of the connection. The description cannot exceed 4000 characters.
Use Kerberos Authentication
Enables Kerberos authentication for Hadoop connections.
Owner
The owner of the connection. Default is the user who creates the connection. You can change the owner of the connection.

Hadoop Properties

The following table describes the cluster and connection properties that you configure for Hadoop:
Cluster Configuration
The name of the cluster configuration associated with the Hadoop environment.
Impersonation User Name
Required if the Hadoop cluster uses Kerberos authentication. The Hadoop impersonation user is the user name that the Data Integration Service impersonates to run mappings in the Hadoop environment.
The Data Integration Service runs mappings based on the user that is configured. Refer to the following order to determine which user the Data Integration Service uses to run mappings:
  1. Operating system profile user. The mapping runs with the operating system profile user if the profile user is configured. If there is no operating system profile user, the mapping runs with the Hadoop impersonation user.
  2. Hadoop impersonation user. The mapping runs with the Hadoop impersonation user if the operating system profile user is not configured. If the Hadoop impersonation user is not configured, the Data Integration Service runs mappings with the Data Integration Service user.
  3. Informatica services user. The mapping runs with the operating system user that starts the Informatica daemon if neither the operating system profile user nor the Hadoop impersonation user is configured.
Temporary Table Compression Codec
Hadoop compression library for a compression codec class name.
Codec Class Name
Codec class name that enables data compression and improves performance on temporary staging tables.
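For example, to compress temporary staging tables with Snappy, the codec class name is typically the standard Hadoop Snappy codec class shown below. This is an illustration only; confirm that the compression library is available on your cluster:
org.apache.hadoop.io.compress.SnappyCodec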
Hadoop Engine Custom Properties
Custom properties that are unique to the Hadoop connection.
You can specify multiple properties.
Use the following format:
<property1>=<value>
To specify multiple properties, use &: as the property separator.
If more than one Hadoop connection is associated with the same cluster configuration, you can override configuration set property values.
Use Informatica custom properties only at the request of Informatica Global Customer Support.
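For example, a value that sets two custom properties separated by &: follows the pattern shown below. The property names and values are placeholders for illustration; enter actual custom properties only at the request of Informatica Global Customer Support:
<property1>=<value1>&:<property2>=<value2>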

Hive Configuration

You can use the values for Hive configuration properties from hive-site.xml or mapred-site.xml, located in the /etc/hadoop/conf/ directory on the Hadoop cluster.
The following table describes the connection properties that you configure to push mapping logic to the Hadoop cluster:
Environment SQL
SQL commands to set the Hadoop environment. The Data Integration Service executes the environment SQL at the beginning of each Hive script generated in a Hive execution plan.
The following rules and guidelines apply to the usage of environment SQL:
  • Use the environment SQL to specify Hive queries.
  • Use the environment SQL to set the classpath for Hive user-defined functions and then use environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. The path must be the fully qualified path to the JAR files used for user-defined functions. Set the parameter hive.aux.jars.path with all the entries in infapdo.aux.jars.path and the path to the JAR files for user-defined functions.
  • You can use environment SQL to define Hadoop or Hive parameters that you want to use in the PreSQL commands or in custom queries.
  • If you use multiple values for the environment SQL, ensure that there is no space between the values.
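For example, environment SQL that sets the classpath for Hive user-defined functions might resemble the following SET statement. The path is a placeholder; the actual value must contain all the entries in infapdo.aux.jars.path and the fully qualified paths to your user-defined function JAR files:
SET hive.aux.jars.path=<infapdo.aux.jars.path entries>,<fully qualified path to UDF JAR files>;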
Hive Warehouse Directory on HDFS
Required. The absolute HDFS file path of the default database for the warehouse that is local to the cluster.
If you do not configure the Hive warehouse directory, the Hive engine first tries to write to the directory specified in the cluster configuration property hive.metastore.warehouse.dir. If the cluster configuration does not have the property, the Hive engine writes to the default directory /user/hive/warehouse.
Hive JDBC Connection String
The JDBC URI to connect to the Hive server.
To connect to HiveServer, specify the connection string in the following format:
jdbc:hive2://<hostname>:<port>/<db>
Where:
  • <hostname> is the name or IP address of the machine on which HiveServer2 runs.
  • <port> is the port number on which HiveServer2 listens.
  • <db> is the database name to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.
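For example, a connection string for a HiveServer2 instance that listens on the default port 10000 and connects to the default database might look like the following. The host name is a placeholder for your environment:
jdbc:hive2://hiveserver.example.com:10000/default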
Engine Type
The engine that the Hadoop environment uses to run a mapping on the Hadoop cluster. You can choose MRv2 or Tez. Select Tez only if it is configured for the Hadoop cluster. Default is MRv2.

Blaze Engine

The following table describes the connection properties that you configure for the Blaze engine:
Blaze Staging Directory
The HDFS file path of the directory that the Blaze engine uses to store temporary files. Verify that the directory exists. The YARN user, Blaze engine user, and mapping impersonation user must have write permission on this directory.
Default is /blaze/workdir. If you clear this property, the staging files are written to the Hadoop staging directory /tmp/blaze_<user name>.
Blaze Service User Name
The operating system profile user name for the Blaze engine.
Minimum Port
The minimum value for the port number range for the Blaze engine. Default is 12300.
Maximum Port
The maximum value for the port number range for the Blaze engine. Default is 12600.
YARN Queue Name
The YARN scheduler queue name used by the Blaze engine that specifies available resources on a cluster.
Blaze Service Custom Properties
Custom properties that are unique to the Blaze engine.
To enter multiple properties, separate each name-value pair with &: as the separator.
Use Informatica custom properties only at the request of Informatica Global Customer Support.

Spark Engine

The following table describes the connection properties that you configure for the Spark engine:
Spark Staging Directory
The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs. The YARN user, Data Integration Service user, and mapping impersonation user must have write permission on this directory.
By default, the temporary files are written to the Hadoop staging directory
/tmp/spark_<user name>
.
Spark Event Log Directory
Optional. The HDFS file path of the directory that the Spark engine uses to log events.
YARN Queue Name
The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster. The name is case sensitive.
Spark Execution Parameters
An optional list of configuration parameters to apply to the Spark engine. You can change default Spark configuration property values, such as spark.executor.memory or spark.driver.cores.
Use the following format:
<property1>=<value>
To enter multiple properties, separate each name-value pair with &: as the separator.
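For example, to change the executor memory and the number of driver cores, you might enter a parameter list like the following. The values are illustrative placeholders; choose values appropriate for your cluster:
spark.executor.memory=2g&:spark.driver.cores=2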