Table of Contents

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Connections
  4. Mappings in a Hadoop Environment
  5. Mappings in the Native Environment
  6. Profiles
  7. Native Environment Optimization
  8. Data Type Reference

Hadoop Connection Properties

Use the Hadoop connection to run mappings on a Hadoop cluster. A Hadoop connection is a cluster type connection. You can create and manage a Hadoop connection in the Administrator tool or the Developer tool. You can use infacmd to create a Hadoop connection. Hadoop connection properties are case sensitive unless otherwise noted.
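For example, the following infacmd command is a sketch of how you might create a Hadoop connection from the command line. The domain, user, and connection names are placeholders, and the connection type string and the option names passed to -o vary by Informatica version, so verify them in the Informatica Command Reference before you use them:
infacmd isp CreateConnection -dn MyDomain -un Administrator -pd MyPassword -cn MyHadoopConnection -cid MyHadoopConnection -ct HADOOP -o "option1=value1 option2=value2"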

General Properties

The following table describes the general connection properties for the Hadoop connection:
Name
The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
ID
String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
Description
The description of the connection. Enter a string that you can use to identify the connection. The description cannot exceed 4,000 characters.
Location
The domain where you want to create the connection. Select the domain name.
Type
The connection type. Select Hadoop.

Hadoop Cluster Properties

The following table describes the connection properties that you configure for the Hadoop cluster:
Resource Manager Address
The service within Hadoop that submits requests for resources or spawns YARN applications.
Use the following format:
<hostname>:<port>
Where
  • <hostname> is the host name or IP address of the YARN Resource Manager.
  • <port> is the port on which the YARN Resource Manager listens for remote procedure calls (RPC).
For example, enter:
myhostname:8032
You can also get the Resource Manager Address property from yarn-site.xml located in the following directory on the Hadoop cluster:
/etc/hadoop/conf/
The Resource Manager Address appears as the following property in yarn-site.xml:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hostname:port</value>
  <description>The address of the applications manager interface in the Resource Manager.</description>
</property>
Optionally, if the yarn.resourcemanager.address property is not configured in yarn-site.xml, you can find the host name from the yarn.resourcemanager.hostname or yarn.resourcemanager.scheduler.address properties in yarn-site.xml. You can then configure the Resource Manager Address in the Hadoop connection with the following value:
hostname:8032
Default File System URI
The URI to access the default Hadoop Distributed File System.
Use the following connection URI:
hdfs://<node name>:<port>
Where
  • <node name> is the host name or IP address of the NameNode.
  • <port> is the port on which the NameNode listens for remote procedure calls (RPC).
For example, enter:
hdfs://myhostname:8020/
You can also get the Default File System URI property from core-site.xml located in the following directory on the Hadoop cluster:
/etc/hadoop/conf/
Use the value from the fs.defaultFS property found in core-site.xml.
For example, use the following value:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>
If the Hadoop cluster runs MapR, use the following URI to access the MapR File system:
maprfs:///

Common Properties

The following table describes the common connection properties that you configure for the Hadoop connection:
Impersonation User Name
User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster.
If the Hadoop cluster uses Kerberos authentication, the principal name for the JDBC connection string and the user name must be the same.
You must use user impersonation for the Hadoop connection if the Hadoop cluster uses Kerberos authentication.
If the Hadoop cluster does not use Kerberos authentication, the user name depends on the behavior of the JDBC driver.
If you do not specify a user name, the Hadoop cluster authenticates jobs based on the operating system profile user name of the machine that runs the Data Integration Service.
Temporary Table Compression Codec
Hadoop compression library for a compression codec class name.
Codec Class Name
Codec class name that enables data compression and improves performance on temporary staging tables.
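For example, to compress temporary staging tables with Snappy, you might set the codec class name to the standard Hadoop codec class org.apache.hadoop.io.compress.SnappyCodec. Verify that the codec library is installed on the cluster before you use it.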
Hadoop Connection Custom Properties
Custom properties that are unique to the Hadoop connection.
You can specify multiple properties.
Use the following format:
<property1>=<value>
Where
  • <property1> is a Blaze, Hive, or Hadoop property.
  • <value> is the value of the Blaze, Hive, or Hadoop property.
To specify multiple properties, use &: as the property separator.
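For example, the following value sets two properties. The property names shown are placeholders, not documented property names:
property1=value1&:property2=value2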
Use custom properties only at the request of Informatica Global Customer Support.

Hive Pushdown Configuration

The following table describes the connection properties that you configure to push mapping logic to the Hadoop cluster:
Environment SQL
SQL commands to set the Hadoop environment. The Data Integration Service executes the environment SQL at the beginning of each Hive script generated in a Hive execution plan.
The following rules and guidelines apply to the usage of environment SQL:
  • Use the environment SQL to specify Hive queries.
  • Use the environment SQL to set the classpath for Hive user-defined functions and then use environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. The path must be the fully qualified path to the JAR files used for user-defined functions. Set the parameter hive.aux.jars.path with all the entries in infapdo.aux.jars.path and the path to the JAR files for user-defined functions.
  • You can use environment SQL to define Hadoop or Hive parameters that you want to use in the PreSQL commands or in custom queries, as shown in the example after this list.
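For example, the following environment SQL defines a Hive variable that PreSQL commands can reference and sets a standard Hive property. Both values are illustrative:
set hivevar:target_db=sales; set hive.exec.dynamic.partition.mode=nonstrict;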
Database Name
Namespace for tables. Use the name default for tables that do not have a specified database name.
Hive Warehouse Directory on HDFS
The absolute HDFS file path of the default database for the warehouse that is local to the cluster. For example, the following file path specifies a local warehouse:
/user/hive/warehouse
For Cloudera CDH, if the Metastore Execution Mode is remote, then the file path must match the file path specified by the Hive Metastore Service on the Hadoop cluster.
You can get the value for the Hive Warehouse Directory on HDFS from the hive.metastore.warehouse.dir property in hive-site.xml, located in the following directory on the Hadoop cluster:
/etc/hadoop/conf/
For example, use the following value:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of the warehouse directory</description>
</property>
For MapR, hive-site.xml is located in the following directory:
/opt/mapr/hive/<hive version>/conf

Hive Configuration

You can use the values for Hive configuration properties from hive-site.xml or mapred-site.xml located in the following directory on the Hadoop cluster:
/etc/hadoop/conf/
The following table describes the connection properties that you configure for the Hive engine:
Metastore Execution Mode
Controls whether to connect to a remote metastore or a local metastore. By default, local is selected. For a local metastore, you must specify the Metastore Database URI, Metastore Database Driver, Username, and Password. For a remote metastore, you must specify only the Remote Metastore URI.
You can get the value for the Metastore Execution Mode from hive-site.xml. The Metastore Execution Mode appears as the following property in hive-site.xml:
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>
The hive.metastore.local property is deprecated in hive-site.xml for Hive server versions 0.9 and above. If the hive.metastore.local property does not exist but the hive.metastore.uris property exists, and you know that the Hive server has started, you can set the connection to a remote metastore.
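For example, if hive-site.xml does not define hive.metastore.local but contains an entry such as the following, configure the connection to use a remote metastore. The host and port are illustrative:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://myhostname:9083</value>
</property>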
Metastore Database URI
The JDBC connection URI used to access the data store in a local metastore setup. Use the following connection URI:
jdbc:<datastore type>://<node name>:<port>/<database name>
Where
  • <datastore type> is the type of the data store.
  • <node name> is the host name or IP address of the data store.
  • <port> is the port on which the data store listens for remote procedure calls (RPC).
  • <database name> is the name of the database.
For example, the following URI specifies a local metastore that uses MySQL as a data store:
jdbc:mysql://hostname23:3306/metastore
You can get the value for the Metastore Database URI from hive-site.xml. The Metastore Database URI appears as the following property in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://MYHOST/metastore</value>
</property>
Metastore Database Driver
Driver class name for the JDBC data store. For example, the following class name specifies a MySQL driver:
com.mysql.jdbc.Driver
You can get the value for the Metastore Database Driver from hive-site.xml. The Metastore Database Driver appears as the following property in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
Metastore Database User Name
The metastore database user name.
You can get the value for the Metastore Database User Name from hive-site.xml. The Metastore Database User Name appears as the following property in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
Metastore Database Password
The password for the metastore user name.
You can get the value for the Metastore Database Password from hive-site.xml. The Metastore Database Password appears as the following property in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
Remote Metastore URI
The metastore URI used to access metadata in a remote metastore setup. For a remote metastore, you must specify the Thrift server details.
Use the following connection URI:
thrift://<hostname>:<port>
Where
  • <hostname> is the host name or IP address of the Thrift metastore server.
  • <port> is the port on which the Thrift server is listening.
For example, enter:
thrift://myhostname:9083/
You can get the value for the Remote Metastore URI from hive-site.xml. The Remote Metastore URI appears as the following property in hive-site.xml:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<n.n.n.n>:9083</value>
  <description>IP address or fully-qualified domain name and port of the metastore host</description>
</property>
Engine Type
The engine that the Hadoop environment uses to run a mapping on the Hadoop cluster. Select a value from the drop-down list. For example, select MRv2.
To set the engine type in the Hadoop connection, you must get the value for the mapreduce.framework.name property from mapred-site.xml, located in the following directory on the Hadoop cluster:
/etc/hadoop/conf/
If the value for mapreduce.framework.name is classic, select mrv1 as the engine type in the Hadoop connection.
If the value for mapreduce.framework.name is yarn, you can select mrv2 or tez as the engine type in the Hadoop connection. Do not select Tez if Tez is not configured for the Hadoop cluster.
You can also set the value for the engine type in hive-site.xml. The engine type appears as the following property in hive-site.xml:
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
  <description>Chooses execution engine. Options are: mr (MapReduce, default) or tez (Hadoop 2 only)</description>
</property>
Job Monitoring URL
The URL for the MapReduce JobHistory server. If you use MapReduce version 1, you can use the JobTracker URI.
Use the following format:
<hostname>:<port>
Where
  • <hostname> is the host name or IP address of the JobHistory server.
  • <port> is the port on which the JobHistory server listens for remote procedure calls (RPC).
For example, enter:
myhostname:8021
You can get the value for the Job Monitoring URL from mapred-site.xml. The Job Monitoring URL appears as the following property in mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>myhostname:8021</value>
  <description>The host and port that the MapReduce job tracker runs at.</description>
</property>

Blaze Service

The following table describes the connection properties that you configure for the Blaze engine:
Temporary Working Directory on HDFS
The HDFS file path of the directory that the Blaze engine uses to store temporary files. Verify that the directory exists. The YARN user, Blaze engine user, and mapping impersonation user must have write permission on this directory.
For example, enter:
/blaze/workdir
Blaze Service User Name
The operating system profile user name for the Blaze engine.
Minimum Port
The minimum value for the port number range for the Blaze engine.
For example, enter:
12300
Maximum Port
The maximum value for the port number range for the Blaze engine.
For example, enter:
12600
Yarn Queue Name
The YARN scheduler queue name used by the Blaze engine that specifies available resources on a cluster. The name is case sensitive.
Blaze Service Custom Properties
Custom properties that are unique to the Blaze engine.
You can specify multiple properties.
Use the following format:
<property1>=<value>
Where
  • <property1> is a Blaze engine optimization property.
  • <value> is the value of the Blaze engine optimization property.
To specify multiple properties, use &: as the property separator.
Use custom properties only at the request of Informatica Global Customer Support.


Updated July 03, 2018