Data Engineering Integration
The following properties are general connection properties for the Hive connection:

**Name**

The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:

~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
**ID**

String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
**Description**

The description of the connection. The description cannot exceed 4000 characters.
**Location**

The domain where you want to create the connection. Not valid for the Analyst tool.
**Type**

The connection type. Select Hive.
**Connection Modes**

Hive connection mode. Select at least one of the following options:

- Access Hive as a source or target. Select this option if you use the connection to access Hive as a source or a target.
- Use Hive to run mappings in Hadoop cluster. Select this option if you use the connection to run mappings in the Hadoop cluster.

You can select both options.
**User Name**

User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster. The user name depends on the JDBC connection string that you specify in the Metadata Connection String or Data Access Connection String for the native environment.

If the Hadoop cluster runs Hortonworks HDP, you must provide a user name. If you use Tez to run mappings, you must provide the user account for the Data Integration Service. If you do not use Tez to run mappings, you can use an impersonation user account.

If the Hadoop cluster uses Kerberos authentication, the principal name for the JDBC connection string and the user name must be the same. Otherwise, the user name depends on the behavior of the JDBC driver. With the Hive JDBC driver, you can specify a user name in many ways, and the user name can become a part of the JDBC URL.

If the Hadoop cluster does not use Kerberos authentication, the user name depends on the behavior of the JDBC driver.

If you do not specify a user name, the Hadoop cluster authenticates jobs based on the following criteria:

- The Hadoop cluster does not use Kerberos authentication. It authenticates jobs based on the operating system profile user name of the machine that runs the Data Integration Service.
- The Hadoop cluster uses Kerberos authentication. It authenticates jobs based on the SPN of the Data Integration Service.
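As an illustration, with Kerberos authentication the principal is typically embedded in the HiveServer2 JDBC URL itself; the host name, port, and realm below are placeholders, not values from this guide:

```
jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/hiveserver.example.com@EXAMPLE.COM
```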
Common attribute to both the modes:

**Environment SQL**

SQL commands to set the Hadoop environment. In the native environment type, the Data Integration Service executes the environment SQL each time it creates a connection to a Hive metastore. If you use the Hive connection to run profiles in the Hadoop cluster, the Data Integration Service executes the environment SQL at the beginning of each Hive session.

The following rules and guidelines apply to the usage of environment SQL in both connection modes:

- If you use the Hive connection to run profiles in the Hadoop cluster, the Data Integration Service executes only the environment SQL of the Hive connection. If the Hive sources and targets are on different clusters, the Data Integration Service does not execute the different environment SQL commands for the connections of the Hive source or target.
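As a sketch of typical environment SQL, the commands are usually Hive SET statements; the execution engine and queue name below are placeholder assumptions, not required settings:

```sql
SET hive.execution.engine=tez;
SET mapreduce.job.queue.name=adhoc;
```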
The following properties apply when you access Hive as a source or target:
**Metadata Connection String**

The JDBC connection URI used to access the metadata from the Hadoop server.

You can use PowerExchange for Hive to communicate with a HiveServer service or HiveServer2 service.

To connect to HiveServer, specify the connection string in the following format:

jdbc:hive2://<hostname>:<port>/<db>

where:

- hostname is the name or IP address of the machine on which HiveServer runs.
- port is the port number on which HiveServer listens.
- db is the database name to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.

To connect to HiveServer2, use the connection string format that Apache Hive implements for that specific Hadoop distribution. For more information about Apache Hive connection string formats, see the Apache Hive documentation.
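For example, a metadata connection string might look like the following; the host name is a placeholder and 10000 is the common default HiveServer2 port:

```
jdbc:hive2://metadatahost.example.com:10000/default
```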
**Bypass Hive JDBC Server**

JDBC driver mode. Select the check box to use the embedded JDBC driver mode.

To use the JDBC embedded mode, perform the following tasks:

- Verify that the Hive client and Informatica services are installed on the same machine.
- Configure the Hive connection properties to run mappings in the Hadoop cluster.

If you choose the non-embedded mode, you must configure the Data Access Connection String.

Informatica recommends that you use the JDBC embedded mode.
**Data Access Connection String**

The connection string to access data from the Hadoop data store.

To connect to HiveServer, specify the non-embedded JDBC mode connection string in the following format:

jdbc:hive2://<hostname>:<port>/<db>

where:

- hostname is the name or IP address of the machine on which HiveServer runs.
- port is the port number on which HiveServer listens.
- db is the database to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.

To connect to HiveServer2, use the connection string format that Apache Hive implements for the specific Hadoop distribution. For more information about Apache Hive connection string formats, see the Apache Hive documentation.
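If you want to sanity-check a connection string outside of Informatica before you enter it here, a minimal standalone Java sketch such as the following can help. It assumes the Apache Hive JDBC driver is on the classpath; the URL and user name are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: use the value you plan to enter in the
        // Metadata Connection String or Data Access Connection String.
        String url = "jdbc:hive2://hiveserver.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hiveuser", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // print each database name
            }
        }
    }
}
```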
The following properties apply when you use Hive to run mappings in the Hadoop cluster:
**Database Name**

Namespace for tables. Use the name default for tables that do not have a specified database name.
**Default FS URI**

The URI to access the default Hadoop Distributed File System. Use the following connection URI:

hdfs://<node name>:<port>

where:

- node name is the host name or IP address of the NameNode.
- port is the port number on which the NameNode listens for remote procedure calls (RPC).

If the Hadoop cluster runs MapR, use the following URI to access the MapR file system: maprfs:///.
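For example, assuming the NameNode host below and the common default RPC port (verify the actual value in the fs.defaultFS property in core-site.xml on your cluster):

```
hdfs://namenode.example.com:8020
```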
**JobTracker/Yarn Resource Manager URI**

The service within Hadoop that submits the MapReduce tasks to specific nodes in the cluster. Use the following format:

<hostname>:<port>

where:

- hostname is the host name or IP address of the JobTracker or YARN resource manager.
- port is the port number on which the JobTracker or YARN resource manager listens for remote procedure calls (RPC).

If the cluster uses MapR with YARN, use the value specified in the yarn.resourcemanager.address property in yarn-site.xml. You can find yarn-site.xml in the following directory on the NameNode of the cluster: /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop.

MapR with MapReduce 1 supports a highly available JobTracker. If you use the MapR distribution, define the JobTracker URI in the following format: maprfs:///
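The yarn.resourcemanager.address entry in yarn-site.xml typically looks like the following; the host name is a placeholder and 8032 is a common default port:

```xml
<property>
  <name>yarn.resourcemanager.address</name>
  <value>resourcemanager.example.com:8032</value>
</property>
```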
**Hive Warehouse Directory on HDFS**

The absolute HDFS file path of the default database for the warehouse that is local to the cluster. For example, the following file path specifies a local warehouse:

/user/hive/warehouse

For Cloudera CDH, if the Metastore Execution Mode is remote, then the file path must match the file path specified by the Hive Metastore Service on the Hadoop cluster.

For MapR, use the value specified for the hive.metastore.warehouse.dir property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /opt/mapr/hive/hive-0.13/conf.
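The corresponding hive-site.xml entry typically looks like the following; the path shown is the common default:

```xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
```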
**Advanced Hive/Hadoop Properties**

Configures or overrides Hive or Hadoop cluster properties in hive-site.xml on the machine on which the Data Integration Service runs. You can specify multiple properties.

Select Edit to specify the name and value for the property. The property appears in the following format:

<property1>=<value>

where property1 is the property name and value is the property value.

When you specify multiple properties, &: appears as the property separator. The maximum length for the format is 1 MB.

If you enter a required property for a Hive connection, it overrides the property that you configure in the Advanced Hive/Hadoop Properties.

The Data Integration Service adds or sets these properties for each MapReduce job. You can verify these properties in the JobConf of each mapper and reducer job. Access the JobConf of each job from the JobTracker URL under each MapReduce job.

The Data Integration Service writes messages for these properties to the Data Integration Service logs. The Data Integration Service must have the log tracing level set to log each row or to verbose initialization tracing.

For example, specify the following properties to control and limit the number of reducers that run a mapping job:

mapred.reduce.tasks=2&:hive.exec.reducers.max=10
**Temporary Table Compression Codec**

Hadoop compression library for a compression codec class name.
**Codec Class Name**

Codec class name that enables data compression and improves performance on temporary staging tables.
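For example, a commonly used Hadoop codec class name is the following; verify that the codec library is installed on your cluster before you use it:

```
org.apache.hadoop.io.compress.SnappyCodec
```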
**Metastore Execution Mode**

Controls whether to connect to a remote metastore or a local metastore. By default, local is selected. For a local metastore, you must specify the Metastore Database URI, Driver, Username, and Password. For a remote metastore, you must specify only the Remote Metastore URI.
**Metastore Database URI**

The JDBC connection URI used to access the data store in a local metastore setup. Use the following connection URI:

jdbc:<datastore type>://<node name>:<port>/<database name>

where:

- datastore type is the type of the data store.
- node name is the host name or IP address of the data store.
- port is the port number on which the data store listens for remote procedure calls (RPC).
- database name is the name of the database.

For example, the following URI specifies a local metastore that uses MySQL as a data store:

jdbc:mysql://hostname23:3306/metastore

For MapR, use the value specified for the javax.jdo.option.ConnectionURL property in hive-site.xml. You can find hive-site.xml in the following directory on the node where HiveServer2 runs: /opt/mapr/hive/hive-0.13/conf.
**Metastore Database Driver**

Driver class name for the JDBC data store. For example, the following class name specifies a MySQL driver:

com.mysql.jdbc.Driver

For MapR, use the value specified for the javax.jdo.option.ConnectionDriverName property in hive-site.xml. You can find hive-site.xml in the following directory on the node where HiveServer2 runs: /opt/mapr/hive/hive-0.13/conf.
**Metastore Database Username**

The metastore database user name.

For MapR, use the value specified for the javax.jdo.option.ConnectionUserName property in hive-site.xml. You can find hive-site.xml in the following directory on the node where HiveServer2 runs: /opt/mapr/hive/hive-0.13/conf.
**Metastore Database Password**

The password for the metastore user name.

For MapR, use the value specified for the javax.jdo.option.ConnectionPassword property in hive-site.xml. You can find hive-site.xml in the following directory on the node where HiveServer2 runs: /opt/mapr/hive/hive-0.13/conf.
**Remote Metastore URI**

The metastore URI used to access metadata in a remote metastore setup. For a remote metastore, you must specify the Thrift server details. Use the following connection URI:

thrift://<hostname>:<port>

where:

- hostname is the name or IP address of the Thrift metastore server.
- port is the port number on which the Thrift server listens.

For MapR, use the value specified for the hive.metastore.uris property in hive-site.xml. You can find hive-site.xml in the following directory on the node where HiveServer2 runs: /opt/mapr/hive/hive-0.13/conf.
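The hive.metastore.uris entry typically looks like the following; the host name is a placeholder and 9083 is the common default metastore port:

```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore.example.com:9083</value>
</property>
```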