Data Engineering Integration
- Data Engineering Integration 10.5.1
- All Products
Property
| Description
|
---|---|
Name
| The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
|
ID
| String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
|
Description
| The description of the connection. The description cannot exceed 4000 characters.
|
Location
| The domain where you want to create the connection. Not valid for the Analyst tool.
|
Type
| The connection type. Select Hive.
|
LDAP username
| LDAP user name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster. The user name depends on the JDBC connection string that you specify in the Metadata Connection String or Data Access Connection String for the native environment.
If the Hadoop cluster uses Kerberos authentication, the principal name for the JDBC connection string and the user name must be the same. Otherwise, the user name depends on the behavior of the JDBC driver. With Hive JDBC driver, you can specify a user name in many ways and the user name can become a part of the JDBC URL.
If the Hadoop cluster does not use Kerberos authentication, the user name depends on the behavior of the JDBC driver.
If you do not specify a user name, the Hadoop cluster authenticates jobs based on the following criteria:
|
Password
| Password for the LDAP username.
|
Environment SQL
| SQL commands to set the Hadoop environment. In native environment type, the Data Integration Service executes the environment SQL each time it creates a connection to a Hive metastore. If you use the Hive connection to run profiles on a Hadoop cluster, the Data Integration Service executes the environment SQL at the beginning of each Hive session.
The following rules and guidelines apply to the usage of environment SQL in both connection modes:
|
SQL Identifier Character
| The type of character used to identify special characters and reserved SQL keywords, such as WHERE. The Data Integration Service places the selected character around special characters and reserved SQL keywords. The Data Integration Service also uses this character for the
Support mixed-case identifiers property.
|
Property
| Description
|
---|---|
JDBC Driver Class Name
| Name of the Hive JDBC driver class. If you leave this option blank, the Developer tool uses the default Apache Hive JDBC driver shipped with the distribution. If the default Apache Hive JDBC driver does not fit your requirements, you can override the Apache Hive JDBC driver with a third-party Hive JDBC driver by specifying the driver class name.
|
Metadata Connection String
| The JDBC connection URI used to access the metadata from the Hadoop server.
You can use PowerExchange for Hive to communicate with a HiveServer service or HiveServer2 service. To connect to HiveServer, specify the connection string in the following format:
Where
To connect to HiveServer2, use the connection string format that Apache Hive implements for that specific Hadoop Distribution. For more information about Apache Hive connection string formats, see the Apache Hive documentation.
For user impersonation, you must add
hive.server2.proxy.user=<xyz> to the JDBC connection URI. If you do not configure user impersonation, the current user's credentials are used connect to the HiveServer2.
If the Hadoop cluster uses SSL or TLS authentication, you must add
ssl=true to the JDBC connection URI. For example:
jdbc:hive2://<hostname>:<port>/<db>;ssl=true
If you use self-signed certificate for SSL or TLS authentication, ensure that the certificate file is available on the client machine and the Data Integration Service machine. For more information, see the Data Engineering Integration Guide .
|
Bypass Hive JDBC Server
| JDBC driver mode. Select the check box to use the embedded JDBC driver mode.
To use the JDBC embedded mode, perform the following tasks:
If you choose the non-embedded mode, you must configure the Data Access Connection String.
Informatica recommends that you use the JDBC embedded mode.
|
Fine Grained Authorization
| When you select the option to observe fine grained authorization in a Hive source, the mapping observes the following:
If you do not select the option, the Blaze and Spark engines ignore the restrictions and masking rules, and results include restricted or sensitive data.
|
Data Access Connection String
| The connection string to access data from the Hadoop data store. To connect to HiveServer, specify the non-embedded JDBC mode connection string in the following format:
Where
To connect to HiveServer2, use the connection string format that Apache Hive implements for the specific Hadoop Distribution. For more information about Apache Hive connection string formats, see the Apache Hive documentation.
For user impersonation, you must add
hive.server2.proxy.user=<xyz> to the JDBC connection URI. If you do not configure user impersonation, the current user's credentials are used connect to the HiveServer2.
If the Hadoop cluster uses SSL or TLS authentication, you must add
ssl=true to the JDBC connection URI. For example:
jdbc:hive2://<hostname>:<port>/<db>;ssl=true
If you use self-signed certificate for SSL or TLS authentication, ensure that the certificate file is available on the client machine and the Data Integration Service machine. For more information, see the
Data Engineering Integration Guide .
|
Hive Staging Directory on HDFS
| HDFS directory for Hive staging tables. You must grant execute permission to the Hadoop impersonation user and the mapping impersonation users.
This option is applicable and required when you write data to a Hive target in the native environment.
|
Hive Staging Database Name
| Namespace for Hive staging tables.
The Hive Staging Database Name is automatically updated from the Data Access Connection String. If you want to override the default name, you need to configure the Hive Staging Database Name in the Hive connection.
This option is applicable when you run a mapping in the native environment to write data to a Hive target.
If you run the mapping on the Blaze or Spark engine, you do not need to configure the Hive staging database name in the Hive connection. The Data Integration Service uses the value that you configure in the Hadoop connection.
|