Big Data Management User Guide

Hive Connection Properties

Use the Hive connection to access Hive data. A Hive connection is a database type connection. You can create and manage a Hive connection in the Administrator tool, Analyst tool, or the Developer tool. Hive connection properties are case sensitive unless otherwise noted.
The order of the connection properties might vary depending on the tool where you view them.
The following table describes Hive connection properties:
Property
Description
Name
The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
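The naming rules above can be checked before you create a connection. The following is a minimal sketch in Python, not part of any Informatica tool; the function name connection_name_problems and the constants are hypothetical and exist only for illustration.

```python
# Characters the Name property disallows, copied from the list above.
FORBIDDEN_CHARS = set("~`!$%^&*()-+={[}]|\\:;\"'<,>.?/")
MAX_NAME_LENGTH = 128


def connection_name_problems(name: str) -> list[str]:
    """Return the rules that the name violates; an empty list means it passes.

    Hypothetical helper: it checks only the length, space, and
    special-character rules. Uniqueness within the domain cannot be
    checked locally.
    """
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    if " " in name:
        problems.append("contains spaces")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains special characters: " + " ".join(bad))
    return problems


if __name__ == "__main__":
    for candidate in ("hive_prod_01", "hive prod", "hive-prod"):
        print(candidate, "->", connection_name_problems(candidate) or "ok")
```

Returning a list of problems rather than a single Boolean lets a caller report every violation at once.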
ID
String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
Description
The description of the connection. The description cannot exceed 4000 characters.
Location
The domain where you want to create the connection. Not valid for the Analyst tool.
Type
The connection type. Select Hive.
User Name
User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster. The user name depends on the JDBC connection string that you specify in the Metadata Connection String or Data Access Connection String for the native environment.
If the Hadoop cluster runs Hortonworks HDP, you must provide a user name. If you use Tez to run mappings, you must provide the user account for the Data Integration Service. If you do not use Tez to run mappings, you can use an impersonation user account.
If the Hadoop cluster uses Kerberos authentication, the principal name for the JDBC connection string and the user name must be the same.
If the Hadoop cluster does not use Kerberos authentication, the user name depends on the behavior of the JDBC driver. With the Hive JDBC driver, you can specify a user name in several ways, and the user name can become part of the JDBC URL.
If you do not specify a user name, the Hadoop cluster authenticates jobs based on the following criteria:
  • The Hadoop cluster does not use Kerberos authentication. It authenticates jobs based on the operating system profile user name of the machine that runs the Data Integration Service.
  • The Hadoop cluster uses Kerberos authentication. It authenticates jobs based on the SPN of the Data Integration Service. The User Name property is ignored.
Password
Password for the user name.
Environment SQL
SQL commands to set the Hadoop environment. In the native environment, the Data Integration Service executes the environment SQL each time it creates a connection to a Hive metastore. If you use the Hive connection to run profiles on a Hadoop cluster, the Data Integration Service executes the environment SQL at the beginning of each Hive session.
The following rules and guidelines apply to the usage of environment SQL in both connection modes:
  • Use the environment SQL to specify Hive queries.
  • Use the environment SQL to set the classpath for Hive user-defined functions and then use environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. If you use Hive user-defined functions, you must copy the .jar files to the following directory:
    <Informatica installation directory>/services/shared/hadoop/<Hadoop distribution name>/extras/hive-auxjars
  • You can use environment SQL to define Hadoop or Hive parameters that you want to use in the PreSQL commands or in custom queries.
  • If you use multiple values for the Environment SQL property, ensure that there is no space between the values.
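The last guideline is easy to violate when the value is assembled by hand. The following Python sketch joins several statements without any space between them. The semicolon separator and the two sample Hive settings are assumptions for illustration; the text above states only that there must be no space between the values. The helper name join_environment_sql is hypothetical.

```python
def join_environment_sql(statements: list[str], separator: str = ";") -> str:
    """Join statements with no whitespace around the separator.

    ASSUMPTION: ";" as the separator is illustrative. The guideline above
    only requires that there is no space between the values.
    """
    # Trim surrounding whitespace and any trailing separator from each
    # statement, then drop statements that end up empty.
    cleaned = [s.strip().rstrip(separator).strip() for s in statements]
    return separator.join(s for s in cleaned if s)


if __name__ == "__main__":
    value = join_environment_sql([
        " set hive.exec.dynamic.partition=true; ",
        "set hive.exec.dynamic.partition.mode=nonstrict",
    ])
    print(value)
```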
SQL Identifier Character
The type of character used to identify special characters and reserved SQL keywords, such as WHERE. The Data Integration Service places the selected character around special characters and reserved SQL keywords. The Data Integration Service also uses this character for the Support mixed-case identifiers property.

Properties to Access Hive as Source or Target

The following table describes the connection properties that you configure to access Hive as a source or target:
Property
Description
JDBC Driver Class Name
Name of the Hive JDBC driver class. If you leave this option blank, the Developer tool uses the default Apache Hive JDBC driver shipped with the distribution. If the default Apache Hive JDBC driver does not fit your requirements, you can override the Apache Hive JDBC driver with a third-party Hive JDBC driver by specifying the driver class name.
Metadata Connection String
The JDBC connection URI used to access the metadata from the Hadoop server.
You can use PowerExchange for Hive to communicate with a HiveServer service or HiveServer2 service.
To connect to HiveServer, specify the connection string in the following format:
jdbc:hive2://<hostname>:<port>/<db>
Where
  • <hostname> is the name or IP address of the machine on which HiveServer2 runs.
  • <port> is the port number on which HiveServer2 listens.
  • <db> is the database name to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.
To connect to HiveServer 2, use the connection string format that Apache Hive implements for that specific Hadoop Distribution. For more information about Apache Hive connection string formats, see the Apache Hive documentation.
For user impersonation, you must add hive.server2.proxy.user=<xyz> to the JDBC connection URI. If you do not configure user impersonation, the current user's credentials are used to connect to HiveServer2.
If the Hadoop cluster uses SSL or TLS authentication, you must add ssl=true to the JDBC connection URI. For example:
jdbc:hive2://<hostname>:<port>/<db>;ssl=true
If you use a self-signed certificate for SSL or TLS authentication, ensure that the certificate file is available on the client machine and the Data Integration Service machine. For more information, see the Informatica Big Data Management Hadoop Integration Guide.
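The pieces of the connection string described above can be assembled programmatically. The following Python sketch builds a URI in the jdbc:hive2://<hostname>:<port>/<db> format and appends the optional ssl=true and hive.server2.proxy.user settings as semicolon-separated parameters, following the ssl=true example in this section. The function name hive_jdbc_url, the host hive.example.com, port 10000, the database sales, and the user etl_user are hypothetical values. Your Hadoop distribution might require additional parameters.

```python
from typing import Optional


def hive_jdbc_url(
    host: str,
    port: int,
    database: str = "",
    proxy_user: Optional[str] = None,
    ssl: bool = False,
) -> str:
    """Build a HiveServer2 JDBC URI from its parts (illustrative helper)."""
    # An empty database leaves the path empty, so the default database is used.
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if ssl:
        url += ";ssl=true"
    if proxy_user:
        # Impersonation: run as proxy_user instead of the current user.
        url += f";hive.server2.proxy.user={proxy_user}"
    return url


if __name__ == "__main__":
    print(hive_jdbc_url("hive.example.com", 10000, "sales"))
    print(hive_jdbc_url("hive.example.com", 10000, "sales",
                        proxy_user="etl_user", ssl=True))
```

The same format applies to the Data Access Connection String in non-embedded mode.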
Bypass Hive JDBC Server
JDBC driver mode. Select the check box to use the embedded JDBC driver mode.
To use the JDBC embedded mode, perform the following tasks:
  • Verify that Hive client and Informatica services are installed on the same machine.
  • Configure the Hive connection properties to run mappings on a Hadoop cluster.
If you choose the non-embedded mode, you must configure the Data Access Connection String.
Informatica recommends that you use the JDBC embedded mode.
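The dependency between the JDBC mode and the Data Access Connection String can be expressed as a simple check. This is a minimal Python sketch under the rule stated above; the function and parameter names are hypothetical and do not correspond to an Informatica API.

```python
def jdbc_mode_problems(
    bypass_hive_jdbc_server: bool,
    data_access_connection_string: str = "",
) -> list[str]:
    """Return configuration problems for the chosen JDBC driver mode.

    Hypothetical helper: embedded mode corresponds to selecting the
    Bypass Hive JDBC Server check box.
    """
    problems = []
    if not bypass_hive_jdbc_server and not data_access_connection_string.strip():
        problems.append(
            "Non-embedded mode requires a Data Access Connection String."
        )
    return problems


if __name__ == "__main__":
    print(jdbc_mode_problems(bypass_hive_jdbc_server=False))
```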
Observe Fine Grained SQL Authorization
When you select the option to observe fine-grained SQL authorization in a Hive source, the mapping observes row-level and column-level restrictions on data access. If you do not select the option, the Blaze and Spark engines ignore the restrictions, and results include restricted data.
This option applies to Hadoop clusters where Sentry or Ranger security modes are enabled.
Data Access Connection String
The connection string to access data from the Hadoop data store.
To connect to HiveServer, specify the non-embedded JDBC mode connection string in the following format:
jdbc:hive2://<hostname>:<port>/<db>
Where
  • <hostname> is the name or IP address of the machine on which HiveServer2 runs.
  • <port> is the port number on which HiveServer2 listens.
  • <db> is the database to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.
To connect to HiveServer 2, use the connection string format that Apache Hive implements for the specific Hadoop Distribution. For more information about Apache Hive connection string formats, see the Apache Hive documentation.
For user impersonation, you must add hive.server2.proxy.user=<xyz> to the JDBC connection URI. If you do not configure user impersonation, the current user's credentials are used to connect to HiveServer2.
If the Hadoop cluster uses SSL or TLS authentication, you must add ssl=true to the JDBC connection URI. For example:
jdbc:hive2://<hostname>:<port>/<db>;ssl=true
If you use a self-signed certificate for SSL or TLS authentication, ensure that the certificate file is available on the client machine and the Data Integration Service machine. For more information, see the Informatica Big Data Management Hadoop Integration Guide.
