Big Data Management Administrator Guide

Common Properties

The following table describes the common connection properties that you configure for the Hadoop connection:
Property
Description
Impersonation User Name
Hadoop impersonation user. The user name that the Data Integration Service impersonates to run mappings in the Hadoop environment. Required if the Hadoop cluster uses Kerberos authentication.
The Data Integration Service runs mappings as the configured user. Use the following order to determine which user the Data Integration Service uses to run mappings:
  1. Operating system profile user. The mapping runs with the operating system profile user if the profile user is configured. If there is no operating system profile user, the mapping runs with the Hadoop impersonation user.
  2. Hadoop impersonation user. The mapping runs with the Hadoop impersonation user if the operating system profile user is not configured. If the Hadoop impersonation user is not configured, the mapping runs with the Informatica services user.
  3. Informatica services user. The mapping runs with the operating system user that starts the Informatica daemon if neither the operating system profile user nor the Hadoop impersonation user is configured.
Temporary Table Compression Codec
Hadoop compression library for a compression codec class name.
The Spark engine does not support compression settings for temporary tables. When you run mappings on the Spark engine, the Spark engine stores temporary tables in an uncompressed file format.
Codec Class Name
Codec class name that enables data compression and improves performance on temporary staging tables.
Hive Staging Database Name
Namespace for Hive staging tables. Use the name "default" for tables that do not have a specified database name.
If you do not configure a namespace, the Data Integration Service uses the Hive database name in the Hive target connection to create staging tables.
Advanced Properties
List of advanced properties that are unique to the Hadoop environment. The properties are common to the Blaze, Spark, and Hive engines. The advanced properties include a list of default properties.
You can configure run-time properties for the Hadoop environment in the Data Integration Service, in the Hadoop connection, and in the mapping. You can override a property configured at a higher level by setting the value at a lower level. For example, if you configure a property in the Data Integration Service custom properties, you can override it in the Hadoop connection or in the mapping. The Data Integration Service processes property overrides in the following order of priority, from highest to lowest:
  1. Mapping custom properties set using infacmd ms runMapping with the -cp option
  2. Mapping run-time properties for the Hadoop environment
  3. Hadoop connection advanced properties for run-time engines
  4. Hadoop connection advanced general properties, environment variables, and classpaths
  5. Data Integration Service custom properties
Informatica recommends that you consult third-party documentation, Informatica documentation, or Informatica Global Customer Support before you change these property values. If you change a value without understanding the property, you might experience performance degradation or other unexpected results.
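The run-as user order described for Impersonation User Name can be sketched as a small precedence function. This is an illustrative model only, not Informatica code: the function name, its parameters, and the sample user names are hypothetical, and an unconfigured user is represented as None or an empty string.

```python
from typing import Optional


def resolve_run_as_user(
    os_profile_user: Optional[str],
    impersonation_user: Optional[str],
    services_user: str,
) -> str:
    """Return the user a mapping runs as, following the documented order."""
    # 1. The operating system profile user wins when one is configured.
    if os_profile_user:
        return os_profile_user
    # 2. Otherwise, use the Hadoop impersonation user from the connection.
    if impersonation_user:
        return impersonation_user
    # 3. Otherwise, fall back to the OS user that started the Informatica daemon.
    return services_user


# Hypothetical user names: no OS profile user, impersonation user configured.
print(resolve_run_as_user(None, "etl_user", "infa_svc"))  # etl_user
```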
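The Temporary Table Compression Codec and Codec Class Name behavior can be sketched as follows. The dictionary lists standard Hadoop codec classes that a codec class name commonly refers to; whether a cluster supports each one depends on its installed native libraries. The helper function is hypothetical, and it assumes that engines other than Spark apply the configured codec, which this section does not state explicitly.

```python
from typing import Optional

# Standard Hadoop compression codec classes, keyed by a short label.
# Availability of each codec depends on the cluster's native libraries.
HADOOP_CODECS = {
    "deflate": "org.apache.hadoop.io.compress.DefaultCodec",  # zlib/deflate
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}


def effective_temp_table_codec(engine: str, codec_class_name: Optional[str]) -> Optional[str]:
    """Return the codec class applied to temporary tables, or None if uncompressed."""
    # The Spark engine ignores compression settings for temporary tables
    # and stores them uncompressed.
    if engine.lower() == "spark":
        return None
    # Assumption: other engines use the configured codec when one is set.
    return codec_class_name or None


print(effective_temp_table_codec("spark", HADOOP_CODECS["snappy"]))  # None
```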
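The override priorities for advanced properties behave like layered dictionaries merged from lowest to highest priority. This is a hypothetical model: the layer labels are illustrative names for the five property sources in the priority list, not Informatica identifiers, and spark.executor.memory and spark.driver.memory are used only as example property keys.

```python
from typing import Dict, List

# Property sources ordered from lowest to highest priority.
# The labels are illustrative, not Informatica identifiers.
PRIORITY_LOW_TO_HIGH: List[str] = [
    "dis_custom_properties",  # Data Integration Service custom properties
    "connection_general",     # connection general properties, env vars, classpaths
    "connection_engine",      # connection advanced properties for run-time engines
    "mapping_runtime",        # mapping run-time properties
    "runmapping_cp",          # infacmd ms runMapping -cp
]


def effective_properties(layers: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge property layers so higher-priority layers override lower ones."""
    merged: Dict[str, str] = {}
    for source in PRIORITY_LOW_TO_HIGH:
        # Later (higher-priority) layers overwrite keys set by earlier ones.
        merged.update(layers.get(source, {}))
    return merged


layers = {
    "dis_custom_properties": {"spark.executor.memory": "4G", "spark.driver.memory": "2G"},
    "connection_engine": {"spark.executor.memory": "6G"},
    "runmapping_cp": {"spark.executor.memory": "8G"},
}
print(effective_properties(layers))
# {'spark.executor.memory': '8G', 'spark.driver.memory': '2G'}
```

Because each later layer overwrites keys from earlier ones, a value passed with the -cp option wins over the same key set anywhere else, while keys that no higher layer sets keep their lower-level values.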
