Table of Contents


  1. Preface
  2. Part 1: Hadoop Integration
  3. Part 2: Databricks Integration
  4. Appendix A: Managing Distribution Packages
  5. Appendix B: Connections Reference

Configure Data Integration Service Properties

Configure Data Integration Service Properties

The Data Integration Service contains properties that integrate the domain with the Hadoop cluster.
The following table describes the Data Integration Service properties that you need to configure:
Cluster Staging Directory
The directory on the cluster where the Data Integration Service pushes the binaries to integrate the native and non-native environments and to store temporary files during processing. Default is
Hadoop Staging User
The HDFS user that performs operations on the Hadoop staging directory. The user requires write permissions on Hadoop staging directory. Default is the operating system user that starts the Informatica daemon.
Custom Hadoop OS Path
The local path to the Informatica server binaries compatible with the Hadoop operating system. Required when the Hadoop cluster and the Data Integration Service are on different supported operating systems. The Data Integration Service uses the binaries in this directory to integrate the domain with the Hadoop cluster. The Data Integration Service can synchronize the following operating systems:

    SUSE and Redhat

Include the source directory in the path. For example,
<Informatica server binaries>/source
Changes take effect after you recycle the Data Integration Service.
When you install an Informatica EBF, you must also install it in this directory.
Hadoop Kerberos Service Principal Name
Service Principal Name (SPN) of the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication.
Not required for the MapR distribution.
Hadoop Kerberos Keytab
The file path to the Kerberos keytab file on the machine on which the Data Integration Service runs.
Not required for the MapR distribution.
Custom Properties
Properties that are unique to specific environments.
You can configure run-time properties for the Hadoop environment in the Data Integration Service, the Hadoop connection, and in the mapping. You can override a property configured at a high level by setting the value at a lower level. For example, if you configure a property in the Data Integration Service custom properties, you can override it in the Hadoop connection or in the mapping. The Data Integration Service processes property overrides based on the following priorities:
  1. Mapping custom properties set using
    infacmd ms runMapping
    with the
  2. Mapping run-time properties for the Hadoop environment
  3. Hadoop connection advanced properties for run-time engines
  4. Hadoop connection advanced general properties, environment variables, and classpaths
  5. Data Integration Service custom properties
When a mapping uses Hive Server 2 to run a job or parts of a job, you cannot override properties that are configured on the cluster level in preSQL or post-SQL queries or SQL override statements.
Workaround: Instead of attempting to use the cluster configuration on the domain to override cluster properties, pass the override settings to the JDBC URL. For example:
beeline -u "jdbc:hive2://<domain host>:<port_number>/tpch_text_100" --hiveconf hive.execution.engine=tez


We’d like to hear from you!