Verify and Create Users


The Data Integration Service requires different users to access the Hadoop environment. Any user that you create for an Azure HDInsight distribution must be an Azure Active Directory user. For other distributions, use Linux users.
Any of the following users with access to the cluster is known as the "Informatica user" on the cluster.

Hadoop impersonation user

Verify that every node on the cluster has an impersonation user that can be used in a Hadoop connection. Create one if it does not exist. The Data Integration Service impersonates this user to run jobs in the Hadoop environment.
The following distributions use a Hadoop impersonation user:

Azure HDInsight
To run Sqoop mappings on the Spark engine, add the Hadoop impersonation user as a Linux user on the machine that hosts the Data Integration Service.
If the impersonation user name contains mixed-case characters, add the realm name along with the impersonation user name.

Cloudera CDP Public Cloud
The Hadoop impersonation user must have access to the Hive warehouse directory.

MapR
If the MapR distribution uses Ticket or Kerberos authentication, the user name must match the system user that starts the Informatica daemon, and the gid of the user must match the gid of the MapR user.
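On a generic Linux-based distribution, the setup above can be sketched as follows. The user name infauser, group hadoopusers, and property values are hypothetical placeholders, and the exact way to set Hadoop proxyuser properties depends on your cluster manager.

```shell
# Hypothetical sketch: create the impersonation user on every cluster node.
# Run as root; "infauser" and "hadoopusers" are placeholder names.
groupadd hadoopusers
useradd -g hadoopusers infauser

# Allow impersonation by adding Hadoop proxyuser properties to
# core-site.xml (through your cluster manager). The properties are
# keyed by the user the Data Integration Service connects as:
#   hadoop.proxyuser.<connecting user>.hosts  = *
#   hadoop.proxyuser.<connecting user>.groups = hadoopusers
```

This is an ops sketch, not something to paste verbatim; restricting the hosts and groups values is preferable to `*` in production.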

Service principal name (SPN) for the Data Integration Service

If the cluster uses Kerberos authentication, verify that the SPN corresponding to the cluster keytab file matches the name of the system user that starts the Informatica daemon.
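One way to check the match is to list the principals in the keytab and compare them with the daemon user. The keytab path below is a hypothetical placeholder.

```shell
# List the principals stored in the cluster keytab (path is a placeholder).
klist -kt /opt/infa/keytabs/infa_hadoop.keytab

# The primary component of the SPN (the part before "/" or "@") should
# match the system user that starts the Informatica daemon:
whoami
```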

Hadoop staging user

Optionally, create an HDFS user that performs operations on the cluster staging directory. If you do not create a staging user, the Data Integration Service uses the operating system user that starts the Informatica daemon.
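If you do create a dedicated staging user, it needs ownership of the cluster staging directory on HDFS. A minimal sketch, with placeholder user and directory names:

```shell
# Hypothetical sketch: create a staging user and assign it the
# staging directory (run with HDFS superuser privileges).
useradd stageuser
hdfs dfs -mkdir -p /infa/staging
hdfs dfs -chown -R stageuser:stageuser /infa/staging
hdfs dfs -chmod 750 /infa/staging
```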

Blaze user

Optionally, create an operating system user account that the Blaze engine uses to write to staging and log directories. If you do not create a Blaze user, the Data Integration Service uses the Hadoop impersonation user.

Operating system profile user

If operating system profiles are configured for the Data Integration Service, the Data Integration Service runs jobs with the permissions of the operating system user that you define in the profile. You can use the operating system profile user instead of the Hadoop impersonation user to run jobs in a Hadoop environment.
To use operating system profile users with Cloudera CDP Public Cloud, configure an impersonation user, add the impersonation user to FreeIPA, and map the user to a cloud role using Knox IDBroker.
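Assuming a FreeIPA-managed CDP Public Cloud environment, adding the impersonation user might look like the following. The user name is hypothetical, and the cloud role mapping itself is configured through Knox IDBroker mappings in the CDP console rather than from the command line.

```shell
# Hypothetical sketch: register the impersonation user in FreeIPA.
# Requires a valid Kerberos ticket for an IPA administrator.
kinit admin
ipa user-add infauser --first=Informatica --last=User

# Then map "infauser" to a cloud data-access role (AWS/Azure)
# in the Knox IDBroker mappings page of the CDP console.
```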

Mapping impersonation user

A mapping impersonation user is valid for the native run-time environment. Use mapping impersonation to impersonate the Data Integration Service user that connects to Hive, HBase, or HDFS sources and targets that use Kerberos authentication. Configure this functionality in the Data Integration Service properties and in the mapping properties. The mapping impersonation user uses the following format:
<Hadoop service name>/<host name>@<Kerberos realm>
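For illustration, a name in this format might look like hive/node01.example.com@EXAMPLE.COM (all values hypothetical). A quick shell check of the three components:

```shell
# Hypothetical example of a mapping impersonation user in SPN form:
#   <Hadoop service name>/<host name>@<Kerberos realm>
spn="hive/node01.example.com@EXAMPLE.COM"

# Sanity-check service/host@REALM with a bash pattern match.
if [[ "$spn" =~ ^[A-Za-z0-9_-]+/[A-Za-z0-9.-]+@[A-Z0-9.-]+$ ]]; then
  echo "format ok"
else
  echo "unexpected format"
fi
```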

