Create a Cluster Configuration

Perform this task in the following situations:
  • You are integrating for the first time.
  • You upgraded from any previous version.
A cluster configuration is an object in the domain that contains configuration information about the Hadoop cluster. The cluster configuration enables the Data Integration Service to push mapping logic to the Hadoop environment. Import configuration properties from the Hadoop cluster to create a cluster configuration.
The import process reads values from the cluster's *-site.xml files and groups them into configuration sets, one set for each *-site.xml file. When you perform the import, the cluster configuration wizard can create Hadoop, HBase, HDFS, and Hive connections to access the Hadoop environment. If you choose to create the connections, the wizard also associates the cluster configuration with the connections.
If you are integrating for the first time and you imported the cluster configuration when you ran the installer, you re-create or refresh the cluster configuration.
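
The grouping of imported properties into configuration sets can be pictured with a short Python sketch. This is only an illustration of the idea, not the wizard's actual implementation: the function names, file paths, and property values are hypothetical. The only thing it relies on is the standard Hadoop *-site.xml layout of `<property>` elements with `<name>` and `<value>` children.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def parse_site_xml(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that have no <name>
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def build_configuration_sets(site_files):
    """Group properties into one configuration set per *-site.xml file.

    site_files maps a file path to the XML text of that file.
    Files whose names do not end in -site.xml are ignored.
    """
    config_sets = {}
    for path, xml_text in site_files.items():
        file_name = PurePosixPath(path).name
        if file_name.endswith("-site.xml"):
            config_sets[file_name] = parse_site_xml(xml_text)
    return config_sets


# Hypothetical sample input; real clusters have many more files and properties.
SAMPLE_FILES = {
    "/etc/hadoop/conf/core-site.xml": (
        "<configuration>"
        "<property><name>fs.defaultFS</name>"
        "<value>hdfs://namenode.example.com:8020</value></property>"
        "</configuration>"
    ),
    "/etc/hive/conf/hive-site.xml": (
        "<configuration>"
        "<property><name>hive.metastore.uris</name>"
        "<value>thrift://metastore.example.com:9083</value></property>"
        "</configuration>"
    ),
    "/etc/hadoop/conf/log4j.properties": "log4j.rootLogger=INFO",
}

config_sets = build_configuration_sets(SAMPLE_FILES)
for set_name, properties in config_sets.items():
    print(set_name, properties)
```

Keying each set by its source file name mirrors the description above, where each configuration set corresponds to an individual *-site.xml file.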

Dataproc version 2.0 clusters

When you create a cluster configuration for a Google Dataproc cluster, by default the cluster configuration is created for Dataproc version 1.4. To integrate with Dataproc 2.x clusters, you must manually update the cluster configuration version property to 2.0.
You need to perform this workaround only for Informatica versions 10.5.1x.
  1. In the Administrator tool, click Connections.
  2. Expand the Cluster Configurations node in the Domain Navigator and select the Dataproc cluster configuration.
  3. Edit the Distribution Version property of the Dataproc cluster configuration. Change the property value to 2.0.
  4. Save the changes and restart .