Verify Port Requirements

Open a range of ports to enable the Informatica domain to communicate with the Hadoop cluster and the distribution engine.
To ensure access to ports, the network administrator needs to complete additional tasks in the following situations:
  • The Hadoop cluster is behind a firewall. Work with the network administrator to open a range of ports that a distribution engine uses.
  • The Hadoop environment uses Azure HDInsight. Work with the network administrator to enable VPN between the Informatica domain and the Azure cloud network.
The following list describes the ports to open:
  • Cluster management web app for Cloudera. Required for Cloudera only.
  • NameNode RPC. Required for all supported distributions except MapR.
  • ResourceManager. Required for all distributions.
  • Cluster management web app. Used by distributions that use Ambari to manage the cluster: HDInsight, Hortonworks.
  • ResourceManager web app. Required for all distributions.
  • MapR control system. Required for MapR only.
  • Blaze monitoring console. Required for all distributions if you run mappings using Blaze.
  • Hive metastore. Required for all distributions.
  • 12300 to 12600: Default port range for the Blaze distribution engine. A port range is required for all distributions if you run mappings using Blaze.
  • YARN JobHistory server web app. Optional for all distributions.
  • HDFS NameNode HTTP. Required for all distributions.
The network administrators must ensure that the port used by the Metadata Access Service is accessible from the cluster nodes.
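
After the ports are opened, a quick TCP probe from a node in the Informatica domain can help confirm that they are reachable. The following is a minimal sketch, not an Informatica tool: the function names `probe` and `check_ports` are hypothetical, and the host and port values you pass in are your own. It distinguishes a refused connection (the host is reachable but nothing is listening, which is expected for the Blaze range when no mapping is running) from a timeout (which often points to a firewall dropping the traffic).

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but no service is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout; a firewall may be dropping packets.
        return "timeout"
    except OSError:
        # Name resolution failure, unreachable network, and similar errors.
        return "error"


def check_ports(host, ports, timeout=3.0):
    """Probe each port on host and return a {port: status} dictionary."""
    return {port: probe(host, port, timeout) for port in ports}
```

For example, `check_ports("cluster-node.example.com", range(12300, 12601), timeout=1.0)` would probe the default Blaze range on a placeholder host name. Ports that report "timeout" are the ones to raise with the network administrator.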

Spark Engine Monitoring Port

Spark engine monitoring requires the cluster nodes to communicate with the Data Integration Service over a socket. The Data Integration Service picks the socket port randomly from the port range configured for the domain. You can view the port range in the advanced properties of the primary node. By default, the minimum port number is 12000 and the maximum port number is 13000. The network administrator must ensure that the cluster nodes can reach the Data Integration Service on every port in this range.

If the network administrator cannot open the whole port range, you can configure the Data Integration Service to use a fixed port with the SparkMonitoringPort custom property. In that case, the network administrator must ensure that the cluster nodes can reach the Data Integration Service on the configured port.
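
When you choose a fixed port, it can help to confirm that nothing else on the Data Integration Service node already uses it. The sketch below is an illustration only and is not part of Informatica: the helper names `is_port_free` and `first_free_port` are hypothetical, and the default bounds simply mirror the 12000 to 13000 range mentioned above. It checks availability by attempting to bind a TCP socket, so run it on the node that hosts the Data Integration Service.

```python
import socket


def is_port_free(host, port):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission problem.
            return False
        return True


def first_free_port(host, low=12000, high=13000):
    """Return the first bindable port in [low, high], or None if none is free."""
    for port in range(low, high + 1):
        if is_port_free(host, port):
            return port
    return None
```

A port that is free at the time of the check can still be taken later by another process, so treat the result as a starting point when you pick the value for the custom property.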
