Verify Port Requirements

Open a range of ports to enable the Informatica domain to communicate with the Hadoop cluster and the distribution engine.
To ensure access to ports, the network administrator needs to complete additional tasks in the following situations:
  • The Hadoop cluster is behind a firewall. Work with the network administrator to open the range of ports that the distribution engine uses.
  • The Hadoop environment uses Azure HDInsight. Work with the network administrator to enable VPN between the Informatica domain and the Azure cloud network.
The following table lists the ports to open:
| Port | Description |
|------|-------------|
| 7180 | Cluster management web app for Cloudera. Required for Cloudera only. |
| 8020 | NameNode RPC. Required for all supported distributions except MapR. |
| 8032 | ResourceManager. Required for all distributions. |
| 8080 | Cluster management web app. Used by distributions that use Ambari to manage the cluster: HDInsight, Hortonworks. |
| 8088 | ResourceManager web app. Required for all distributions. |
| 8443 | MapR control system. Required for MapR only. |
| 9080 | Blaze monitoring console. Required for all distributions if you run mappings using Blaze. |
| 9083 | Hive metastore. Required for all distributions. |
| 12300 to 12600 | Default port range for the Blaze distribution engine. A port range is required for all distributions if you run mappings using Blaze. |
| 19888 | YARN JobHistory server web app. Optional for all distributions. |
| 50070 | HDFS NameNode HTTP. Required for all distributions. |

The network administrator must also ensure that the port used by the Metadata Access Service is accessible from the cluster nodes.
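
Before you run mappings, it can help to confirm that the listed ports accept connections from the machine that hosts the Informatica domain. The following Python sketch is not part of the product. It only attempts a plain TCP connection to each port and reports whether the connection succeeds. The port selection in REQUIRED_PORTS is an illustrative subset of the table, and the function names is_port_open and check_ports are hypothetical helpers.

```python
import socket

# Illustrative subset of the ports listed in the table above.
# Adjust it for your distribution (for example, add 7180 for Cloudera).
REQUIRED_PORTS = {
    8020: "NameNode RPC",
    8032: "ResourceManager",
    8088: "ResourceManager web app",
    9083: "Hive metastore",
    50070: "HDFS NameNode HTTP",
}


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Return a dict that maps each port to True (reachable) or False."""
    return {port: is_port_open(host, port, timeout) for port in ports}


def report(host, ports=REQUIRED_PORTS, timeout=3.0):
    """Print one line per port with its description and reachability."""
    results = check_ports(host, ports, timeout)
    for port, reachable in results.items():
        status = "open" if reachable else "BLOCKED or closed"
        print(f"{host}:{port} ({ports[port]}): {status}")
```

To use the sketch, call report() with the host name of a cluster node, run from the domain machine. A successful connection shows only that the port is reachable at the TCP level. It does not confirm that the expected Hadoop service is the one listening.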

Spark Engine Monitoring Port

Spark engine monitoring requires the cluster nodes to communicate with the Data Integration Service over a socket. The Data Integration Service picks the socket port at random from the port range configured for the domain. You can view the port range in the advanced properties of the primary node. By default, the minimum port number is 12000 and the maximum port number is 13000. The network administrator must ensure that the cluster nodes can reach the Data Integration Service on every port in this range.
If the network administrator cannot open the port range, you can configure the Data Integration Service to use a fixed port with the SparkMonitoringPort custom property. The network administrator must then ensure that the cluster nodes can reach the Data Integration Service on the configured port.
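
The difference between the two configurations can be illustrated with a short Python sketch. It is a simplified model of the behavior described above, not the Data Integration Service implementation. The function names ports_to_open and pick_monitoring_port are hypothetical, and the fixed_port argument stands in for a value set through the SparkMonitoringPort custom property.

```python
import random

# Default domain port range stated in the text above.
DEFAULT_MIN_PORT = 12000
DEFAULT_MAX_PORT = 13000


def ports_to_open(fixed_port=None, min_port=DEFAULT_MIN_PORT, max_port=DEFAULT_MAX_PORT):
    """Return the ports that cluster nodes must be able to reach.

    With a fixed port, only that one port is needed.
    Otherwise the whole range is needed, because any port in it may be picked.
    """
    if fixed_port is not None:
        if not 1 <= fixed_port <= 65535:
            raise ValueError("fixed_port must be between 1 and 65535")
        return range(fixed_port, fixed_port + 1)
    if not 1 <= min_port <= max_port <= 65535:
        raise ValueError("expected 1 <= min_port <= max_port <= 65535")
    return range(min_port, max_port + 1)


def pick_monitoring_port(fixed_port=None, min_port=DEFAULT_MIN_PORT,
                         max_port=DEFAULT_MAX_PORT, rng=None):
    """Model the port choice: the fixed port if set, else a random port in the range."""
    rng = rng or random.Random()
    return rng.choice(ports_to_open(fixed_port, min_port, max_port))
```

With the defaults, ports_to_open() covers 1001 ports (12000 through 13000 inclusive), and all of them need a firewall rule. With a fixed port, for example the arbitrary value 15000, ports_to_open(fixed_port=15000) covers a single port.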