Edit the Hosts File for Access to Azure HDInsight

Ensure that Informatica can access the HDInsight cluster by updating the /etc/hosts file on all machines that host the Data Integration Service.
Perform this task in the following situations:
  • You are integrating for the first time.
  • You upgraded from version 10.4 or earlier.

Configure Dynamic Updates for the Cluster Headnode Host

Identify the headnode host, and schedule a script to regularly update the entry.
The HDInsight cluster designates one cluster node as the headnode. If the node fails or stops for maintenance, the cluster designates another node as the headnode. When this happens, the headnodehost entry in the /etc/hosts file on each Informatica domain node requires updating. You can schedule a script to perform this update.
  1. In the /etc/hosts file on each machine that hosts the Data Integration Service, enter the IP address, DNS name, and DNS short name for each data node on the cluster. Use headnodehost to identify the host as the cluster headnode host.
    For example:
    10.20.30.40 hn0-rndhdi.abcdefghaouniiuvfp3betl3d.ix.internal.cloudapp.net headnodehost
  2. Download the headnode_update_script.zip file from the Informatica documentation portal and uncompress it to get headnode_update_script.sh.
    The script gets the IP address, DNS name, and DNS short name for the headnode host from the x-ms-hdi-active property on the cluster. The script then replaces the value of headnodehost in the hosts file with the x-ms-hdi-active value.
  3. Schedule the script to run regularly on each machine that hosts the Data Integration Service. Set up a schedule based on your requirements, such as daily updates.
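
The replacement step that the script performs on the hosts file can be sketched as follows. This is a minimal illustration in Python, not the contents of headnode_update_script.sh: the function name update_headnode_entry is hypothetical, and the sketch assumes that the new IP address and DNS name have already been obtained from the cluster. It does not query the x-ms-hdi-active property.

```python
def update_headnode_entry(hosts_text, ip, fqdn, alias="headnodehost"):
    """Return hosts-file text in which the line carrying `alias` points to ip/fqdn.

    Hypothetical helper for illustration only. If no line carries the alias,
    a new line is appended. Extra lines carrying the alias are dropped so
    that the alias resolves to exactly one address.
    """
    new_line = f"{ip} {fqdn} {alias}"
    result = []
    replaced = False
    for line in hosts_text.splitlines():
        # Ignore trailing comments, then split into address and host names.
        fields = line.split("#", 1)[0].split()
        if alias in fields[1:]:
            if not replaced:
                result.append(new_line)
                replaced = True
            continue
        result.append(line)
    if not replaced:
        result.append(new_line)
    return "\n".join(result) + "\n"
```

A scheduled job could read /etc/hosts, pass its contents through a function like this, and write the result back with root privileges. Because the function rewrites only the line that carries the headnodehost alias, running it repeatedly with the same values leaves the file unchanged.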

Configure IP Addresses for ADLS Storage

If the HDInsight cluster is integrated with ADLS storage, you must also add to the /etc/hosts file the IP addresses and DNS names for the hosts listed in the cluster property fs.azure.datalake.token.provider.service.urls.
For example:
1.2.3.67 gw1-ltsa.1320suh5abcdefghgaz0izgnhe.gx.internal.cloudapp.net
1.2.3.68 gw0-ltsa.1320suh5abcdefghgaz0izgnhe.gx.internal.cloudapp.net
To get the IP addresses, run a telnet command from the cluster host using each host name found in the fs.azure.datalake.token.provider.service.urls property.
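
As an alternative to running telnet for each host by hand, the lookup can be sketched in Python. This is a rough illustration under stated assumptions: the function name hosts_lines_for_urls is hypothetical, and the property value is assumed to be a comma-separated list of full URLs or bare host names. Like the telnet command, the lookup must run on a cluster host, where the internal names resolve.

```python
import socket
from urllib.parse import urlsplit


def hosts_lines_for_urls(property_value, resolve=socket.gethostbyname):
    """Build "IP name" hosts-file lines for each host in the property value.

    Hypothetical helper. Assumes property_value is a comma-separated list of
    full URLs (scheme included) or bare host names. `resolve` maps a host
    name to an IPv4 address string; the default uses the system resolver.
    """
    lines = []
    seen = set()
    for entry in property_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # urlsplit() returns no hostname for a bare name, so fall back to it.
        host = urlsplit(entry).hostname or entry
        if host in seen:
            continue
        seen.add(host)
        lines.append(f"{resolve(host)} {host}")
    return lines
```

Each returned line has the same form as the example entries above and can be appended to the /etc/hosts file on the machines that host the Data Integration Service.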