Implementing Informatica® Big Data Management 10.2 in an Amazon Cloud Environment

Step 2. Update Configuration Files and Connections

In this step, the workflow uses a script that lists configuration files and domain connections and updates them with a variable that represents the IP address of the cluster master node.
The script performs the following actions:
  1. Stores the IP address of the cluster master node in a shell variable. The script reads the address from the /tmp/.ClusterCreation.txt file that the previous script created.
    For example,
    ipaddrOfMaster=`cat /tmp/.ClusterCreation.txt | python -c 'import sys, json; print(json.load(sys.stdin)["Instances"][0]["PrivateIpAddress"])'`
  2. Creates Hive, HDFS, Hadoop, and other connections, including authentication values, essential port numbers, and the IP address of the cluster master node.
    For example, the following section of the script creates a Hive connection:
    echo "" >> /tmp/.emr_automation.log
    echo "Creating hive connection" >> /tmp/.emr_automation.log
    ${infa_home}/isp/bin/infacmd.sh createConnection -dn 'domain' -un 'Administrator' -pd 'Administrator' \
      -cn 'Hive_conn_test_automation' -cid 'Hive_conn_test_automation' -ct HIVE \
      -o "connectString=jdbc:hive2://${ipaddrOfMaster}:10000/default enableQuotes=false metadataConnString=jdbc:hive2://${ipaddrOfMaster}:10000/default bypassHiveJDBCServer=false pushDownMode=true relationalSourceAndTarget=true databaseName=default defaultFSURI=hdfs://${ipaddrOfMaster}:8020/ hiveWarehouseDirectoryOnHDFS='/user/hive/warehouse' jobTrackerURI=${ipaddrOfMaster}:8021 metastoreExecutionMode=remote remotemetastoreuri=thrift://${ipaddrOfMaster}:9083 username='hadoop'" >> /tmp/.emr_automation.log
    • The script refers to cluster nodes, including the master node, by the IP address only.
    • The script points each service endpoint, such as defaultFSURI and the remote metastore URI, at the master node.
    • The example contains only mandatory arguments, but you can add optional arguments to give custom attributes to the connection.
  3. Updates the yarn-site.xml file on the Informatica domain with the IP address of the cluster master node. The script restores the file from the yarn-site.xml_org copy, then replaces every HOSTNAME placeholder with the address.
    For example,
    echo "" >> /tmp/.emr_automation.log
    echo "Updating yarn-site.xml" >> /tmp/.emr_automation.log
    cp ${infa_home}/services/shared/hadoop/amazon_emr5.0.0/conf/yarn-site.xml_org ${infa_home}/services/shared/hadoop/amazon_emr5.0.0/conf/yarn-site.xml
    sed -i -- "s/HOSTNAME/$ipaddrOfMaster/g" ${infa_home}/services/shared/hadoop/amazon_emr5.0.0/conf/yarn-site.xml
  4. Returns a message that the step is complete and the workflow can execute the next step.
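
The address lookup in step 1 and the placeholder substitution in step 3 can be sketched as small shell functions that are easier to test in isolation. This is a minimal sketch, not part of the Informatica script: the function names get_master_ip, is_ipv4, and render_site_file are hypothetical, and the sketch assumes that python3 is on the PATH and that the JSON file has the same "Instances" layout as the example in step 1.

```shell
# Hypothetical helpers for illustration; these names do not appear in the Informatica script.

# Print the private IP address of the first instance listed in a JSON file.
# Assumes python3 is available and the file has an "Instances" array.
get_master_ip() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["Instances"][0]["PrivateIpAddress"])' "$1"
}

# Succeed only if the argument looks like a dotted-quad IPv4 address.
# This checks the shape only, not that each octet is in the range 0-255.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Write a copy of the template ($1) to the target ($2) with every
# HOSTNAME placeholder replaced by the address ($3).
# Refuses to write anything if the address does not look like IPv4.
render_site_file() {
  is_ipv4 "$3" || return 1
  sed "s/HOSTNAME/$3/g" "$1" > "$2"
}

# Example usage (paths are placeholders):
#   ip=$(get_master_ip /tmp/.ClusterCreation.txt) || exit 1
#   render_site_file conf/yarn-site.xml_org conf/yarn-site.xml "$ip" || exit 1
```

Checking the address before substitution means that an empty or malformed value, for example from a missing ClusterCreation file, stops the update instead of writing a broken yarn-site.xml. Writing the output with a redirect rather than `sed -i` leaves the template file untouched.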
