Implementing Informatica® Big Data Management 10.2 in an Amazon Cloud Environment


Step 2. Update Configuration Files and Connections

In this step, the workflow uses a script that lists configuration files and domain connections and updates them with a variable that represents the IP address of the cluster master node.

The script performs the following actions:

Creates a variable, ipaddrOfMaster, for the IP address of the cluster master node and reads the IP address from the /tmp/.ClusterCreation.txt file that you created in the previous script.

For example,

ipaddrOfMaster=$(python -c 'import sys, json; print(json.load(sys.stdin)["Instances"][0]["PrivateIpAddress"])' < /tmp/.ClusterCreation.txt)
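The lookup above can be tried on its own before it runs inside the workflow. The following is a minimal sketch, assuming python3 is on the PATH and that the file holds JSON with an Instances array, as the script's lookup expects. The sample JSON, the 10.0.0.12 address, and the get_master_ip function are hypothetical. The function also rejects any value that is not a dotted-quad IPv4 address, so a malformed file fails early instead of producing broken connection strings.

```shell
#!/usr/bin/env bash
# Sketch: read the master node's private IP address from a JSON file shaped
# like the one the previous script saves. Assumes python3 is on the PATH; the
# sample JSON is hypothetical and holds only the fields the lookup reads.
set -euo pipefail

get_master_ip() {
  local json_file="$1"
  local ip
  local ipv4_re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
  # Same lookup as the script: Instances[0].PrivateIpAddress
  ip=$(python3 -c 'import sys, json; print(json.load(sys.stdin)["Instances"][0]["PrivateIpAddress"])' < "$json_file")
  # Reject anything that does not look like a dotted-quad IPv4 address
  if [[ ! $ip =~ $ipv4_re ]]; then
    echo "unexpected IP address value: '${ip}'" >&2
    return 1
  fi
  printf '%s\n' "$ip"
}

# Demo with a throwaway file and a hypothetical address
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"Instances": [{"PrivateIpAddress": "10.0.0.12"}]}
EOF

ipaddrOfMaster=$(get_master_ip "$sample")
echo "Master node IP address: ${ipaddrOfMaster}"
rm -f "$sample"
```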

Creates Hive, HDFS, Hadoop, and other connections. Each connection includes authentication values, the required port numbers, and the IP address of the cluster master node.

For example, the following section of the script creates a Hive connection:

echo "" >> /tmp/.emr_automation.log
echo "Creating hive connection" >> /tmp/.emr_automation.log

${infa_home}/isp/bin/infacmd.sh createConnection -dn 'domain' -un 'Administrator' -pd 'Administrator' \
    -cn 'Hive_conn_test_automation' -cid 'Hive_conn_test_automation' -ct HIVE \
    -o "connectString=jdbc:hive2://${ipaddrOfMaster}:10000/default enableQuotes=false metadataConnString=jdbc:hive2://${ipaddrOfMaster}:10000/default bypassHiveJDBCServer=false pushDownMode=true relationalSourceAndTarget=true databaseName=default defaultFSURI=hdfs://${ipaddrOfMaster}:8020/ hiveWarehouseDirectoryOnHDFS='/user/hive/warehouse' jobTrackerURI=${ipaddrOfMaster}:8021 metastoreExecutionMode=remote remotemetastoreuri=thrift://${ipaddrOfMaster}:9083 username='hadoop'" \
    >> /tmp/.emr_automation.log

The script refers to cluster nodes, including the master node, by IP address only.

The script assumes that each cluster element, such as the default file system URI (defaultFSURI) and the Hive metastore, is located on the master node.

The example contains only mandatory arguments, but you can add optional arguments to set custom attributes for the connection.
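Because every connection attribute in the example depends on the master node IP address, it can help to build the -o option string in one place and inspect it before calling infacmd.sh. The following sketch does that with a hypothetical helper, build_hive_conn_options, that reproduces the option names and ports from the example command and appends any extra name=value pairs you pass after the IP address. The address 10.0.0.12 is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: build the -o option string for the Hive connection from the master
# node IP address, using the option names and ports from the example command.
# build_hive_conn_options is a hypothetical helper, not part of infacmd.
set -euo pipefail

build_hive_conn_options() {
  local ip="$1"
  shift
  # Options from the example, followed by any optional name=value
  # pairs that the caller passes after the IP address
  local opts=(
    "connectString=jdbc:hive2://${ip}:10000/default"
    "enableQuotes=false"
    "metadataConnString=jdbc:hive2://${ip}:10000/default"
    "bypassHiveJDBCServer=false"
    "pushDownMode=true"
    "relationalSourceAndTarget=true"
    "databaseName=default"
    "defaultFSURI=hdfs://${ip}:8020/"
    "hiveWarehouseDirectoryOnHDFS='/user/hive/warehouse'"
    "jobTrackerURI=${ip}:8021"
    "metastoreExecutionMode=remote"
    "remotemetastoreuri=thrift://${ip}:9083"
    "username='hadoop'"
    "$@"
  )
  # Join the elements with single spaces, matching the format of the example
  local IFS=' '
  printf '%s\n' "${opts[*]}"
}

# Demo with a hypothetical master node IP address
build_hive_conn_options 10.0.0.12
```

You could then pass the result to the createConnection command as the -o value, for example -o "$(build_hive_conn_options "$ipaddrOfMaster")". Keeping the options in an array makes it easier to add optional attributes without editing one long quoted string.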

Edits the yarn-site.xml file on the Informatica domain, replacing the HOSTNAME placeholder with the IP address of the cluster master node.

For example,

echo "" >> /tmp/.emr_automation.log
echo "Updating yarn-site.xml" >> /tmp/.emr_automation.log

# Restore yarn-site.xml from the saved original (yarn-site.xml_org), then replace every HOSTNAME placeholder with the master node IP address.
cp ${infa_home}/services/shared/hadoop/amazon_emr5.0.0/conf/yarn-site.xml_org ${infa_home}/services/shared/hadoop/amazon_emr5.0.0/conf/yarn-site.xml
sed -i -- "s/HOSTNAME/$ipaddrOfMaster/g" ${infa_home}/services/shared/hadoop/amazon_emr5.0.0/conf/yarn-site.xml
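The yarn-site.xml update can also be checked after it runs. The following sketch assumes GNU sed and uses a hypothetical update_yarn_site function, a throwaway directory, and a one-property template with hypothetical contents. It restores the file from yarn-site.xml_org, substitutes the placeholder, and then fails if the master node IP address does not appear in the result, which would indicate that the template had no HOSTNAME placeholder.

```shell
#!/usr/bin/env bash
# Sketch: restore yarn-site.xml from yarn-site.xml_org, replace the HOSTNAME
# placeholder, and check that the master node IP address made it into the
# file. Assumes GNU sed; update_yarn_site and the demo template are hypothetical.
set -euo pipefail

update_yarn_site() {
  local conf_dir="$1" master_ip="$2"
  local target="${conf_dir}/yarn-site.xml"
  # Start from the saved original so that reruns do not compound edits
  cp "${conf_dir}/yarn-site.xml_org" "$target"
  sed -i -- "s/HOSTNAME/${master_ip}/g" "$target"
  # If the IP address is missing, the template had no HOSTNAME placeholder
  if ! grep -qF -- "$master_ip" "$target"; then
    echo "no HOSTNAME placeholder replaced in ${target}" >&2
    return 1
  fi
}

# Demo: a throwaway conf directory with a one-property template
conf_dir=$(mktemp -d)
cat > "${conf_dir}/yarn-site.xml_org" <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>HOSTNAME</value>
  </property>
</configuration>
EOF

update_yarn_site "$conf_dir" 10.0.0.12
cat "${conf_dir}/yarn-site.xml"
```

Copying from yarn-site.xml_org first, as the script does, keeps the edit repeatable: each run starts from the placeholder version rather than from a file that already contains an earlier IP address.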

Returns a message that the step is complete and the workflow can execute the next step.
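The source does not show how the script reports completion. One possible approach, assuming the workflow treats a zero exit status as success, is to log the result to the same /tmp/.emr_automation.log file and return the step's exit status. The log and finish_step helpers below are hypothetical, and the demo writes to a temporary file instead of the real log.

```shell
#!/usr/bin/env bash
# Sketch: log() appends to the step log using the script's blank-line-then-
# message pattern, and finish_step() reports the step result so that a
# calling workflow can decide whether to continue. Both are hypothetical.
set -euo pipefail

LOG_FILE="${LOG_FILE:-/tmp/.emr_automation.log}"

log() {
  { echo ""; echo "$*"; } >> "$LOG_FILE"
}

finish_step() {
  local status="$1" step_name="$2"
  if [ "$status" -eq 0 ]; then
    log "${step_name} complete"
    echo "${step_name} complete. The workflow can run the next step."
  else
    log "${step_name} failed with exit status ${status}"
    echo "${step_name} failed with exit status ${status}" >&2
  fi
  # A nonzero return tells the caller not to start the next step
  return "$status"
}

# Demo: write to a temporary log instead of /tmp/.emr_automation.log
LOG_FILE=$(mktemp)
log "Updating yarn-site.xml"
finish_step 0 "Step 2"
```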
