You can configure PowerCenter and the PowerCenter Integration Service to read data from and write data to a Hadoop cluster. The Hadoop cluster can be a High Availability (HA), non-HA, Kerberos-enabled, or non-Kerberos cluster.
Perform the following steps to configure PowerCenter for Cloudera, Hortonworks, IBM BigInsights, and MapR distributions:
On the Informatica node where the PowerCenter Integration Service runs, create a directory. The PowerCenter administrator user must have read access to this directory. For example:
<INFA_HOME>/pwx-hadoop/conf
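For example, a command sequence like the following creates the directory and grants read access; the path, owner, and permissions shown are illustrative rather than required values:
mkdir -p <INFA_HOME>/pwx-hadoop/conf
chown -R <pc_admin_user> <INFA_HOME>/pwx-hadoop/conf    # placeholder administrator user
chmod -R 755 <INFA_HOME>/pwx-hadoop/conf                # directory must be readable by that user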
Copy the following files from the Hadoop cluster to the directory created in step 1, for example with scp as shown after the file list:
/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/mapred-site.xml
/etc/hadoop/conf/hdfs-site.xml
/etc/hive/conf/hive-site.xml
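A minimal sketch, assuming you can reach a cluster node over SSH; the host name <cluster_node> and user <hadoop_user> are placeholders:
scp <hadoop_user>@<cluster_node>:/etc/hadoop/conf/core-site.xml <INFA_HOME>/pwx-hadoop/conf/
scp <hadoop_user>@<cluster_node>:/etc/hadoop/conf/mapred-site.xml <INFA_HOME>/pwx-hadoop/conf/
scp <hadoop_user>@<cluster_node>:/etc/hadoop/conf/hdfs-site.xml <INFA_HOME>/pwx-hadoop/conf/
scp <hadoop_user>@<cluster_node>:/etc/hive/conf/hive-site.xml <INFA_HOME>/pwx-hadoop/conf/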
Optional. Applicable to Kerberos-enabled clusters. Ensure that the PowerCenter administrator user exists on all Hadoop cluster nodes with the same UID, and run kinit on each node to create the Kerberos ticket cache file.
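For example, you can confirm that the UID is consistent before running kinit on each node; the user name below is a placeholder:
id -u <pc_admin_user>    # must return the same UID on every cluster node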
Optional. Applicable to Kerberos-enabled clusters. Run kinit on the Informatica node where the PowerCenter Integration Service runs to create the Kerberos ticket cache file. For example:
/tmp/krb5cc_<UID>
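A minimal sketch, assuming the administrator principal is <pc_admin_user>@<YOUR_REALM> (placeholders, not values from this article):
kinit <pc_admin_user>@<YOUR_REALM>
klist    # verifies the ticket and shows the cache file, typically /tmp/krb5cc_<UID>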
Optional. Applicable to Kerberos-enabled clusters except MapR. Edit the core-site.xml file in the directory created in step 1 and add the following parameter:
<property>
<name>hadoop.security.kerberos.ticket.cache.path</name>
<value>/tmp/krb5cc_<UID></value>
<description>Path to the Kerberos ticket cache. </description>
</property>
In the Administrator tool, go to the Services and Nodes tab. Select the Processes view for the required PowerCenter Integration Service and add the environment variable "CLASSPATH" with the value of the directory created in step 1.
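For example, the environment variable entry would typically look like the following, where the value is the directory created in step 1:
CLASSPATH=<INFA_HOME>/pwx-hadoop/conf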
Recycle the service. Click Actions > Recycle Service.
In the Workflow Manager, create the HDFS connection, assign it to the source or target, and run the workflow. When you create the HDFS connection, use the value of the fs.default.name property for the NameNode URI. You can find the value of the fs.default.name property in the core-site.xml file that you copied in step 2.
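For example, you can locate the property in the copied configuration file with a command such as:
grep -A 2 "fs.default.name" <INFA_HOME>/pwx-hadoop/conf/core-site.xml
A typical value has the form hdfs://<namenode_host>:8020, but the host, port, or nameservice name depends on your cluster.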