PowerExchange for Hadoop User Guide for PowerCenter

Configure PowerCenter for Hadoop Cluster
You can configure PowerCenter and the PowerCenter Integration Service to read data from and write data to a Hadoop cluster. The Hadoop cluster can be a High Availability (HA), non-HA, Kerberos-enabled, or non-Kerberos cluster.
Perform the following steps to configure PowerCenter for Cloudera, Hortonworks, IBM BigInsights, and MapR distributions:
  1. On the Informatica node where the PowerCenter Integration Service runs, create a directory. The PowerCenter administrator user must have read access to this directory. For example:
    <INFA_HOME>/pwx-hadoop/conf
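    A minimal shell sketch, assuming <INFA_HOME> is /opt/informatica (a hypothetical install path):
    # Create the configuration directory and make it readable
    mkdir -p /opt/informatica/pwx-hadoop/conf
    chmod 755 /opt/informatica/pwx-hadoop/conf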
  2. Copy the following files from the Hadoop cluster to the directory created in step 1:
    • /etc/hadoop/conf/core-site.xml
    • /etc/hadoop/conf/mapred-site.xml
    • /etc/hadoop/conf/hdfs-site.xml
    • /etc/hive/conf/hive-site.xml
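    For example, assuming a hypothetical cluster node named hadoop-node1:
    # Copy the cluster configuration files in one command
    scp hadoop-node1:/etc/hadoop/conf/core-site.xml \
        hadoop-node1:/etc/hadoop/conf/mapred-site.xml \
        hadoop-node1:/etc/hadoop/conf/hdfs-site.xml \
        hadoop-node1:/etc/hive/conf/hive-site.xml \
        <INFA_HOME>/pwx-hadoop/conf/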
  3. Optional. Applicable to Kerberos-enabled clusters. Ensure that the PowerCenter administrator user exists on all Hadoop cluster nodes with the same UID, and run kinit on each node to create the Kerberos ticket cache file.
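    For example, assuming a hypothetical pcadmin user in the EXAMPLE.COM realm:
    id -u pcadmin                # verify the UID matches across nodes
    kinit pcadmin@EXAMPLE.COM    # create the ticket cache on this node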
  4. Optional. Applicable to Kerberos-enabled clusters. Run kinit on the Informatica node where the PowerCenter Integration Service runs to create the Kerberos ticket cache file. For example:
    /tmp/krb5cc_<UID>
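    A minimal command sketch, again assuming the hypothetical pcadmin@EXAMPLE.COM principal:
    kinit pcadmin@EXAMPLE.COM    # writes the cache to /tmp/krb5cc_<UID>
    klist                        # confirm the cache file path and ticket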
  5. Optional. Applicable to Kerberos-enabled clusters except MapR. Edit the core-site.xml file in the directory created in step 1 and add the following property:
    <property>
      <name>hadoop.security.kerberos.ticket.cache.path</name>
      <value>/tmp/REPLACE_WITH_CACHE_FILENAME</value>
      <description>Path to the Kerberos ticket cache.</description>
    </property>
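    For example, for a hypothetical UID of 1001, the value would be /tmp/krb5cc_1001.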
  6. In the Administrator tool, go to the Services and Nodes tab. Select the Processes view for the required PowerCenter Integration Service and add the CLASSPATH environment variable with the value of the directory created in step 1.
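    For example, using the directory created in step 1:
    CLASSPATH=<INFA_HOME>/pwx-hadoop/conf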
  7. Recycle the service. Click Actions > Recycle Service.
  8. In the Workflow Manager, create the HDFS connection, assign it to the source or target, and run the workflow. When you create the HDFS connection, use the value of the fs.default.name property for the NameNode URI. You can find the fs.default.name property in the core-site.xml file.
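    For example, a core-site.xml entry with a hypothetical NameNode host and port:
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode.example.com:8020</value>
    </property>
    In this case, you would enter hdfs://namenode.example.com:8020 as the NameNode URI.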
