Release Notes (10.4.1.3)

Post-installation Steps for Cloudera CDP Public Cloud

Perform the following tasks to integrate Data Engineering Integration with a Cloudera CDP Public Cloud cluster on Azure or AWS for the first time.
  1. Prepare files for cluster import from Cloudera. Verify properties in *-site.xml files.
  2. Create a Hive metastore on the CDP Data Hub cluster that points to the Hive metastore in the Cloudera Data Lake.
  3. Create a cluster configuration using the Cloudera Manager host of the CDP Data Hub cluster.
  4. Grant Access Control List (ACL) permissions for the staging directories on the Data Hub cluster to the Hive user and the impersonation user. See the ACL example after this list.
    Run the following command on the CDP cluster for each user:
    hadoop fs -setfacl -m user:<user name>:rwx <staging directory>
  5. Copy the auto-TLS certificate file from the cluster node to the domain on your virtual machine. See the copy example after this list.
    1. Find the value for the property ssl.client.truststore.location in the following file on the cluster: /etc/hadoop/conf/ssl-client.xml
      The value of this property is the file path of the file cm-auto-global_truststore.jks. For example:
      /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks
    2. Create the same directory structure on the Informatica server node. For example:
      <Informatica server node>/var/lib/cloudera-scm-agent/agent-cert
    3. Find the .jks file on the cluster in the file path that you found in step a and copy the file to the directory that you created on the Informatica domain in step b.
  6. If you configure the Data Integration Service to use operating system profiles, configure the Hadoop impersonation user. Ensure that the impersonation user has permission to access the Hive warehouse directory. See the verification example after this list.
  7. Verify JDBC drivers for Sqoop connectivity.
  8. Set the locale environment variables in the Hadoop connection cluster properties. See the locale example after this list.
  9. To use Kerberos authentication, configure the krb5.conf file on the Informatica server node with values that you obtain from the cluster.
    1. Find the value for the property default_realm in the following file on the cluster: /etc/krb5.conf
      The value of this property is the name of the default service realm for the Informatica domain.
    2. Run the following command on any cluster node to verify that you can access the Key Distribution Center (KDC) server:
      ping kdc.<default service realm>
      This command returns the KDC server IP address.
    3. In the krb5.conf file on the Informatica server node, add the KDC server entries under [realms]. For example:
      [realms]
        INFARNDC.SRC9-LTFL.CLOUDERA.SITE = {
          pkinit_anchors = FILE:/var/lib/ipa-client/pki/kdc-ca-bundle.pem
          pkinit_pool = FILE:/var/lib/ipa-client/pki/ca-bundle.pem
          kdc = <KDC server IP address obtained from step b>
          admin_server = <KDC server IP address obtained from step b>
        }
  10. To use Apache Knox authentication, add the proxy entries for the keytab user to the Knox IDBroker service that runs on the Cloudera Data Lake cluster.
    For example, add the following entries to the configuration page for idbroker_kerberos_dt_proxyuser_block:
    "hadoop.proxyuser.csso_<keytab user>.groups": "*"
    "hadoop.proxyuser.csso_<keytab user>.hosts": "*"
    "hadoop.proxyuser.csso_<keytab user>.users": "spn_user"
  11. Configure the Developer tool.
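The following commands are a sketch of the ACL grant in step 4. The staging directory /tmp/infa_staging and the impersonation user name infa_user are hypothetical examples; substitute the values for your environment.
  # Grant the Hive user read, write, and execute access to the staging directory
  hadoop fs -setfacl -m user:hive:rwx /tmp/infa_staging
  # Grant the impersonation user the same access
  hadoop fs -setfacl -m user:infa_user:rwx /tmp/infa_staging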
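The following commands are a sketch of the certificate copy in step 5. The cluster host name cdp-master1.example.com is a hypothetical example, and the truststore path assumes the example location shown in step 5.
  # On the cluster node, confirm the truststore location declared in ssl-client.xml
  grep -A 1 ssl.client.truststore.location /etc/hadoop/conf/ssl-client.xml
  # On the Informatica server node, recreate the same directory structure
  mkdir -p /var/lib/cloudera-scm-agent/agent-cert
  # Copy the truststore file from the cluster node to the Informatica server node
  scp cdp-master1.example.com:/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks /var/lib/cloudera-scm-agent/agent-cert/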
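The following command is a sketch of the verification in step 6. The warehouse path /warehouse/tablespace/managed/hive is the CDP default and is an assumption; substitute the Hive warehouse directory that your cluster uses.
  # Run as the impersonation user to confirm that the warehouse directory is accessible
  hadoop fs -ls /warehouse/tablespace/managed/hive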
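The following values are a sketch of the locale setting in step 8, assuming a UTF-8 locale. Set the name=value pairs in the Hadoop connection cluster properties and adjust them to the locale that your data requires.
  # Example locale environment variables
  LANG=en_US.UTF-8
  LC_ALL=en_US.UTF-8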
For more information about Cloudera CDP integration, see the Data Engineering 10.4.1 Integration Guide.

Rules and guidelines for integrating with Cloudera CDP Public Cloud

Note the following rules and guidelines when you use a CDP Public Cloud cluster:
  • If you use HDFS on a Cloudera Data Lake cluster, perform the following tasks to configure the HDFS connection and the Hadoop connection:
    1. Find the value for the property fs.defaultFS in the following file on the cluster NameNode: /etc/hadoop/conf/core-site.xml
      For example:
      hdfs://infarndcdppamdl-master1.infarndc.src9-ltfl.cloudera.site:8020
    2. In the HDFS connection, set the NameNode URI property to the value that you found for fs.defaultFS.
    3. In the Hadoop connection, set the Spark advanced property spark.yarn.access.hadoopFileSystems to the value that you found for fs.defaultFS.
      For example:
      spark.yarn.access.hadoopFileSystems=hdfs://infarndcdppamdl-master1.infarndc.src9-ltfl.cloudera.site:8020
  • When you run a mapping using either an operating system profile or a Hadoop impersonation user for the Data Integration Service, the Hadoop administrator must add the impersonation user to FreeIPA and map the user to a cloud role using Knox IDBroker.
