Release Notes (10.4.1.3)

Post-installation Tasks for Cloudera CDP Public Cloud

Perform the following steps to integrate Data Engineering Integration with a Cloudera CDP Public Cloud cluster on Azure or AWS for the first time.
  1. Prepare files for cluster import from Cloudera. Verify properties in *-site.xml files.
  2. Create a Hive metastore on the CDP Data Hub cluster that points to the Hive metastore in the Cloudera Data Lake.
  3. Create a cluster configuration using the IP information for the CDP Data Hub cluster.
  4. Grant Access Control List (ACL) permissions for the staging directories on the CDP Data Hub cluster to the Hive user and the impersonation user.
    Run the following command on the CDP cluster:
    hadoop fs -setfacl -m user:<user name>:rwx <staging directory>
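The ACL grant can be sketched as a small script. The Hive user, impersonation user, and staging directory below are placeholders, and the commands are printed with echo so you can review them before running them on the cluster:

```shell
# Hedged sketch: grant rwx ACLs on the staging directory to both the Hive
# user and the impersonation user. All three values are placeholders.
HIVE_USER=hive                 # assumption: default Hive service user
IMPERSONATION_USER=impuser     # placeholder impersonation user
STAGING_DIR=/tmp/infa_staging  # placeholder staging directory
for u in "$HIVE_USER" "$IMPERSONATION_USER"; do
  # Printed with echo for review; remove echo to execute on the CDP cluster.
  echo hadoop fs -setfacl -m "user:${u}:rwx" "$STAGING_DIR"
done
```

Substitute the real user names and staging directory before removing the echo.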
  5. Copy the auto-TLS certificate file from the cluster node to the domain on your virtual machine.
    1. Find the value for the property ssl.client.truststore.location in the following file on the cluster: /etc/hadoop/conf/ssl-client.xml
      The value of this property is the file path for the file cm-auto-global_truststore.jks. For example, /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks
    2. Locate the .jks file at the file path that you found in substep 1 and copy the file.
    3. Create the same directory structure in the Informatica server node and paste the .jks file there. For example,
      <Informatica server node>/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks
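The copy in this step can be scripted, assuming SSH access from the cluster node to the Informatica server node. The hostname below is a placeholder, and the commands are printed with echo for review:

```shell
# Hedged sketch: recreate the truststore's directory structure on the
# Informatica server node and copy the .jks file there. Use the truststore
# path you found in ssl-client.xml; the hostname is a placeholder.
TRUSTSTORE=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks
INFA_NODE=infa-node.example.com   # placeholder Informatica server hostname
# Printed with echo for review; remove echo to execute from the cluster node.
echo ssh "$INFA_NODE" mkdir -p "$(dirname "$TRUSTSTORE")"
echo scp "$TRUSTSTORE" "${INFA_NODE}:${TRUSTSTORE}"
```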
  6. Verify JDBC drivers for Sqoop connectivity.
  7. Set the locale.
  8. To use Kerberos authentication, configure the krb5.conf file on the Informatica server node based on values from the cluster.
    1. Find the value for the property default_realm in the following file on the cluster: /etc/krb5.conf
      The value of this property is the name of the default service realm for the Informatica domain.
    2. Run the following command on any cluster node to verify that you can access the Key Distribution Center (KDC) server:
      ping kdc.<default service realm>
      This command returns the KDC server IP address.
    3. In the krb5.conf file on the Informatica server node, add the KDC server entries under [realms].
      For example:
      [realms]
      INFARNDC.SRC9-LTFL.CLOUDERA.SITE = {
        pkinit_anchors = FILE:/var/lib/ipa-client/pki/kdc-ca-bundle.pem
        pkinit_pool = FILE:/var/lib/ipa-client/pki/ca-bundle.pem
        kdc = <KDC server IP address obtained in substep 2>
        admin_server = <KDC server IP address obtained in substep 2>
      }
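The realm lookup in substep 1 can be scripted. The sketch below runs against a sample krb5.conf written to /tmp so that it is self-contained; on the cluster, point the awk command at /etc/krb5.conf instead. The realm value shown is the one from the example above:

```shell
# Hedged sketch: extract default_realm from krb5.conf. The sample file is
# illustrative; on a cluster node, read /etc/krb5.conf instead.
cat > /tmp/krb5.conf <<'EOF'
[libdefaults]
  default_realm = INFARNDC.SRC9-LTFL.CLOUDERA.SITE
EOF
# Split on " = " and print the value of the default_realm line.
realm=$(awk -F' *= *' '/default_realm/ {print $2}' /tmp/krb5.conf)
echo "$realm"
```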
  9. To use Apache Knox authentication, add the proxy entries for the keytab user to the Knox IDBroker service that runs on the Cloudera Data Lake cluster.
    For example, add the following entries to the configuration page for idbroker_kerberos_dt_proxyuser_block:
    "hadoop.proxyuser.csso_<keytab user>.groups": "*"
    "hadoop.proxyuser.csso_<keytab user>.hosts": "*"
    "hadoop.proxyuser.csso_<keytab user>.users": "spn_user"
  10. Configure the Developer tool.
Note the following rules when you use a CDP Public Cloud cluster:
  • If you use HDFS on a Cloudera Data Lake cluster, perform the following tasks to configure the HDFS connection and the Hadoop connection:
    1. Find the value for the property fs.defaultFS in the following file on the NameNode of the cluster: /etc/hadoop/conf/core-site.xml
      For example:
      hdfs://infarndcdppamdl-master1.infarndc.src9-ltfl.cloudera.site:8020
    2. In the HDFS connection, set the NameNode URI property to the value you found for fs.defaultFS.
    3. In the Hadoop connection, set the Spark advanced property spark.yarn.access.hadoopFileSystems to the value you found for fs.defaultFS.
      For example:
      spark.yarn.access.hadoopFileSystems=hdfs://infarndcdppamdl-master1.infarndc.src9-ltfl.cloudera.site:8020
  • When you run a mapping using either an operating system profile or a Hadoop impersonation user for the Data Integration Service, the Hadoop administrator must add the impersonation user to FreeIPA and map the user to a cloud role using Knox IDBroker.
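The fs.defaultFS lookup described in the first bullet can be scripted. The sketch below parses a sample core-site.xml written to /tmp so that it is self-contained; on the cluster, parse /etc/hadoop/conf/core-site.xml the same way, or simply run `hdfs getconf -confKey fs.defaultFS`:

```shell
# Hedged sketch: read the fs.defaultFS value from core-site.xml. The sample
# file reuses the example hostname from the documentation above.
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://infarndcdppamdl-master1.infarndc.src9-ltfl.cloudera.site:8020</value>
  </property>
</configuration>
EOF
# Take the line after the property name and strip the <value> tags.
fs_default=$(grep -A1 '<name>fs.defaultFS</name>' /tmp/core-site.xml \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
echo "$fs_default"
```

Use the resulting value for the NameNode URI property and for spark.yarn.access.hadoopFileSystems.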


Updated June 07, 2021