Release Notes (10.4.1.3)

Post-installation Steps for Cloudera CDP Public Cloud

Perform the following tasks to integrate Data Engineering Integration with a Cloudera CDP Public Cloud cluster on Azure or AWS for the first time.
  1. Prepare files for cluster import from Cloudera. Verify properties in *-site.xml files.
  2. Create a Hive metastore on the CDP Data Hub cluster that points to the Hive metastore in the Cloudera Data Lake.
  3. Create a cluster configuration using the Cloudera Manager host of the CDP Data Hub cluster.
  4. Grant Access Control List (ACL) permissions for the staging directories on the Data Hub cluster to the Hive user and the impersonation user. See the ACL example after this list.
    Run the following command on the CDP cluster for each user:
    hadoop fs -setfacl -m user:<user name>:rwx <staging directory>
  5. Copy the auto-TLS certificate file from the cluster node to the domain on your virtual machine. See the copy example after this list.
    1. Find the value for the property ssl.client.truststore.location in the following file on the cluster: /etc/hadoop/conf/ssl-client.xml
      The value of this property is the file path of the file cm-auto-global_truststore.jks. For example:
      /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks
    2. Create the same directory structure on the Informatica server node. For example:
      <Informatica server node>/var/lib/cloudera-scm-agent/agent-cert
    3. Find the .jks file on the cluster in the file path that you found in step a and copy the file to the directory that you created on the Informatica domain in step b.
  6. If you configure the Data Integration Service to use operating system profiles, configure the Hadoop impersonation user. Ensure that the impersonation user has permission to access the Hive warehouse directory. See the verification example after this list.
  7. Verify JDBC drivers for Sqoop connectivity.
  8. Set the locale environment variables in the Hadoop connection cluster properties. See the locale example after this list.
  9. To use Kerberos authentication, configure the krb5.conf file on the Informatica server node with values that you obtain from the cluster.
    1. Find the value for the property default_realm in the following file on the cluster: /etc/krb5.conf
      The value of this property is the name of the default service realm for the Informatica domain.
    2. Run the following command on any cluster node to verify that you can access the Key Distribution Center (KDC) server:
      ping kdc.<default service realm>
      This command returns the KDC server IP address.
    3. In the krb5.conf file on the Informatica server node, add the KDC server entries under [realms]. For example:
      [realms]
        INFARNDC.SRC9-LTFL.CLOUDERA.SITE = {
          pkinit_anchors = FILE:/var/lib/ipa-client/pki/kdc-ca-bundle.pem
          pkinit_pool = FILE:/var/lib/ipa-client/pki/ca-bundle.pem
          kdc = <KDC server IP address obtained from step b>
          admin_server = <KDC server IP address obtained from step b>
        }
  10. To use Apache Knox authentication, add the proxy entries for the keytab user to the Knox IDBroker service that runs on the Cloudera Data Lake cluster.
    For example, add the following entries to the configuration page for idbroker_kerberos_dt_proxyuser_block:
    "hadoop.proxyuser.csso_<keytab user>.groups": "*"
    "hadoop.proxyuser.csso_<keytab user>.hosts": "*"
    "hadoop.proxyuser.csso_<keytab user>.users": "spn_user"
  11. Configure the Developer tool.
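The following commands are a sketch of the ACL grant in step 4. The staging directory /tmp/infa_staging and the impersonation user name infa_user are hypothetical examples; substitute the values for your environment.
  # Grant the Hive user read, write, and execute access to the staging directory
  hadoop fs -setfacl -m user:hive:rwx /tmp/infa_staging
  # Grant the impersonation user the same access
  hadoop fs -setfacl -m user:infa_user:rwx /tmp/infa_staging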
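The following commands are a sketch of the certificate copy in step 5. The cluster host name cdp-master1.example.com is a hypothetical example, and the truststore path assumes the example location shown in step 5.
  # On the cluster node, confirm the truststore location declared in ssl-client.xml
  grep -A 1 ssl.client.truststore.location /etc/hadoop/conf/ssl-client.xml
  # On the Informatica server node, recreate the same directory structure
  mkdir -p /var/lib/cloudera-scm-agent/agent-cert
  # Copy the truststore file from the cluster node to the Informatica server node
  scp cdp-master1.example.com:/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks /var/lib/cloudera-scm-agent/agent-cert/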
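The following command is a sketch of the verification in step 6. The warehouse path /warehouse/tablespace/managed/hive is the CDP default and is an assumption; substitute the Hive warehouse directory that your cluster uses.
  # Run as the impersonation user to confirm that the warehouse directory is accessible
  hadoop fs -ls /warehouse/tablespace/managed/hive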
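The following values are a sketch of the locale setting in step 8, assuming a UTF-8 locale. Set the name=value pairs in the Hadoop connection cluster properties and adjust them to the locale that your data requires.
  # Example locale environment variables
  LANG=en_US.UTF-8
  LC_ALL=en_US.UTF-8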
For more information about Cloudera CDP integration, see the Data Engineering 10.4.1 Integration Guide.

Rules and guidelines for integrating with Cloudera CDP Public Cloud

Note the following rules and guidelines when you use a CDP Public Cloud cluster:
  • If you use HDFS on a Cloudera Data Lake cluster, perform the following tasks to configure the HDFS connection and the Hadoop connection:
    1. Find the value for the property fs.defaultFS in the following file on the cluster NameNode: /etc/hadoop/conf/core-site.xml
      For example:
      hdfs://infarndcdppamdl-master1.infarndc.src9-ltfl.cloudera.site:8020
    2. In the HDFS connection, set the NameNode URI property to the value that you found for fs.defaultFS.
    3. In the Hadoop connection, set the Spark advanced property spark.yarn.access.hadoopFileSystems to the value that you found for fs.defaultFS.
      For example:
      spark.yarn.access.hadoopFileSystems=hdfs://infarndcdppamdl-master1.infarndc.src9-ltfl.cloudera.site:8020
  • When you run a mapping using either an operating system profile or a Hadoop impersonation user for the Data Integration Service, the Hadoop administrator must add the impersonation user to FreeIPA and map the user to a cloud role using Knox IDBroker.
