Before you install Enterprise Data Catalog to use an external Hadoop cluster, verify that the system environment meets the deployment prerequisites.
Verify that the external Hadoop distribution meets the following prerequisites:
The OpenSSL version on the cluster nodes is openssl-1.0.1e-30.el6_6.5.x86_64 or later. Do not use versions in the 1.0.2 branch.
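The OpenSSL rule above can be sketched as a small check. This is a minimal illustration, not an Informatica utility: the function names are hypothetical, and the sketch compares only the upstream version (for example 1.0.1e) while ignoring the RPM release suffix (-30.el6_6.5), which is a simplifying assumption.

```python
import re

# Minimum upstream version taken from the prerequisite above.
# Assumption: the RPM release suffix ("-30.el6_6.5") is ignored in this sketch.
MINIMUM = (1, 0, 1, "e")
FORBIDDEN_BRANCH = (1, 0, 2)


def parse_openssl_version(text):
    """Return (major, minor, patch, letters) from a string such as
    'openssl-1.0.1e-30.el6_6.5.x86_64'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)([a-z]*)", text)
    if match is None:
        raise ValueError(f"no OpenSSL version found in {text!r}")
    major, minor, patch, letters = match.groups()
    return int(major), int(minor), int(patch), letters


def openssl_version_ok(text):
    """True if the version is at least 1.0.1e and not in the 1.0.2 branch."""
    version = parse_openssl_version(text)
    if version[:3] == FORBIDDEN_BRANCH:
        return False
    return version >= MINIMUM
```

The input string could come from a package query on each cluster node; how you collect it is left open here.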
Ensure that you install JDK 1.8 on all cluster nodes.
Verify that the secure path in the /etc/sudoers file begins with the /usr/bin directory.
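As a rough sketch of the sudoers check, the following function inspects the text of a sudoers file and reports whether the secure path lists /usr/bin first. It assumes the setting appears as a Defaults secure_path line, does not follow include directives or files under /etc/sudoers.d, and the function name is hypothetical.

```python
def secure_path_starts_with(sudoers_text, required="/usr/bin"):
    """Return True if the last active secure_path setting lists `required` first."""
    result = False
    for raw in sudoers_text.splitlines():
        line = raw.strip()
        # Skip comments and lines that do not set secure_path.
        if line.startswith("#") or "secure_path" not in line:
            continue
        if not line.startswith("Defaults"):
            continue
        # Everything after the first "=" is the colon-separated path list.
        _, _, value = line.partition("=")
        entries = value.strip().strip('"').split(":")
        result = entries[0].strip() == required
    return result
```

Reading /etc/sudoers usually requires root privileges, so you might pass in text that an administrator has already collected.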
On each host machine, verify that you have the zip and unzip utilities available.
Verify that owners, groups, and others have the Read, Write, and Execute permissions on the HDFS directories.
Verify that the maximum number of open file descriptors is 10,000 or more. Use the ulimit command to verify the current value and change the value if required.
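If you prefer to script the file descriptor check, the sketch below reads the same limit that ulimit -n reports, using Python's standard resource module (available on Unix-like systems only). The function name and constant are illustrative, and the result reflects only the user and process that run the script.

```python
import resource

# Threshold taken from the prerequisite above.
REQUIRED_OPEN_FILES = 10_000


def open_file_limit_ok(soft_limit, required=REQUIRED_OPEN_FILES):
    """Return True if the soft limit on open files meets the requirement."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # an unlimited setting always satisfies the requirement
    return soft_limit >= required


if __name__ == "__main__":
    # Soft and hard limits for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard} ok={open_file_limit_ok(soft)}")
```

Run the script as the user that starts the services, because limits can differ between users.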
When you create the Catalog Service that connects to an SSL-enabled external cluster, verify that you configure the following properties:
A keytab file that contains all the users in LDAP.
Kerberos domain name.
HDFS NameNode and YARN Resource Manager service principals.
Path to Solr keystore file and password.
Import the Hadoop cluster certificates to the Informatica domain truststore.
Before you deploy Enterprise Data Catalog on clusters where Apache Ranger is enabled, make sure that the Informatica domain user has the required permission to submit applications to the YARN queue.
If the cluster is enabled for SSL, it is recommended that you enable SSL for the Informatica domain and the Catalog Service.
Verify that you install the following prerequisite packages before you enable Kerberos:
krb5-workstation
krb5-libs
krb5-auth-dialog
If the cluster is enabled for Kerberos, create the service-logs directory under /informatica/ldm/<service cluster name>/ and assign the ownership of the directory to the service cluster user.
If the cluster is not enabled for Kerberos, create the service-logs directory under /informatica/ldm/<domain user name>/ and assign the ownership of the directory to the domain user.
If the cluster is not enabled for Kerberos, create the directory <domain user name> under /user and assign the ownership of the directory to the domain user.
If the cluster is enabled for Kerberos, create the directory <service cluster name> under /user and assign the ownership of the directory to the service cluster user.
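The four directory rules above reduce to one decision: whether the cluster is enabled for Kerberos. The following sketch only computes the paths and owners so that the rule is easy to review; it does not create anything. The function and parameter names are hypothetical.

```python
def required_directories(kerberos_enabled, domain_user,
                         service_cluster_name=None, service_cluster_user=None):
    """Return (path, owner) pairs for the directories described above."""
    if kerberos_enabled:
        if not service_cluster_name or not service_cluster_user:
            raise ValueError(
                "Kerberos-enabled clusters need the service cluster name and user")
        name, owner = service_cluster_name, service_cluster_user
    else:
        # Without Kerberos, the domain user names and owns both directories.
        name, owner = domain_user, domain_user
    return [
        (f"/informatica/ldm/{name}/service-logs", owner),
        (f"/user/{name}", owner),
    ]
```

For example, required_directories(False, "infa") yields the service-logs directory under /informatica/ldm/infa/ and the directory /user/infa, both owned by infa, where infa is a made-up domain user name.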
Ensure that you do not create the Informatica domain on a node in the existing Hadoop cluster.
If you want to enable Kerberos authentication for Enterprise Data Catalog deployed on a multi-node Informatica domain, make sure that you complete the following prerequisites:
Make sure that all the domain nodes include the krb5.conf file in the following directories:
$INFA_HOME/services/shared/security/
/etc/
Make sure that the /etc/hosts file on every cluster node and domain node includes the krb hosts entry and a host entry for each of the other nodes.
Install krb5-workstation on all domain nodes.
Make sure that the keytab file is present in a common location on all domain nodes.
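To confirm the krb5.conf requirement on a domain node, a check along the following lines may help. It is a minimal sketch: the function names are hypothetical, and the exists parameter is there only so that the logic can be exercised without touching the file system.

```python
import os


def krb5_conf_locations(infa_home):
    """Both locations in which krb5.conf must exist on a domain node."""
    return [
        os.path.join(infa_home, "services", "shared", "security", "krb5.conf"),
        "/etc/krb5.conf",
    ]


def missing_krb5_conf(infa_home, exists=os.path.isfile):
    """Return the expected krb5.conf paths that are not present."""
    return [path for path in krb5_conf_locations(infa_home) if not exists(path)]
```

On a node, you could call missing_krb5_conf(os.environ["INFA_HOME"]) and treat an empty list as success.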
If you want to enable SSL authentication for Enterprise Data Catalog deployed on a multi-node Informatica domain, make sure that you complete the following prerequisites:
Export the Default.keystore of each node to the infa_truststore.jks on all nodes.
Make sure that the Default.keystore is unique for each host node.
Copy the Default.keystore to a unique location on each node.
If Informatica Cluster Service and Catalog Service are on different nodes, then export the Apache Ambari server certificate to the infa_truststore.jks on all nodes.
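One way to read the export steps above is as a pair of Java keytool operations: export the certificate from a node's Default.keystore, then import it into infa_truststore.jks on every node. That reading is an assumption, as are the alias, file names, and passwords, which are placeholders. The sketch only builds the argument lists and does not run them.

```python
def export_cert_command(keystore, alias, cert_file, storepass):
    """Arguments for 'keytool -exportcert' against a node's Default.keystore."""
    return ["keytool", "-exportcert", "-alias", alias,
            "-keystore", keystore, "-storepass", storepass,
            "-file", cert_file]


def import_cert_command(truststore, alias, cert_file, storepass):
    """Arguments for 'keytool -importcert' into infa_truststore.jks."""
    return ["keytool", "-importcert", "-noprompt", "-alias", alias,
            "-file", cert_file,
            "-keystore", truststore, "-storepass", storepass]
```

Each list can be passed to subprocess.run(command, check=True). Because keytool rejects an import when the alias already exists in the target store, a distinct alias per node (for example, one based on the host name) is a reasonable choice. Note that passwords given as arguments are visible in the process list.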