Before you install Enterprise Data Catalog on an embedded Hadoop cluster, you must verify that the system environment meets the prerequisites required to deploy Enterprise Data Catalog.
Verify that the internal Hadoop distribution meets the following prerequisites:
Operating system is 64-bit Red Hat Enterprise Linux version 6.5 or later.
For Red Hat Enterprise Linux version 7.0, make sure that you are using the following versions of snappy-devel and Sudo:
snappy-devel-1.0.5-1.el6.x86_64 on all Apache Ambari hosts.
Sudo 1.8.16
Verify that you disable SSL certificate validation if you are using Red Hat Enterprise Linux.
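For example, on a Red Hat Enterprise Linux host you can confirm the operating system release and the installed snappy-devel and sudo versions with commands similar to the following. The exact output depends on your distribution:
    cat /etc/redhat-release      # operating system release
    rpm -q snappy-devel          # installed snappy-devel package
    sudo -V | head -1            # sudo version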
Verify that the cluster nodes meet the following minimum requirements. A sketch of commands to verify these resources follows the disk space requirements below:
Master node: 4 CPUs, 16 GB of unused memory, and 60 GB of disk space.
Slave node: 4 CPUs, 16 GB of unused memory, and 60 GB of disk space.
If the cluster is enabled for SSL, ensure that you import the Ambari Server certificate to the Informatica domain truststore.
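For example, if the Ambari Server certificate has been saved to a file, you can import it with keytool. The alias, certificate path, and truststore location shown here are examples; use the truststore that your Informatica domain is configured with:
    keytool -importcert -alias ambari-server -file /tmp/ambari-server.crt \
        -keystore $INFA_HOME/services/shared/security/infa_truststore.jks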
Verify that the root directory (/) has a minimum of 10 GB of free disk space.
If you want to mount Informatica Cluster Service on a separate mount location, verify that the mount location has a minimum of 50 GB of free disk space.
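You can verify the CPU, memory, and disk space requirements listed above with standard Linux commands, for example. The mount location shown is a placeholder:
    nproc                        # number of CPUs
    free -g                      # memory in GB; check the free/available column
    df -h / /opt/infa_cluster    # free disk space on / and on the planned mount location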
Verify that postgresql version 8.14.18, release 1.el6_4, is installed from the Linux repository. If it is not, install the listed version and release of postgresql.
Make sure that you merge the user and host keytab files before you enable Kerberos authentication for Informatica Cluster Service.
Verify that you install the following prerequisite packages before you enable Kerberos:
krb5-workstation
krb5-libs
krb5-auth-dialog
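The following is a sketch, assuming a yum-based host and example keytab paths, of how you might install the Kerberos client packages and merge the user and host keytab files with ktutil before you enable Kerberos for Informatica Cluster Service:
    yum install krb5-workstation krb5-libs krb5-auth-dialog
    ktutil
      ktutil:  rkt /opt/keytabs/user.keytab      # read the user keytab
      ktutil:  rkt /opt/keytabs/host.keytab      # read the host keytab
      ktutil:  wkt /opt/keytabs/merged.keytab    # write the merged keytab
      ktutil:  quit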
Make sure that the NOEXEC flag is not set for the file system mounted on the /tmp directory.
Ensure that the Linux base repositories are configured.
Verify that you have write permission on the /home directory.
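For example, you can check the /tmp mount options and the /home write permission with commands similar to the following. The test file name is arbitrary:
    findmnt -no OPTIONS /tmp                                 # the output must not contain noexec
    touch /home/edc_write_test && rm /home/edc_write_test    # succeeds only with write permission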
On each host machine, verify that you have the following tools and applications available:
YUM and RPM (RHEL/CentOS/Oracle Linux)
Zypper and php_curl (SLES)
apt (Ubuntu)
scp, curl, unzip, tar, and wget
awk
OpenSSL version 1.0.1e-30.el6_6.5.x86_64 or later. Make sure that you do not use versions in the 1.0.2 branch.
Make sure that the $PATH variable points to the /usr/bin directory to use the correct version of Linux OpenSSL.
Verify that the secure path in the /etc/sudoers file has the /usr/bin directory location at the start.
Python version 2.6.x for Red Hat Enterprise Linux version 6.5.
If you install SUSE Linux Enterprise 11, update all the hosts to Python version 2.6.8-0.15.1.
Python version 2.7.x for Red Hat Enterprise Linux version 7.0.
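A quick way to verify the tools, OpenSSL, $PATH, sudoers, and Python requirements above is a set of commands similar to the following, run on each host. The output varies by operating system version:
    command -v scp curl unzip tar wget awk
    openssl version                          # 1.0.1e or later, not the 1.0.2 branch
    which openssl                            # should resolve to /usr/bin/openssl
    sudo grep -i secure_path /etc/sudoers    # /usr/bin should appear first
    python --version                         # 2.6.x on RHEL 6.5, 2.7.x on RHEL 7.0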
If you install on SUSE Linux Enterprise 12, make sure that you install the following RPM Package Manager (RPM) packages on all the cluster nodes:
openssl-1.0.1c-2.1.3.x86_64.rpm
libopenssl1_0_0-1.0.1c-2.1.3.x86_64.rpm
libopenssl1_0_0-32bit-1.0.1c-2.1.3.x86_64.rpm
python-devel-2.6.8-0.15.1.x86_64
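For example, if the RPM files have been downloaded to the current directory, you might install them with rpm. The file names shown assume the versions listed above:
    rpm -Uvh openssl-1.0.1c-2.1.3.x86_64.rpm \
             libopenssl1_0_0-1.0.1c-2.1.3.x86_64.rpm \
             libopenssl1_0_0-32bit-1.0.1c-2.1.3.x86_64.rpm \
             python-devel-2.6.8-0.15.1.x86_64.rpm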
If you have not configured the Linux base repository or if you do not have an Internet connection, install the following packages:
Version 8.4 of the following RPMs on the Ambari Server host:
postgresql-libs
postgresql-server
postgresql
The following RPMs on all cluster nodes:
nc
redhat-lsb
psmisc
python-devel-2.7.5-34.el7.x86_64
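A sketch of the installation commands, assuming the packages are available as local RPM files or through a local repository:
    # On the Ambari Server host:
    yum install postgresql-libs postgresql-server postgresql
    # On all cluster nodes:
    yum install nc redhat-lsb psmisc python-devel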
If you do not have an Internet connection, make sure that you have installed Java Development Kit (JDK) version 1.8. Configure the JAVA_HOME environment variable to point to the JDK installation.
If you have an Internet connection and any version of JDK installed, uninstall the JDK.
Enterprise Data Catalog installs JDK version 1.8 and PostgreSQL version 8.4 as part of Apache Ambari installation. The location of the JDK package is
Ensure that you install JDK 1.8 on all cluster nodes.
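For example, you can confirm the JDK version and set JAVA_HOME as follows. The installation path shown is a placeholder; use the path of your JDK 1.8 installation:
    java -version                               # should report version 1.8.x
    export JAVA_HOME=/usr/java/jdk1.8.0_171
    export PATH=$JAVA_HOME/bin:$PATH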
Apache Ambari requires certain ports to be open and available during the installation so that it can communicate with the hosts that it deploys and manages. Temporarily disable iptables to meet this requirement.
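For example, you can temporarily disable the firewall with one of the following commands, depending on the operating system version and the firewall service in use, and re-enable it after the installation completes:
    service iptables stop        # Red Hat Enterprise Linux 6
    systemctl stop firewalld     # Red Hat Enterprise Linux 7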
Verify that you meet the memory and package requirements for Apache Ambari. For more information, see the Hortonworks documentation.
Make sure that each machine in the cluster includes the 127.0.0.1 localhost localhost.localdomain entry in the /etc/hosts file.
Verify that the /etc/hosts file includes the fully qualified host names for all the cluster nodes. Alternatively, make sure that reverse DNS lookup returns the fully qualified host names for all the cluster nodes.
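The following /etc/hosts entries illustrate the expected format. The IP addresses and host names are examples only:
    127.0.0.1      localhost localhost.localdomain
    192.0.2.11     master01.example.com   master01
    192.0.2.12     slave01.example.com    slave01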
Before you deploy Enterprise Data Catalog on clusters where Apache Ranger is enabled, make sure that you configure the following permissions for the Informatica domain user:
Write permission on the HDFS folder.
Permission to submit applications to the YARN queue.
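As a rough check, you can confirm both permissions from a cluster node with commands similar to the following. The HDFS path is a placeholder, and on Ranger-enabled clusters these permissions are typically granted through Ranger policies rather than file system commands:
    # Run as the Informatica domain user:
    hdfs dfs -touchz /Informatica/edc_perm_test && hdfs dfs -rm /Informatica/edc_perm_test
    mapred queue -showacls       # lists the queue operations allowed for the current user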
If the cluster is enabled for SSL, it is recommended to enable SSL for the Informatica domain, the Informatica Cluster Service, and the Catalog Service.
If you want to enable Kerberos authentication for Enterprise Data Catalog deployed on a multi-node Informatica domain, make sure that you complete the following prerequisites:
Make sure that all the domain nodes include the krb5.conf file in the following directories:
$INFA_HOME/services/shared/security/
/etc/
Make sure that the /etc/hosts file on all cluster nodes and domain nodes includes the Kerberos host entry and a host entry for each of the other nodes.
Install krb5-workstation on all domain nodes.
Make sure that the keytab file is present in a common location on all domain nodes.
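A sketch of these steps on one domain node, assuming example host names and keytab paths. Repeat on every domain node:
    yum install krb5-workstation
    scp kdc01.example.com:/etc/krb5.conf /etc/krb5.conf
    cp /etc/krb5.conf $INFA_HOME/services/shared/security/krb5.conf
    scp kdc01.example.com:/opt/keytabs/infa_edc.keytab /opt/keytabs/infa_edc.keytab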
If you want to enable SSL authentication for Enterprise Data Catalog deployed on a multi-node Informatica domain, make sure that you complete the following prerequisites:
Export the certificate from the Default.keystore of each node and import it into the infa_truststore.jks file on all nodes.
Make sure that the Default.keystore is unique for each host node.
Copy the Default.keystore to a unique location on each node.
If Informatica Cluster Service and Catalog Service are on different nodes, then export the Apache Ambari server certificate to the infa_truststore.jks on all nodes.
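A sketch of the certificate exchange with keytool, assuming example aliases and file names. The alias must match the key entry in the Default.keystore of that node, and the keystore and truststore paths must match your installation:
    # On node1, export the certificate from its Default.keystore:
    keytool -exportcert -alias node1_cert -file node1.crt -keystore /path/to/Default.keystore
    # On every node, import the certificate into infa_truststore.jks:
    keytool -importcert -alias node1_cert -file node1.crt \
        -keystore $INFA_HOME/services/shared/security/infa_truststore.jks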