Security for implementations of Data Engineering Integration includes security for the native environment of the Informatica domain and for the non-native environments.
Security for the Hadoop Environment
You can configure security for the Informatica domain and the Hadoop cluster to protect against threats from inside and outside the network. Security for the Hadoop cluster includes the following areas:
Authentication
When the Informatica implementation includes Data Engineering Integration, user identities must be authenticated in the Informatica domain and the Hadoop cluster. Authentication for the Informatica domain is separate from authentication for the Hadoop cluster.
By default, Hadoop does not verify the identity of users. To authenticate user identities, you can configure the following authentication protocols on the cluster:
Native authentication
Lightweight Directory Access Protocol (LDAP)
Kerberos, when the Hadoop distribution supports it
Apache Knox Gateway
Data Engineering Integration also supports Hadoop clusters that use a Microsoft Active Directory (AD) Key Distribution Center (KDC) or an MIT KDC.
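On a Kerberos-enabled cluster, a user or service account typically obtains a ticket-granting ticket from the KDC before it can access cluster services. A minimal CLI sketch, in which the keytab path and principal name are placeholders for illustration:

```shell
# Obtain a Kerberos ticket-granting ticket for the principal.
# The keytab path and principal are hypothetical examples.
kinit -kt /etc/security/keytabs/user.keytab user@EXAMPLE.COM

# Verify that the ticket was granted and check its expiry time.
klist
```

These commands require a reachable KDC (MIT or Active Directory) and are shown only to illustrate the authentication step that precedes cluster access.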
Authorization
After a user is authenticated, the user must be authorized to perform actions. For example, to use data in a mapping, a user must have the correct permissions on the directories where that data is stored.
You can run mappings on a cluster that uses one of the following security management systems for authorization:
Cloudera Navigator Encrypt
HDFS permissions
User impersonation
Apache Ranger
Apache Sentry
HDFS Transparent Encryption
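As an illustration of HDFS permissions, the standard `hdfs dfs` commands can inspect and set permissions on the directories a mapping reads from. The directory path and group name below are placeholders:

```shell
# List the current owner, group, and permission bits on a hypothetical data directory.
hdfs dfs -ls /data/sales

# Grant the owning group read and execute access so mapping users can read the data.
hdfs dfs -chmod -R 750 /data/sales

# Assign a hypothetical group of mapping users as the owning group.
hdfs dfs -chgrp -R mapping_users /data/sales
```

These commands require a running Hadoop cluster; on clusters that use Apache Ranger or Apache Sentry, policies defined in those systems are evaluated in addition to the HDFS permission bits.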
Data and metadata management
Data and metadata management involves tracking and auditing data access, updating metadata, and performing data lineage analysis. Data Engineering Integration supports Cloudera Navigator and Metadata Manager for metadata management and data lineage.
Data security
Data security involves protecting sensitive data from unauthorized access. Data Engineering Integration supports data masking with the Data Masking transformation in the Developer tool, Dynamic Data Masking, and Persistent Data Masking.
Operating system profiles
An operating system profile is a security mechanism that the Data Integration Service uses to run mappings. Use operating system profiles to increase security and to isolate the run-time environment for users. Data Engineering Integration supports operating system profiles on all Hadoop distributions. In the Hadoop run-time environment, the Data Integration Service pushes processing to the Hadoop cluster, and the run-time engines run mappings with the operating system profile.
Security for the Databricks Environment
The Data Integration Service uses token-based authentication to access the Databricks environment. Generate a token within the Databricks environment and use the token ID in the connection to Databricks.
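As a minimal sketch of how token-based access to Databricks works outside of Informatica, a personal access token generated in the workspace is sent as a bearer token in the `Authorization` header of REST calls. The workspace URL, token value, and endpoint path below are placeholders:

```python
# Sketch: token-based authentication to a Databricks workspace.
# The host and token are hypothetical placeholders; a real token is
# generated in the Databricks workspace and must be kept secret.
import urllib.request

DATABRICKS_HOST = "https://example.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "dapiXXXXXXXXXXXXXXXX"                            # placeholder token ID

def databricks_request(path: str) -> urllib.request.Request:
    """Build a request against the Databricks REST API with bearer-token auth."""
    return urllib.request.Request(
        url=DATABRICKS_HOST + path,
        headers={"Authorization": f"Bearer {TOKEN}"},
    )

# Build (but do not send) an authenticated request to a REST endpoint.
req = databricks_request("/api/2.0/clusters/list")
```

The same bearer-token pattern applies regardless of the HTTP client; the connection that the Data Integration Service makes carries the token ID in an equivalent way.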