Enterprise Data Preparation Administrator Guide

Configuring the Informatica Domain for Data Masking
Use the Administrator tool to create a Hive connection that Enterprise Data Preparation uses to connect to the Dynamic Data Masking Server. You then configure the Enterprise Data Preparation Service to use the connection.

Informatica recommends that you create a new Hive connection instead of modifying an existing connection, so that you can revert to the existing connection if needed.
  1. Add an entry containing the IP address and host name for the Dynamic Data Masking Server host to the /etc/hosts file on each gateway node in the domain.
    This step is not required if the Dynamic Data Masking Server host and Informatica domain nodes belong to the same network group.
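    For example, assuming a hypothetical address and host name for the Dynamic Data Masking Server host (replace both with the values for your environment), the entry might look like this:
      192.0.2.10   ddm-server.example.com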
  2. In the Administrator tool, click the Connections tab.
  3. In the Navigator, select the domain.
  4. In the Navigator, click Actions > New > Connection.
    The New Connection dialog box appears.
  5. In the New Connection dialog box, select the Hive connection type, and then click OK.
    The New Connection wizard appears.
  6. Select the Hive connection used by Enterprise Data Preparation, and then click Edit in the Common Properties section.
  7. Specify the properties required to connect to the Dynamic Data Masking Server.
    Modify the following properties:
    Name
      The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
      ~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
    ID
      String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
    Description
      The description of the connection. The description cannot exceed 4000 characters.
    Cluster Configuration
      The cluster configuration to associate with the connection.
    User Name
      User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster. The user name depends on the JDBC connection string that you specify in the Metadata Connection String or Data Access Connection String for the native environment.
      If the Hadoop cluster uses Kerberos authentication, the principal name in the JDBC connection string and the user name must be the same. Otherwise, the user name depends on the behavior of the JDBC driver. With the Hive JDBC driver, you can specify a user name in many ways, and the user name can become part of the JDBC URL.
      If the Hadoop cluster does not use Kerberos authentication, the user name depends on the behavior of the JDBC driver.
      If you do not specify a user name, the Hadoop cluster authenticates jobs based on the following criteria:
      • If the Hadoop cluster does not use Kerberos authentication, it authenticates jobs based on the operating system profile user name of the machine that runs the Data Integration Service.
      • If the Hadoop cluster uses Kerberos authentication, it authenticates jobs based on the SPN of the Data Integration Service, and the user name is ignored.
    Password
      Password for the user name.
    Environment SQL
      SQL commands that set the Hadoop environment. In the native environment type, the Data Integration Service executes the environment SQL each time it creates a connection to a Hive metastore. If you use the Hive connection to run profiles on a Hadoop cluster, the Data Integration Service executes the environment SQL at the beginning of each Hive session.
      The following rules and guidelines apply to the usage of environment SQL in both connection modes:
      • Use the environment SQL to specify Hive queries.
      • Use the environment SQL to set the classpath for Hive user-defined functions, and then use environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. If you use Hive user-defined functions, you must copy the .jar files to the following directory:
        <Informatica installation directory>/services/shared/hadoop/<Hadoop distribution name>/extras/hive-auxjars
      • You can use environment SQL to define Hadoop or Hive parameters that you want to use in PreSQL commands or in custom queries.
      • If you use multiple values for the Environment SQL property, ensure that there is no space between the values.
    SQL Identifier Character
      The type of character used to identify special characters and reserved SQL keywords, such as WHERE. The Data Integration Service places the selected character around special characters and reserved SQL keywords. The Data Integration Service also uses this character for the Support mixed-case identifiers property.
    JDBC Driver Class Name
      Name of the Hive JDBC driver class.
    Metadata Connection String
      The JDBC connection URI that Enterprise Data Preparation uses to connect to the Dynamic Data Masking Server.
      To connect to the data lake in Hive, specify the connection string in the following format:
      jdbc:hive2://<hostname>:<port>/<schema>
      Where:
      • <hostname> is the name or IP address of the machine on which the Dynamic Data Masking Server runs.
      • <port> is the port number on which the Dynamic Data Masking Server listens.
      • <schema> is the name of the Hive schema in the data lake.
      To enable the Dynamic Data Masking Server to connect to Hive using Kerberos authentication, you must add principal=<Hive server principal> to the URI. For example:
      jdbc:hive2://<hostname>:<port>/<schema>;principal=<Hive server principal>
      For user impersonation, you must add hive.server2.proxy.user=<xyz> to the JDBC connection URI. If you do not configure user impersonation, the current user's credentials are used to connect to HiveServer2.
      If the Hadoop cluster uses SSL or TLS authentication, you must add ssl=true to the JDBC connection URI. For example:
      jdbc:hive2://<hostname>:<port>/<db>;ssl=true
      If you use a self-signed certificate for SSL or TLS authentication, ensure that the certificate file is available on the client machine and the Data Integration Service machine. For more information, see the Informatica Big Data Management Integration Guide.
      Sample connection strings that combine these options appear after this list.
    Bypass Hive JDBC Server
      JDBC driver mode. Set to false to use the JDBC non-embedded mode.
      If you choose the non-embedded mode, you must configure the Data Access Connection String.
    Fine Grained Authorization
      Set to true to force mappings to observe the masking rules that Dynamic Data Masking applies to columns.
    Data Access Connection String
      The connection string used to access data from the Hadoop data store through the Dynamic Data Masking Server.
      Enter the connection URI you specified as the value for the Metadata Connection String property.
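    For illustration, the following sample URIs show how the options described above combine. The host name ddm-server.example.com, port 9120, schema default, Kerberos principal, and proxy user name are hypothetical placeholder values, not defaults; substitute the values for your environment:
      jdbc:hive2://ddm-server.example.com:9120/default
      jdbc:hive2://ddm-server.example.com:9120/default;principal=hive/_HOST@EXAMPLE.COM
      jdbc:hive2://ddm-server.example.com:9120/default;ssl=true;hive.server2.proxy.user=analyst1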
  8. Click Finish.
  9. In the Administrator tool, click the Services and Nodes tab.
  10. In the Navigator, select the Enterprise Data Preparation Service.
  11. Click the edit icon in the Data Lake Options section.
  12. Select the Hive connection as the value for the Hive Connection property.
  13. Recycle the Enterprise Data Preparation Service.
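
If you prefer the command line, you can recycle the service by disabling and then re-enabling it with infacmd. The following is a minimal sketch that assumes hypothetical domain, administrator, and service names; the Administrator tool recycle action described above remains the documented approach:
  infacmd.sh isp DisableService -dn MyDomain -un Administrator -pd <password> -sn Enterprise_Data_Preparation_Service
  infacmd.sh isp EnableService -dn MyDomain -un Administrator -pd <password> -sn Enterprise_Data_Preparation_Service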