Table of Contents

  1. About the Enterprise Data Preparation Administrator Guide
  2. Introduction to Enterprise Data Preparation Administration
  3. Getting Started
  4. Administration Process
  5. User Account Setup
  6. Search Configuration
  7. Roles, Privileges, and Profiles
  8. Data Asset Access and Publication Management
  9. Masking Sensitive Data
  10. Monitoring Enterprise Data Preparation
  11. Backing Up and Restoring Enterprise Data Preparation
  12. Managing the Data Lakehouse
  13. Schedule Export, Import and Publish Activities
  14. Interactive Data Preparation Service
  15. Enterprise Data Preparation Service

Enterprise Data Preparation Administrator Guide

Data Lake

The data lake used by Enterprise Data Preparation is a centralized repository of large volumes of structured and unstructured data. A data lake can contain different types of data, including raw data, refined data, master data, transactional data, log file data, and machine data.

The data lake can be deployed on premises or in the cloud. The data lake uses Hive, the Spark engine, and a Hadoop-compatible file system. The data lake must be collocated with the Informatica application services associated with Enterprise Data Preparation. For example, if the data lake is deployed in Amazon EMR, you must also deploy Enterprise Data Preparation, Enterprise Data Catalog, and the Informatica services in Amazon EMR.

Enterprise Data Preparation works with the security mechanisms used by the cluster, such as Kerberos, Apache Ranger, and Apache Sentry, to securely access data in the data lake.

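For example, the following minimal Python sketch connects to HiveServer2 on a Kerberos-secured cluster with the PyHive library. The host name, port, and database are placeholder values, and the sketch illustrates cluster-side Kerberos access in general, not an Enterprise Data Preparation API:

    # Minimal sketch: connect to HiveServer2 on a Kerberos-secured cluster.
    # Requires a valid Kerberos ticket (obtained with kinit) and the PyHive library.
    from pyhive import hive

    conn = hive.connect(
        host='hiveserver2.example.com',  # placeholder HiveServer2 host
        port=10000,                      # default HiveServer2 port
        database='default',
        auth='KERBEROS',                 # authenticate through the cluster's Kerberos mechanism
        kerberos_service_name='hive',    # service principal name of HiveServer2
    )

    cursor = conn.cursor()
    cursor.execute('SHOW TABLES')        # simple query to confirm secure access
    print(cursor.fetchall())
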
You can upload assets, such as comma-separated values files, to the data lake as Hive tables using Enterprise Data Preparation. You can also ingest data from sources outside the data lake using Hadoop tools or Informatica Mass Ingestion.

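The upload operation is conceptually similar to the following PySpark sketch, which loads a comma-separated values file and saves it as a Hive table. The file path, schema, and table names are assumptions for illustration; Enterprise Data Preparation performs the equivalent step for you when you upload a file:

    # Sketch: load a CSV file and store it as a Hive table in the data lake.
    # The path and table name are placeholders; run on a cluster with Hive support.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName('csv-to-hive-sketch')
             .enableHiveSupport()          # use the cluster's Hive metastore
             .getOrCreate())

    df = spark.read.csv(
        'hdfs:///landing/customers.csv',   # placeholder location of the uploaded file
        header=True,                       # first row contains column names
        inferSchema=True,                  # derive column types from the data
    )

    df.write.mode('overwrite').saveAsTable('sales.customers')  # placeholder schema.table
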
You can prepare a Hive or file-based asset that exists in Enterprise Data Catalog. When you prepare a file-based asset, Enterprise Data Preparation creates an external temporary table in the Hive schema specified in the Enterprise Data Preparation Service.

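An external table that maps a file into a Hive schema is conceptually similar to the following HiveQL, shown here as a hedged sketch issued through PyHive. The schema name, columns, and file location are placeholders, and the actual DDL that Enterprise Data Preparation generates, including its temporary-table semantics, is internal to the product:

    # Sketch: the kind of external table that exposes a delimited file in a Hive schema.
    # Schema, columns, and location are placeholders, not product-generated DDL.
    from pyhive import hive

    ddl = """
    CREATE EXTERNAL TABLE edp_staging.tmp_customers (
        id     INT,
        name   STRING,
        region STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 'hdfs:///landing/customers'
    """

    conn = hive.connect(host='hiveserver2.example.com', port=10000)  # placeholder host
    cursor = conn.cursor()
    cursor.execute(ddl)
    conn.close()
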
When you publish prepared data, Enterprise Data Preparation writes the transformed input source to a Hive table or to a file in the data lake.

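The publish step is conceptually similar to the following PySpark sketch, which writes a transformed dataset either to a Hive table or to files in the lake. The source table, target table, and target path are placeholders, and the transformation stands in for the preparation steps you apply in the product:

    # Sketch: write prepared (transformed) data back to the data lake,
    # either as a Hive table or as files. All names and paths are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName('publish-sketch')
             .enableHiveSupport()
             .getOrCreate())

    # Stand-in for prepared data: read the source and apply a simple transformation.
    prepared = (spark.table('sales.customers')
                .withColumn('region', F.upper('region')))

    # Option 1: publish to a Hive table.
    prepared.write.mode('overwrite').saveAsTable('published.customers_clean')

    # Option 2: publish to a file-based target in the data lake.
    prepared.write.mode('overwrite').parquet('hdfs:///published/customers_clean')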