Table of Contents

  1. About the Enterprise Data Preparation Administrator Guide
  2. Introduction to Enterprise Data Preparation Administration
  3. Administration Process
  4. User Account Setup
  5. Application Configuration
  6. Roles, Privileges, and Profiles
  7. Data Asset Access and Publication Management
  8. Masking Sensitive Data
  9. Monitoring Enterprise Data Preparation
  10. Backing Up and Restoring Enterprise Data Preparation
  11. Managing the Data Lake
  12. Schedule Export, Import and Publish Activities
  13. Interactive Data Preparation Service
  14. Enterprise Data Preparation Service

Enterprise Data Preparation Administrator Guide

Data Preparation Process

Enterprise Data Preparation connects to several Hadoop services on a Hadoop cluster to read from and write to Hive tables, to write events, and to store sample preparation data.

Enterprise Data Preparation services connect to the Hadoop cluster at the following points in the data preparation process:
- When an analyst uploads data to the data lake, the Enterprise Data Preparation Service connects to the Hadoop Distributed File System (HDFS) to stage the data in HDFS files. (A sketch of staging a file into HDFS follows this list.)
- When an analyst prepares data, the Interactive Data Preparation Service connects to HDFS to store the sample data being prepared in worksheets as HDFS files.
- When an analyst previews data, the Enterprise Data Preparation Service connects to the Data Integration Service and reads the first 100 rows from the mapping using the JDBC driver. (See the preview sketch after this list.)
- When an analyst prepares data, the Interactive Data Preparation Service connects to HDFS, reads sample data from the Hive table, and displays the data in a worksheet.
- When an analyst uploads data, the Enterprise Data Preparation Service connects to the Data Integration Service to read the temporary data staged in HDFS and write the data to a Hive table.
- When an analyst publishes prepared data, the Enterprise Data Preparation Service connects to the Data Integration Service to run the converted mappings in the Hadoop environment. The Data Integration Service applies the mapping to the data in the input source and writes the transformed data to a Hive table in the data lake. (See the Hive table sketch after this list.)
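The upload and sample-data steps above both come down to writing files into HDFS. The following sketch is not Enterprise Data Preparation code; it only illustrates, using the standard Hadoop FileSystem API, what staging an uploaded file as an HDFS file looks like. The cluster URI, local file, and staging path are hypothetical; the real values come from the Hadoop connection configured for the services.

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsStagingSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical cluster URI and staging directory.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            try (FileSystem fs = FileSystem.get(conf);
                 InputStream local = Files.newInputStream(Paths.get("/tmp/upload.csv"));
                 FSDataOutputStream out = fs.create(new Path("/datalake/staging/upload.csv"))) {
                // Copy the uploaded file into the staging directory as an HDFS file.
                IOUtils.copyBytes(local, out, conf);
            }
        }
    }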
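The preview step reads only the first 100 rows over JDBC. As a rough analogy, the snippet below uses a plain JDBC connection with a hypothetical URL, credentials, and table name to fetch and print the first 100 rows; in the product, the driver and connection details are supplied through the Data Integration Service, not hard-coded like this.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.Statement;

    public class PreviewSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical JDBC URL and credentials.
            String url = "jdbc:hive2://hiveserver.example.com:10000/datalake";

            try (Connection conn = DriverManager.getConnection(url, "edp_user", "password");
                 Statement stmt = conn.createStatement()) {
                // Limit the result to the first 100 rows, mirroring the preview behavior.
                stmt.setMaxRows(100);
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM sales_data")) {
                    ResultSetMetaData meta = rs.getMetaData();
                    while (rs.next()) {
                        StringBuilder row = new StringBuilder();
                        for (int i = 1; i <= meta.getColumnCount(); i++) {
                            row.append(rs.getString(i));
                            if (i < meta.getColumnCount()) {
                                row.append(", ");
                            }
                        }
                        System.out.println(row);
                    }
                }
            }
        }
    }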
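The upload and publish steps end with data landing in a Hive table in the data lake. The sketch below shows a generic way to move staged HDFS data into a Hive table with HiveQL over JDBC; the URL, table, schema, and staging path are hypothetical, and in the product the Data Integration Service runs the equivalent mappings in the Hadoop environment instead.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveWriteSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical HiveServer2 URL and credentials.
            String url = "jdbc:hive2://hiveserver.example.com:10000/datalake";

            try (Connection conn = DriverManager.getConnection(url, "edp_user", "password");
                 Statement stmt = conn.createStatement()) {
                // Create the target table if it does not exist (hypothetical schema).
                stmt.execute("CREATE TABLE IF NOT EXISTS sales_prepared "
                        + "(id INT, region STRING, amount DOUBLE) "
                        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
                // Move the staged HDFS file into the Hive table, completing the upload.
                stmt.execute("LOAD DATA INPATH '/datalake/staging/upload.csv' "
                        + "INTO TABLE sales_prepared");
            }
        }
    }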