Data Preparation Service Overview

The Data Preparation Service is an application service that manages data preparation within the Enterprise Data Lake application.

When an analyst prepares data in a project, the Data Preparation Service connects to the Data Preparation repository to store worksheet metadata. The service connects to the Hadoop cluster to read sample data or all data from the Hive table, depending on the size of the data. The service connects to the HDFS system in the Hadoop cluster to store the sample data being prepared in the worksheet.

The Data Preparation Service uses an Oracle database, a MySQL database, or a MariaDB database for the data preparation repository. You must configure a local storage location for data preparation file storage on the node on which the Data Preparation Service runs. The Data Preparation Service uses the Apache Solr indexing capabilities to provide recommendations of related data assets. This Solr instance does not run on the Hadoop cluster and is managed by the Data Preparation Service.

You can create the Data Preparation Service when you install Enterprise Data Lake, or you can use the Administrator tool to create the service after installation. Create the Data Preparation Service before you create the Enterprise Data Lake Service. When you create the Enterprise Data Lake Service, you must associate it with a Data Preparation Service.
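
The creation order described above can be pictured with a small model. The following Python sketch is illustrative only and is not an Informatica API: the Domain class and both method names are hypothetical, and the sketch assumes only what this overview states, namely that an Enterprise Data Lake Service must be associated with a Data Preparation Service that already exists.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical stand-in for a domain; not part of any Informatica API."""

    # Names of Data Preparation Services that have been created.
    data_prep_services: set = field(default_factory=set)
    # Maps each Enterprise Data Lake Service name to its associated
    # Data Preparation Service name.
    data_lake_services: dict = field(default_factory=dict)

    def create_data_preparation_service(self, name: str) -> None:
        """Register a Data Preparation Service (hypothetical helper)."""
        self.data_prep_services.add(name)

    def create_enterprise_data_lake_service(
        self, name: str, data_prep_service: str
    ) -> None:
        """Register an Enterprise Data Lake Service (hypothetical helper).

        Rejects the request unless the associated Data Preparation
        Service was created first.
        """
        if data_prep_service not in self.data_prep_services:
            raise ValueError(
                f"Create Data Preparation Service '{data_prep_service}' "
                f"before creating Enterprise Data Lake Service '{name}'."
            )
        self.data_lake_services[name] = data_prep_service
```

In this model, calling create_enterprise_data_lake_service before create_data_preparation_service raises a ValueError, while calling the two in the documented order records the association.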
