Cluster Workflow Components

The cluster workflow that creates an ephemeral cluster includes a Create Cluster task, at least one Mapping task, and a Delete Cluster task.
The following image shows a sample cluster workflow: The cluster workflow shows a Start event, a Create Cluster task, a Mapping task, a Delete Cluster task, and an End event connected with arrows.
A cluster workflow uses the following components:
Cloud provisioning configuration
The cloud provisioning configuration is associated with the Create Cluster task through the cluster connection.
Cluster connection
The cluster connection to use with a cluster workflow is associated with a cloud provisioning configuration. You can use a Hadoop or Databricks cluster connection. When you run the workflow, the Data Integration Service creates a temporary cluster connection.
Create Cluster task
The Create Cluster task contains all the settings that the cloud platforms require to create a cluster with a master node and worker nodes. It also contains a reference to a cloud provisioning configuration. Include one Create Cluster task in a cluster workflow.
Mapping task
Add a data engineering mapping to the Mapping task. A cluster workflow can include more than one Mapping task. You can run some mappings on an existing cluster and others on a cluster that the workflow creates. You configure the mappings and Mapping tasks based on where you want to run each task.
Delete Cluster task
The Delete Cluster task terminates the cluster and deletes the cluster and all resources that the workflow creates.
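
The task requirements described above can be sketched as a small check. This is only an illustrative model, not an Informatica API: the Task class, the kind strings, and the validate_ephemeral_cluster_workflow function are hypothetical names introduced here. It assumes a workflow is represented as a plain list of tasks and tests only the counts stated in this topic: one Create Cluster task, at least one Mapping task, and a Delete Cluster task.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical stand-in for a workflow task; not a product class."""
    kind: str  # assumed labels: "create_cluster", "mapping", "delete_cluster"
    name: str


def validate_ephemeral_cluster_workflow(tasks: List[Task]) -> List[str]:
    """Return a list of problems; an empty list means the task mix
    matches the description of an ephemeral cluster workflow."""
    kinds = [task.kind for task in tasks]
    problems = []
    # The topic says to include one Create Cluster task.
    if kinds.count("create_cluster") != 1:
        problems.append("expected exactly one Create Cluster task")
    # The topic says the workflow includes at least one Mapping task.
    if kinds.count("mapping") < 1:
        problems.append("expected at least one Mapping task")
    # The topic says the workflow includes a Delete Cluster task.
    if kinds.count("delete_cluster") < 1:
        problems.append("expected a Delete Cluster task")
    return problems


# Example task list mirroring the sample workflow; task names are made up.
workflow = [
    Task("create_cluster", "Create_Cluster"),
    Task("mapping", "Load_Orders"),
    Task("delete_cluster", "Delete_Cluster"),
]
print(validate_ephemeral_cluster_workflow(workflow))  # prints []
```

The sketch deliberately does not check task order, because some mappings can run on an existing cluster and their position relative to the Create Cluster task is not stated in this topic.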


Updated September 28, 2020