Cluster Workflow Components


The cluster workflow that creates an ephemeral cluster includes a Create Cluster task, at least one Mapping task, and a Delete Cluster task.
A sample cluster workflow contains a Start event, a Create Cluster task, a Mapping task, a Delete Cluster task, and an End event, connected with arrows.
A cluster workflow uses the following components:
Cloud provisioning configuration
The cloud provisioning configuration is associated with the Create Cluster task through the cluster connection.
Cluster connection
The cluster connection that you use with a cluster workflow is associated with a cloud provisioning configuration. You can use a Hadoop or Databricks cluster connection. When you run the workflow, the Data Integration Service creates a temporary cluster connection.
Create Cluster task
The Create Cluster task contains all the settings that the cloud platforms require to create a cluster with a master node and worker nodes. It also contains a reference to a cloud provisioning configuration. Include one Create Cluster task in a cluster workflow.
Mapping task
Add a big data mapping to the Mapping task. A cluster workflow can include more than one Mapping task. You can run some mappings on an existing cluster and others on a cluster that the workflow creates. Configure the mappings and Mapping tasks based on where you want each mapping to run.
Delete Cluster task
The Delete Cluster task terminates the cluster, then deletes the cluster and all resources that the workflow creates.
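The composition rules above can be sketched as a small check. This is an illustration only, not an Informatica API: the `Task` class, the kind constants, and `check_ephemeral_workflow` are hypothetical names invented for this example. The sketch assumes that an ephemeral-cluster workflow needs exactly one Create Cluster task, at least one Mapping task, and a Delete Cluster task.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical task kinds, named after the components described above.
CREATE = "Create Cluster"
MAPPING = "Mapping"
DELETE = "Delete Cluster"


@dataclass
class Task:
    kind: str  # one of CREATE, MAPPING, DELETE
    name: str  # illustrative label only


def check_ephemeral_workflow(tasks: List[Task]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    kinds = [task.kind for task in tasks]
    problems = []
    if kinds.count(CREATE) != 1:
        problems.append("expected exactly one Create Cluster task")
    if kinds.count(MAPPING) < 1:
        problems.append("expected at least one Mapping task")
    if kinds.count(DELETE) < 1:
        problems.append("expected a Delete Cluster task")
    return problems


# Usage: a workflow shaped like the sample described above.
workflow = [
    Task(CREATE, "create_cluster"),
    Task(MAPPING, "m_load_orders"),
    Task(DELETE, "delete_cluster"),
]
print(check_ephemeral_workflow(workflow))  # []
```

The check looks only at which tasks are present, not at their order, because a workflow can also contain mappings that run on an existing cluster.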


Updated July 10, 2020