Cluster Workflows Overview

A cluster workflow is a workflow that uses a Create Cluster task to automate the creation of a cluster, and then runs one or more mappings on that cluster. You can create a cluster workflow to run Mapping tasks and other tasks on a cloud platform cluster. To run the mappings on an ephemeral cluster, include a Delete Cluster task that terminates and deletes the cluster after the mappings run.
The cluster workflow uses other elements that enable communication between the Data Integration Service and the cloud platform, such as a cloud provisioning configuration and a cluster connection.
A cluster workflow contains a Create Cluster task that you configure with information about the cluster to create. An ephemeral cluster is a cloud platform cluster that you create to run mappings and other tasks, and then terminate when the tasks are complete. To make the cluster ephemeral, include a Delete Cluster task in the workflow. Create ephemeral clusters to save cloud platform resources.
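The task order described above can be sketched in a few lines of Python. This is a conceptual model only, not an Informatica API: the class name ClusterWorkflowSketch, the method task_order, and the mapping names are hypothetical and exist only to illustrate that the Create Cluster task comes first, the Mapping tasks follow, and a Delete Cluster task closes the workflow when the cluster is ephemeral.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterWorkflowSketch:
    """Hypothetical stand-in for a cluster workflow; not an Informatica API."""

    mappings: List[str]     # names of the Mapping tasks (illustrative)
    ephemeral: bool = True  # True when the workflow includes a Delete Cluster task

    def task_order(self) -> List[str]:
        # The Create Cluster task always comes first.
        tasks = ["Create Cluster"]
        # Mapping tasks run on the cluster that was just created.
        tasks += [f"Mapping: {name}" for name in self.mappings]
        # Only an ephemeral cluster is terminated and deleted afterwards.
        if self.ephemeral:
            tasks.append("Delete Cluster")
        return tasks


if __name__ == "__main__":
    workflow = ClusterWorkflowSketch(mappings=["m_load_orders", "m_aggregate_sales"])
    for step, task in enumerate(workflow.task_order(), start=1):
        print(step, task)
```

Setting ephemeral to False in this sketch leaves out the final Delete Cluster entry, which corresponds to a workflow that creates a cluster and keeps it after the mappings run.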
You can use a cluster workflow to create a cluster on the Amazon AWS or Microsoft Azure cloud platform in a Hadoop environment, or to create a Databricks cluster in a Databricks environment.
On the Azure platform, you can create an ephemeral HDInsight cluster that accesses ADLS Gen2 resources. On the AWS platform, you can create an ephemeral Amazon EMR cluster that accesses S3, Redshift, and Snowflake resources.
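For quick reference, the platform details above can be written as a small lookup table. The cluster types and resources come from the preceding paragraph. The dictionary layout and the function name listed_resources are illustrative assumptions, and the table does not claim that other resources are unsupported.

```python
from typing import Dict, List

# Ephemeral cluster options as stated in the text above.
# The structure and key names are illustrative, not an Informatica format.
EPHEMERAL_CLUSTER_OPTIONS: Dict[str, Dict[str, object]] = {
    "Azure": {"cluster": "HDInsight", "resources": ["ADLS Gen2"]},
    "AWS": {"cluster": "Amazon EMR", "resources": ["S3", "Redshift", "Snowflake"]},
}


def listed_resources(platform: str) -> List[str]:
    """Return the resources this overview lists for the platform's ephemeral cluster."""
    option = EPHEMERAL_CLUSTER_OPTIONS.get(platform)
    # An unknown platform yields an empty list rather than raising an error.
    return list(option["resources"]) if option else []


if __name__ == "__main__":
    for platform, option in EPHEMERAL_CLUSTER_OPTIONS.items():
        print(platform, option["cluster"], listed_resources(platform))
```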
To create a cluster on Cloudera Altus, you create a workflow with Command tasks that perform the steps that a cluster workflow otherwise automates. For more information about creating a cluster on Cloudera Altus, see the article "How to Create Cloudera Altus Clusters with a Cluster Workflow in Big Data Management" on the Informatica Documentation Portal.


Updated September 28, 2020