Ephemeral Cluster in Streaming Mappings

To reduce the cost of the resources used to run a cluster, you can run streaming mappings on an ephemeral cluster. Create a cluster workflow that creates the ephemeral cluster and deletes it at the end of a processing period to free up the resources. When the cluster is deleted, the state information is stored so that it can be used when a cluster starts again.
To resume data processing from the point at which the cluster was deleted, specify an external storage and a checkpoint directory when you run streaming mappings on an ephemeral cluster.
You can specify Amazon S3, Microsoft Azure Data Lake Storage Gen1, or Microsoft Azure Data Lake Storage Gen2 as the external storage in the State Store Connection property.
You must also specify a checkpoint directory in the Checkpoint Directory property. The checkpoint details are stored on the external storage.
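The following Python sketch illustrates the general checkpoint-and-resume idea described above. It is not the product's implementation. A local temporary directory stands in for the external storage, and the names `load_checkpoint`, `save_checkpoint`, `run_cluster`, and the `offset.json` file are hypothetical, chosen only for illustration.

```python
import json
import os
import tempfile


def load_checkpoint(checkpoint_dir):
    """Return the next offset to process, or 0 if no checkpoint exists yet."""
    path = os.path.join(checkpoint_dir, "offset.json")  # hypothetical file name
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_checkpoint(checkpoint_dir, next_offset):
    """Record the next offset; write to a temp file first, then rename it."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, "offset.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp_path, path)


def run_cluster(events, checkpoint_dir, max_events):
    """Process at most max_events events, then stop as if the cluster were deleted."""
    start = load_checkpoint(checkpoint_dir)
    end = min(start + max_events, len(events))
    processed = []
    for offset in range(start, end):
        processed.append(events[offset])
        save_checkpoint(checkpoint_dir, offset + 1)
    return processed


# Assumption: a temporary local directory plays the role of the external storage.
external_storage = tempfile.mkdtemp()
checkpoint_dir = os.path.join(external_storage, "checkpoints", "mapping_1")

events = ["e0", "e1", "e2", "e3", "e4"]

# First cluster lifetime: processes two events, then the cluster is deleted.
first_run = run_cluster(events, checkpoint_dir, max_events=2)

# A new cluster reads the checkpoint and resumes where the first one stopped.
second_run = run_cluster(events, checkpoint_dir, max_events=10)
```

Because the checkpoint lives outside the cluster, the second run does not reprocess the events that the first run already handled. This is why both the external storage and the checkpoint directory must be specified.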