To reduce the cost of the resources used to run a cluster, you can run streaming mappings on an ephemeral cluster. Create a cluster workflow that creates an ephemeral cluster and deletes it at the end of a set processing period to free up the resources. When the cluster is deleted, the checkpoint information is stored so that it can be used when the cluster starts again.
To resume data processing from the point at which a cluster was deleted, run streaming mappings on an ephemeral cluster and specify an external storage and a checkpoint directory.
You can specify Amazon S3, Microsoft Azure Data Lake Storage Gen1, or Microsoft Azure Data Lake Storage Gen2 as the external storage in the State Store Connection property.
You must also specify a checkpoint directory in the Checkpoint Directory property. The checkpoint details are available on the external storage.
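
The resume behavior described above can be illustrated with a minimal, product-independent sketch. It does not use any product API. A temporary local directory stands in for the external storage, and the file name `offset.json`, the functions `load_offset`, `save_offset`, and `run_cluster_session`, and the `max_records` limit are all hypothetical names chosen for illustration. Each call to `run_cluster_session` plays the role of one ephemeral cluster lifetime: it reads the last checkpoint, processes a few records, and persists its progress so that the next session continues where the previous one stopped.

```python
import json
import os
import tempfile

CHECKPOINT_FILE = "offset.json"  # hypothetical checkpoint file name


def load_offset(checkpoint_dir):
    """Return the next offset to process, or 0 if no checkpoint exists yet."""
    path = os.path.join(checkpoint_dir, CHECKPOINT_FILE)
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_offset(checkpoint_dir, next_offset):
    """Persist the next offset; write to a temp file, then rename over the old one."""
    path = os.path.join(checkpoint_dir, CHECKPOINT_FILE)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp_path, path)


def run_cluster_session(records, checkpoint_dir, max_records):
    """Simulate one ephemeral cluster lifetime that handles at most max_records."""
    start = load_offset(checkpoint_dir)
    end = min(start + max_records, len(records))
    processed = []
    for offset in range(start, end):
        processed.append(records[offset].upper())  # stand-in for real processing
        save_offset(checkpoint_dir, offset + 1)    # checkpoint after each record
    return processed


if __name__ == "__main__":
    records = ["a", "b", "c", "d", "e"]
    # The temporary directory stands in for the external storage location.
    with tempfile.TemporaryDirectory() as external_storage:
        first = run_cluster_session(records, external_storage, max_records=3)
        # The "cluster" is gone here; only the checkpoint directory survives.
        second = run_cluster_session(records, external_storage, max_records=3)
        print(first)   # ['A', 'B', 'C']
        print(second)  # ['D', 'E']
```

Because the checkpoint lives outside the session, the second session starts at the fourth record instead of reprocessing the first three. This is the reason the checkpoint directory must reside on storage that outlives the cluster.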