Using a JSON File to Configure Cluster Creation Properties

You can use a JSON file to pass Databricks parameters to the cluster creation workflow. Because the JSON file is easy for you to edit, you might want to use this method when your requirements for the ephemeral cluster change frequently.
When you use the JSON file to configure cluster creation properties, you must also configure environment variables in the Create Cluster task. The task passes the environment variables to the Data Integration Service, and the Data Integration Service reads the properties from the JSON file.
To pass custom parameters to the cluster creation process, add them to the JSON file as key-value pairs.
The JSON file must contain three properties: spark_version, node_type_id, and either num_workers or autoscale. The following table describes these mandatory properties:
| Property | Description |
|---|---|
| spark_version | The version of Spark to use in the cluster. Required. |
| node_type_id | The node type and size, expressed as one of the node types supported on AWS or Azure. Required. |
| num_workers | The number of worker nodes in the cluster. Required if you do not use the autoscale parameter. |
| autoscale | Use autoscale=true to create a cluster that uses autoscale. Required if you do not use the num_workers parameter. |
You can find all of the supported parameters in the Databricks documentation.
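
The mandatory properties described above can be checked before you hand the file to the workflow. The following is a minimal sketch and not part of the product: the function name find_missing_properties and the sample values for spark_version and node_type_id are assumptions chosen for illustration, so substitute values that your Databricks environment supports. The check only confirms that the required keys are present. It does not validate their values.

```python
import json

# Properties that the table lists as always required.
REQUIRED_KEYS = ("spark_version", "node_type_id")
# At least one of these must be present, per the table.
SIZING_KEYS = ("num_workers", "autoscale")


def find_missing_properties(config):
    """Return a list of problems with the mandatory properties (empty if none).

    Hypothetical helper for illustration; it checks key presence only.
    """
    problems = [
        f"missing required property: {key}"
        for key in REQUIRED_KEYS
        if key not in config
    ]
    if not any(key in config for key in SIZING_KEYS):
        problems.append("missing num_workers or autoscale")
    return problems


# Illustrative values only (assumptions, not taken from the guide).
example_config = {
    "spark_version": "5.5.x-scala2.11",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}

# Text that could be saved as the JSON file.
example_json = json.dumps(example_config, indent=2)

if __name__ == "__main__":
    print(example_json)
    print(find_missing_properties(json.loads(example_json)))
```

Running a check like this each time you edit the file can catch a missing property before the Create Cluster task starts, which is useful when the requirements for the ephemeral cluster change frequently.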