Data Engineering Integration
- Data Engineering Integration 10.5.4
The following properties configure the Databricks cluster:
**Enable autoscaling local storage**
Enables Databricks to monitor available disk space on worker nodes and automatically add additional EBS volumes.
**EBS Volume Type**
The type of volume that Databricks can add to cluster nodes. Set this property when you enable autoscaling.
**Number of Volumes**
The number of volumes to provision for each instance. Enter a value between 0 and 10. Set this property when you enable autoscaling and configure the EBS volume type.
**Size in GB**
The size in gigabytes of each EBS volume. Set this property when you enable autoscaling and configure the EBS volume type.
**Auto Termination**
Enables automatic termination of the cluster.
**Auto Terminate Time**
Terminates the cluster after it is inactive for the specified number of minutes. Enter a value between 10 and 10,000. If you do not configure this property, or if you set it to 0, the cluster does not terminate automatically.
**On-Demand/Spot Composition**
The number of on-demand worker nodes. Enter a value between 0 and the number of worker nodes set in General Options. Any remaining worker nodes are spot instances. For example, with 8 worker nodes and a value of 3, the cluster uses 3 on-demand workers and 5 spot workers.
On-demand nodes are always available to use. Spot instances might terminate running jobs if they become unavailable. The driver node is always an on-demand node.
Set this property when you enable Spot fall back to On-Demand.
Default is 1.
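The split described above can be sketched as a small helper. This is only an illustration of the arithmetic, not a function provided by the product; the name `node_composition` is chosen for the example:

```python
def node_composition(worker_nodes: int, on_demand_workers: int):
    """Split worker nodes into (on-demand, spot) counts.

    The driver node is always on-demand and is not counted here.
    """
    if not 0 <= on_demand_workers <= worker_nodes:
        raise ValueError("Enter a value between 0 and the number of worker nodes.")
    return on_demand_workers, worker_nodes - on_demand_workers

# 8 workers with the property set to 3: 3 on-demand, 5 spot
print(node_composition(8, 3))  # -> (3, 5)
```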
**Spot fall back to On-Demand**
Enables on-demand instances to be used as a fallback.
If you are using spot instances and the market price for spot instances surges above your spot bid price, AWS terminates the spot instances. When you enable this property, on-demand instances are used in place of the spot instances that terminate.
**Availability Zone**
The AWS cluster availability zone. Default is us-east-1e.
**Spot Bid Price**
The maximum percentage of the on-demand instance price that you bid on spot instances. Spot instances are priced as a percentage of the on-demand price and are not always available.
If the market price for spot instances surges above the bid price set here and you do not enable Spot fall back to On-Demand, AWS terminates the spot instances.
Default is 100%.
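Because the bid is expressed as a percentage of the on-demand price, the effective price cap is a simple product. A minimal sketch of that arithmetic, using an illustrative on-demand hourly rate (actual rates vary by instance type and region):

```python
def max_spot_price(on_demand_price: float, bid_percent: float) -> float:
    """Maximum hourly price paid for a spot instance, given a bid
    expressed as a percentage of the on-demand price."""
    return on_demand_price * bid_percent / 100.0

# Illustrative rate of $0.40/hour; the default bid of 100% caps the
# spot price at the full on-demand price.
print(max_spot_price(0.40, 100))
```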
**IAM Role ARN**
The instance profile ARN (Amazon Resource Name) that corresponds to the AWS IAM (Identity and Access Management) role. Copy the value from the AWS console.
IAM roles allow you to access data from Databricks clusters. Add new IAM roles in the Administrator tool.
**Spark Configurations**
Performance configurations for the Databricks Spark engine. Enter key-value pairs in the following format: key1='value1' key2='value2'. You can also provide a path to a file that contains the key-value pairs.
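As an illustration of that format, the following pairs use standard Spark property names with example values (the specific keys and values are not prescribed by the product):

```
spark.executor.memory='4g' spark.sql.shuffle.partitions='200'
```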
**Environment Variables**
Environment variables that you can configure for the Databricks Spark engine. Enter key-value pairs in the following format: key1='value1' key2='value2'.
When you use a JSON file to configure Create Cluster task properties, enter the userJson and pathToFile properties in the environment variables. See Create the JSON File.
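For example, a pair in that format might look like the following; the variable name and value are illustrative only:

```
SPARK_WORKER_MEMORY='4g'
```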
**Cluster Tags**
Labels that you can assign to resources for tracking purposes. Enter key-value pairs in the following format: <key1>=<value1>,<key2>=<value2>. You can also provide a path to a local file that contains the key-value pairs.
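For example, the following tags follow the stated format; the key and value names are illustrative:

```
project=sales_pipeline,costcenter=4100
```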
**SSH Public Key**
The SSH public key used to log in to the driver and worker instances if you enable SSH. Copy the value from the Databricks console.
**Cluster Log Conf**
The location to deliver logs for long-term storage. If configured, the Databricks Spark engine delivers the logs every five minutes.
Provide the path to DBFS.
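A DBFS path uses the dbfs:/ scheme; the directory name below is illustrative, not a required location:

```
dbfs:/cluster-logs
```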
**Init Scripts**
The location where you store init scripts. You can enter multiple destinations; the scripts run sequentially in the order that you configure them. If you need to install additional Python libraries, specify the init script file location in this property.
Provide DBFS paths.
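An init script destination is a DBFS path to the script file; the directory and file name below are illustrative:

```
dbfs:/databricks/init/install-libs.sh
```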