Developer Workflow Guide

Advanced Options

Configure advanced options such as automatic termination and on-demand instances.
The following table describes advanced options that you can set for an AWS Databricks cluster:
Enable autoscaling local storage
    Enables Databricks to monitor the available disk space on worker nodes and automatically add EBS volumes.
EBS Volume Type
    The type of volume that Databricks can add to cluster nodes.
    Set this property when you enable autoscaling.
Number of Volumes
    The number of volumes to provision for each instance. Enter a value between 0 and 10.
    Set this property when you enable autoscaling and configure the EBS volume type.
Size in GB
    The size of each EBS volume, in gigabytes.
    Set this property when you enable autoscaling and configure the EBS volume type.
Auto Termination
    Enables automatic termination of the cluster.
Auto Terminate Time
    Terminates the cluster after it is inactive for the specified number of minutes. Enter a value between 10 and 10,000. If you do not configure this property, or if you set it to 0, the cluster does not terminate automatically.
On-Demand/Spot Composition
    The number of on-demand nodes. Enter a value between 0 and the number of worker nodes set in General Options. Any remaining worker nodes are spot instances.
    On-demand nodes are always available to use. If spot instances become unavailable, the jobs running on them might terminate. The driver node is always an on-demand node.
    Set this property when you enable Spot fall back to On-Demand.
    Default is 1.
Spot fall back to On-Demand
    Enables on-demand instances to be used as a fallback.
    If you use spot instances and the market price for spot instances rises above your spot bid price, AWS terminates the spot instances. When you enable this property, on-demand instances replace the spot instances that AWS terminates.
Availability Zone
    The AWS availability zone for the cluster.
    Default is us-east-1e.
Spot Bid Price
    The maximum price that you bid for spot instances, expressed as a percentage of the on-demand instance price.
    Spot instances are priced as a percentage of the on-demand price and are not always available.
    If the market price for spot instances rises above the bid price set here and you do not enable Spot fall back to On-Demand, AWS terminates the spot instances.
    Default is 100%.
IAM Role ARN
    The instance profile ARN (Amazon Resource Name) that corresponds to the AWS IAM (Identity and Access Management) role. Copy the value from the AWS console. Use the following format:
        arn:aws:iam::<account-id>:instance-profile/<role-name>
    IAM roles allow you to access data from Databricks clusters. Add new IAM roles in the Administrator tool.
Spark Configurations
    Performance configurations for the Databricks Spark engine. Enter key-value pairs in the following format: key1='value1' key2='value2'. You can also provide a path to a file that contains the key-value pairs.
Environment Variables
    Environment variables that you can configure for the Databricks Spark engine. Enter key-value pairs in the following format: key1='value1' key2='value2'.
    Enter the userJson and pathToFile properties in the environment variables when you use a JSON file to configure Create Cluster task properties. See GUID-FC2C0376-B63B-4F3A-BB5F-AF46D5C4A537.
Cluster Tags
    Labels that you can assign to resources for tracking purposes. Enter key-value pairs in the following format: <key1>=<value1>,<key2>=<value2>.
    You can also provide a path to a local file that contains the key-value pairs. Use the following format:
        file:\\<file path>
SSH Public Key
    The SSH public key used to log in to the driver and worker instances if you enable SSH. Copy the value from the Databricks console.
Cluster Log Conf
    The location where logs are delivered for long-term storage. If you configure this property, the Databricks Spark engine delivers the logs every five minutes.
    Provide a DBFS path.
Init Scripts
    The location where you store init scripts. You can enter multiple destinations. The scripts run sequentially in the order that you configure them. If you need to install additional Python libraries, specify the init script file location in this property.
    Use the following format:
        dbfs:/<path to init script>,dbfs:/<path to init script>
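
The numeric ranges and the on-demand/spot split described in the table can be checked before you run the Create Cluster task. The following Python sketch is not part of the product: the AdvancedOptions class and the validate, node_mix, and max_spot_price functions are hypothetical names. It assumes, as the table states, that On-Demand/Spot Composition counts worker nodes only and that the driver is always on-demand, and it treats the spot bid as a plain percentage cap on a hypothetical on-demand hourly price.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdvancedOptions:
    # Hypothetical container for a few Advanced Options properties.
    worker_nodes: int                             # worker count from General Options
    on_demand_nodes: int = 1                      # On-Demand/Spot Composition (default 1)
    number_of_volumes: int = 0                    # Number of Volumes
    auto_terminate_minutes: Optional[int] = None  # Auto Terminate Time (None = not set)
    spot_bid_percent: float = 100.0               # Spot Bid Price (default 100%)


def validate(opts: AdvancedOptions) -> list[str]:
    """Return the problems found, using the ranges stated in the table."""
    errors = []
    if not 0 <= opts.number_of_volumes <= 10:
        errors.append("Number of Volumes must be between 0 and 10")
    minutes = opts.auto_terminate_minutes
    # Unset or 0 disables automatic termination; any other value must be 10-10,000.
    if minutes not in (None, 0) and not 10 <= minutes <= 10_000:
        errors.append("Auto Terminate Time must be 0 or between 10 and 10,000")
    if not 0 <= opts.on_demand_nodes <= opts.worker_nodes:
        errors.append("On-demand nodes must be between 0 and the worker count")
    return errors


def node_mix(opts: AdvancedOptions) -> dict[str, int]:
    """Split the cluster into on-demand and spot nodes as the table describes."""
    return {
        "driver_on_demand": 1,                    # the driver is always on-demand
        "on_demand_workers": opts.on_demand_nodes,
        "spot_workers": opts.worker_nodes - opts.on_demand_nodes,  # the remainder
    }


def max_spot_price(on_demand_hourly: float, bid_percent: float) -> float:
    """Highest hourly spot price accepted for a bid given as a percent of on-demand."""
    return on_demand_hourly * bid_percent / 100.0


if __name__ == "__main__":
    opts = AdvancedOptions(worker_nodes=4, on_demand_nodes=1,
                           auto_terminate_minutes=120, spot_bid_percent=60)
    print(validate(opts))
    print(node_mix(opts))
    print(max_spot_price(0.40, opts.spot_bid_percent))
```

With four workers and the default of one on-demand node, the sketch reports one on-demand worker and three spot workers in addition to the on-demand driver. Validating the values together catches combinations that the individual ranges allow but the cluster cannot use, such as more on-demand nodes than workers.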

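Several properties in the Advanced Options table take values in a fixed text format: key1='value1' key2='value2' for Spark Configurations and Environment Variables, <key1>=<value1>,<key2>=<value2> for Cluster Tags, a comma-separated list of dbfs:/ paths for Init Scripts, and arn:aws:iam::<account-id>:instance-profile/<role-name> for IAM Role ARN. The following Python sketch parses those inline formats so that you can check a value before you enter it. The parse_* functions are hypothetical helpers, not an Informatica or Databricks API. The sketch does not handle the file-path alternatives, and it assumes a 12-digit AWS account ID and the aws partition shown in the documented ARN format.

```python
import re
import shlex

# Documented format: arn:aws:iam::<account-id>:instance-profile/<role-name>
ARN_PATTERN = re.compile(r"^arn:aws:iam::(\d{12}):instance-profile/(.+)$")


def parse_spark_pairs(text: str) -> dict[str, str]:
    """Parse key1='value1' key2='value2' (Spark Configurations, Environment Variables)."""
    pairs = {}
    # shlex.split honours the single quotes, so quoted values may contain spaces.
    for token in shlex.split(text):
        key, sep, value = token.partition("=")  # split on the first '=' only
        if not sep or not key:
            raise ValueError(f"expected key='value', got {token!r}")
        pairs[key] = value
    return pairs


def parse_tags(text: str) -> dict[str, str]:
    """Parse <key1>=<value1>,<key2>=<value2> (Cluster Tags)."""
    tags = {}
    for item in text.split(","):
        key, sep, value = item.strip().partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        tags[key] = value
    return tags


def parse_init_scripts(text: str) -> list[str]:
    """Split dbfs:/<path>,dbfs:/<path> into an ordered list (Init Scripts)."""
    paths = [p.strip() for p in text.split(",") if p.strip()]
    for path in paths:
        if not path.startswith("dbfs:/"):
            raise ValueError(f"init script path must start with dbfs:/, got {path!r}")
    return paths  # keep the order, because the scripts run in the configured order


def parse_instance_profile_arn(arn: str) -> tuple[str, str]:
    """Return (account_id, name) from an instance profile ARN, or raise ValueError."""
    match = ARN_PATTERN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an instance profile ARN: {arn!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_spark_pairs("spark.executor.memory='4g' spark.sql.shuffle.partitions='200'"))
    print(parse_tags("team=data,env=dev"))
    print(parse_init_scripts("dbfs:/init/a.sh,dbfs:/init/b.sh"))
    print(parse_instance_profile_arn("arn:aws:iam::123456789012:instance-profile/my-role"))
```

The sample values in the usage block, such as the account ID and the script paths, are placeholders. Splitting each pair on the first equals sign keeps values that themselves contain an equals sign intact.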