Advanced Options

Configure advanced options such as automatic termination and on-demand instances.
The following table describes advanced options that you can set for an AWS Databricks cluster. Examples of the property value formats follow the table.
Enable autoscaling local storage
Enables Databricks to monitor available disk space on worker nodes and automatically add additional EBS volumes.
EBS Volume Type
The type of volume that Databricks can add to cluster nodes.
Set this property when you enable autoscaling.
Number of Volumes
The number of volumes to provision for each instance. Enter a value between 0 and 10.
Set this property when you enable autoscaling and configure EBS volume type.
Size in GB
The size in gigabytes of each EBS volume.
Set this property when you enable autoscaling and configure EBS volume type.
Auto Termination
Enables automatic termination of the cluster.
Auto Terminate Time
Terminates the cluster after it is inactive for the specified number of minutes. Enter a value between 10 and 10,000. If you do not configure this property, or if you set it to 0, the cluster does not terminate automatically.
On-Demand/Spot Composition
The number of on-demand worker nodes. Enter a value between 0 and the number of worker nodes set in General Options. Any remaining worker nodes are spot instances. For example, if you set 8 worker nodes in General Options and enter 3 here, the cluster uses 3 on-demand worker nodes and 5 spot instances.
On-demand nodes are always available to use. Spot instances might terminate running jobs if they become unavailable. The driver node is always an on-demand node.
Set this property when you enable Spot fall back to On-Demand.
Default is 1.
Spot fall back to On-Demand
Enables the cluster to fall back to on-demand instances.
If you use spot instances and the market price for spot instances surges above your spot bid price, AWS terminates the spot instances. When you enable this property, on-demand instances replace the spot instances that AWS terminates.
Availability Zone
The AWS cluster availability zone.
Default is us-east-1e.
Spot Bid Price
The maximum price that you bid for spot instances, expressed as a percentage of the on-demand instance price.
Spot instances are priced as a percentage of the on-demand price and are not always available. For example, a bid price of 60 means that you pay at most 60 percent of the on-demand price for each spot instance.
If the market price for spot instances surges above the bid price set here and you do not enable Spot fall back to On-Demand, AWS terminates the spot instances.
Default is 100%.
IAM Role ARN
The instance profile ARN (Amazon Resource Name) that corresponds to the AWS IAM (Identity and Access Management) role. Copy the value from the AWS console in the following format:
arn:aws:iam::<account-id>:instance-profile/<role-name>
IAM roles allow you to access data from Databricks clusters. Add new IAM roles in the Administrator tool.
Spark Configurations
Performance configurations for the Databricks Spark engine. Enter key-value pairs in the following format: key1='value1' key2='value2'. You can also provide a path to a file that contains the key-value pairs.
Environment Variables
Environment variables that you can configure for the Databricks Spark engine. Enter key-value pairs in the following format: key1='value1' key2='value2'
Cluster Tags
Labels that you can assign to resources for tracking purposes. Enter key-value pairs in the following format: <key1>=<value1>,<key2>=<value2>. You can also provide a path to a local file that contains the key-value pairs.
Use the following format:
file:\\<file path>
SSH Public Key
The SSH public key to log into the driver and worker instances if you enable SSH. Copy the value from the Databricks console.
Cluster Log Conf
The location where the cluster delivers logs for long-term storage. If you configure this property, the Databricks Spark engine delivers the logs every five minutes.
Provide a DBFS path.
Init Scripts
The location where you store init scripts. You can enter multiple destinations. The scripts are run sequentially in the order that you configure them. If you need to install additional Python libraries, specify the init script file location in this property.
Use the following format:
dbfs:/<path to init script>,dbfs:/<path to init script>
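For example, an IAM Role ARN for a hypothetical account and role might look like the following. The account ID 123456789012 and the role name are placeholders, not real values:
arn:aws:iam::123456789012:instance-profile/databricks-access-role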
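A Spark Configurations entry uses space-separated key-value pairs. For example, the following pairs set two standard Spark properties; the values are illustrative, not tuning recommendations:
spark.sql.shuffle.partitions='200' spark.executor.memory='8g'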
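An Environment Variables entry uses the same key-value format. For example, with hypothetical variable names and values:
PIPELINE_ENV='production' LOG_LEVEL='INFO'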
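A Cluster Tags entry separates pairs with commas. For example, with hypothetical tag names and values:
department=finance,project=quarterly-reports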
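For Cluster Log Conf, you might provide a DBFS location such as the following hypothetical path:
dbfs:/cluster-logs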
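For Init Scripts, you might enter a comma-separated list such as the following, where both script paths are hypothetical. The scripts run in the order listed:
dbfs:/databricks/init/install-libraries.sh,dbfs:/databricks/init/set-environment.sh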
