Advanced Clusters

Microsoft Azure properties

Create an advanced configuration to configure properties for an advanced cluster. The properties describe where you want to start the cluster on your cloud platform and the infrastructure that you want to use.
The basic properties describe the advanced configuration and define the cloud platform to host the advanced cluster. To configure the cluster, configure the platform, advanced, and runtime properties.

Basic configuration

The following table describes the basic properties:
Property
Description
Name
Name of the advanced configuration.
Description
Description of the advanced configuration.
Runtime Environment
Runtime environment to associate with the advanced configuration. The runtime environment can contain only one Secure Agent. A runtime environment cannot be associated with more than one configuration.
Cloud Platform
Cloud platform that hosts the cluster.
Select Microsoft Azure.
Private Cluster
Creates an advanced cluster in which cluster resources have only private IP addresses.
When you choose to create a private cluster, you must specify the VNet and subnet in the advanced properties.

Platform configuration

The following table describes the platform properties:
Property
Description
Region
Region in which to create the cluster. Use the drop-down menu to view the regions that you can use.
Master Instance Type
Instance type to host the master node. Use the drop-down menu to view the instance types that you can use.
The list of available instance types is filtered based on the minimum number of resources that the cluster requires.
Worker Instance Type
Instance type to host the worker nodes. Use the drop-down menu to view the instance types that you can use.
The instance types that you can use depend on your Azure account.
To verify that the instance type that you select from the drop-down menu is supported on your account, refer to the Microsoft Azure documentation.
Number of Worker Nodes
Number of worker nodes in the cluster. Specify the minimum and maximum number of worker nodes.
Enable Spot Instances
Indicates whether to use Spot Instances for worker nodes.
Spot Instance Price Ratio
Maximum percentage of On-Demand Instance price to pay for Spot Instances. Specify an integer value between 1 and 100.
Required if you enable Spot Instances. If you do not enable Spot Instances, this property is ignored.
Enable High Availability
Indicates whether the cluster is highly available. You can enable high availability only if the region has availability zones 1, 2, and 3. One master node is created in each availability zone.
Availability Zones
List of availability zones where cluster nodes are created. The list of availability zones is populated automatically based on the region.
If the region has availability zones 1, 2, and 3, worker nodes are created across the zones.
Azure Disk Size
Size of the Azure disk to attach to a worker node for temporary storage during data processing. The disk size scales between the minimum and maximum based on job requirements. The range must be between 80 GB and 16 TB.
By default, the minimum and maximum disk sizes are 100 GB.
When the disk size scales down, the jobs that are currently running on the cluster might take longer to complete.
Cluster Shutdown
Cluster shutdown method. You can select one of the following cluster shutdown methods:
  • Smart shutdown. The Secure Agent stops the cluster when no job is expected during the defined idle timeout, based on historical data.
  • Idle timeout. The Secure Agent stops the cluster after the amount of idle time that you define.
Mapping Task Timeout
Amount of time to wait for a mapping task to complete before it is terminated. By default, a mapping task does not have a timeout.
If you specify a timeout, a value of at least 10 minutes is recommended. The timeout begins when the mapping task is submitted to the Secure Agent.
Resource Group (Storage)
Storage resource group that holds the staging and log storage accounts.
The resource group can be a maximum of 90 characters.
If you specify an initialization script path, the storage account that holds the init script must be part of the same resource group.
Staging Location
Location on Azure Data Lake Storage Gen2 to store staging data that is generated when you run jobs.
Use the format:
abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>
If encryption is enabled, specify the ABFSS protocol. Otherwise, specify the ABFS protocol.
Log Location
Location on Azure Data Lake Storage Gen2 to store logs that are generated when you run a job.
Use the format:
abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>
If encryption is enabled, specify the ABFSS protocol. Otherwise, specify the ABFS protocol.
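
Several of the platform properties above carry constraints that can be checked before you submit a configuration. The following Python sketch is illustrative only and is not part of the product. The function names are hypothetical, the location pattern checks only the overall shape of the documented format rather than Azure naming rules, the `encryption_enabled` flag stands in for whichever encryption setting the property description refers to, and 1 TB is assumed to equal 1024 GB.

```python
import re

# Loose pattern for the documented location format:
#   abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>
# Assumption: only the overall shape is checked, not Azure naming rules.
_LOCATION = re.compile(
    r"^(abfss?)://([^@/\s]+)@([^./@\s]+)\.dfs\.core\.windows\.net/(\S+)$"
)

MIN_DISK_GB = 80
MAX_DISK_GB = 16 * 1024  # assumption: 1 TB = 1024 GB


def check_location(uri: str, encryption_enabled: bool) -> bool:
    """Return True if the URI has the documented shape and the protocol
    matches the encryption setting (ABFSS if enabled, ABFS otherwise)."""
    match = _LOCATION.match(uri)
    if match is None:
        return False
    expected = "abfss" if encryption_enabled else "abfs"
    return match.group(1) == expected


def check_disk_range(min_gb: int, max_gb: int) -> bool:
    """Return True if 80 GB <= min <= max <= 16 TB."""
    return MIN_DISK_GB <= min_gb <= max_gb <= MAX_DISK_GB


def max_spot_price(on_demand_price: float, ratio_percent: int) -> float:
    """Return the highest Spot price implied by the price ratio,
    which must be an integer between 1 and 100."""
    if not isinstance(ratio_percent, int) or not 1 <= ratio_percent <= 100:
        raise ValueError("Spot Instance Price Ratio must be an integer from 1 to 100")
    return on_demand_price * ratio_percent / 100
```

For example, `check_location("abfss://fs@acct.dfs.core.windows.net/staging", True)` passes under these assumptions, where `fs` and `acct` are placeholder names.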

Advanced configuration

The following table describes the advanced properties:
Property
Description
Resource Group (Cluster)
Cluster resource group that holds cluster resources. If you do not specify a resource group, the agent creates a resource group to populate with cluster resources.
The resource group can be a maximum of 90 characters.
Service Principal Client ID
Service principal that the agent uses to manage Azure resources.
Key Vault
Key vault that stores the service principal credentials.
Secret Name
Name of the secret that stores the service principal credentials.
VNet
Azure VNet in which to create the cluster. Use the format:
resourceGroup/VNet
The VNet must be in the specified region.
If you choose not to create a private cluster, you don't need to specify a VNet. In this case, the agent creates a VNet on your Azure account based on the region that you select.
A VNet is optional if you're using custom network security groups.
Subnet
Subnet in which to create cluster nodes. Required when a VNet is specified.
A subnet is optional if you're using custom network security groups.
IP Address Range
CIDR block that specifies the IP address range that the cluster can use.
The IP address range cannot overlap with the IP addresses of the subnets.
For example:
10.0.0.0/24
An IP address range is optional if you're using custom network security groups.
Initialization Script Path
Location on Azure Data Lake Storage Gen2 that stores the initialization script to run on each cluster node when the node is created.
Use the format:
abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>/file.sh
The script must be a bash script and it can reference other init scripts in the same folder.
Master Security Group ID
Security group that defines the inbound and outbound security rules for master nodes in the cluster. The Secure Agent attaches this security group to all master nodes in the cluster.
Use the format:
<resource group name>/<NSG name>
The master security group can be a maximum of 155 characters.
If the advanced configuration includes the cluster resource group, and the NSG (network security group) belongs to the cluster resource group, you can use the network security group name as the value.
This security group replaces the default master security group created by Data Integration. For more information, see the How-To article "Create user defined security groups in Azure".
When you specify a master security group, the worker security group is required.
Worker Security Group ID
Security group that defines the inbound and outbound security rules for worker nodes in the cluster. The Secure Agent attaches this security group to all worker nodes in the cluster.
Use the format:
<resource group name>/<NSG name>
The worker security group can be a maximum of 155 characters.
If the advanced configuration includes the cluster resource group, and the NSG (network security group) belongs to the cluster resource group, you can use the network security group name as the value.
This security group replaces the default worker security group created by Data Integration. For more information, see the How-To article "Create user defined security groups in Azure".
When you specify a worker security group, the master security group is required.
Azure Tags
Tags on Microsoft Azure to apply to cluster nodes. Each tag has a key and a value.
You can list a maximum of 30 tags. The Secure Agent also assigns default tags to cloud resources. The default tags do not contribute to the limit of 30 tags.
Issues can occur when you override default tags. For more information, see Default tags for cloud resources.
Tags cannot include the Unicode characters \u241e and \u241f, which are the symbols for the record separator and unit separator (ASCII control characters 30 and 31).
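
The advanced properties above include a few rules that are easy to get wrong: the IP address range must not overlap the subnets, the master and worker security groups must be specified together, and tags have a count limit and two forbidden characters. The following Python sketch shows one way to check these rules locally. It is not part of the product, all function names are hypothetical, and it assumes that you supply the subnet CIDR blocks yourself and that tags are held in a dictionary of key and value strings.

```python
import ipaddress
from typing import Dict, List, Optional

MAX_TAGS = 30
MAX_NSG_ID_LENGTH = 155
FORBIDDEN_TAG_CHARS = {"\u241e", "\u241f"}


def range_overlaps_subnets(cluster_cidr: str, subnet_cidrs: List[str]) -> bool:
    """Return True if the cluster IP address range overlaps any subnet."""
    cluster = ipaddress.ip_network(cluster_cidr)
    return any(cluster.overlaps(ipaddress.ip_network(s)) for s in subnet_cidrs)


def check_security_groups(master: Optional[str], worker: Optional[str]) -> List[str]:
    """Return error messages for the master and worker security group values."""
    errors = []
    # Specifying one security group makes the other one required.
    if bool(master) != bool(worker):
        errors.append("Specify both the master and worker security groups, or neither.")
    for label, value in (("master", master), ("worker", worker)):
        if value and len(value) > MAX_NSG_ID_LENGTH:
            errors.append(f"The {label} security group exceeds {MAX_NSG_ID_LENGTH} characters.")
    return errors


def check_tags(tags: Dict[str, str]) -> List[str]:
    """Return error messages for user-defined tags (default tags excluded)."""
    errors = []
    if len(tags) > MAX_TAGS:
        errors.append(f"A maximum of {MAX_TAGS} tags is allowed.")
    for key, value in tags.items():
        if FORBIDDEN_TAG_CHARS & set(key + value):
            errors.append(f"Tag {key!r} contains a forbidden separator symbol.")
    return errors
```

Each check function returns an empty list when the values pass, so the results can be concatenated and reported together before the configuration is saved.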

Runtime configuration

The following table describes the runtime properties:
Property
Description
Encrypt Data
Indicates whether temporary data on the cluster is encrypted.
Encrypting temporary data might slow down job performance.
Runtime Properties
Custom properties to customize the cluster and the jobs that run on the cluster.
