Advanced Clusters

Step 5. Create storage accounts for cluster files

You can store data using Azure Data Lake Storage Gen2.
In Azure, create the following storage accounts using a hierarchical namespace:
  • A storage account with the following locations:
    • A location that the cluster will use to store staging files at run time
    • A location that the cluster will use to store log files for the advanced jobs that run on the cluster
  • Optionally, a storage account where you can store initialization scripts that cluster nodes will run to install additional software on the cluster
Then, add these storage accounts to a resource group named storage_resource_group.
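The steps above can be sketched as a short Python script that assembles Azure CLI commands without running them. This is a minimal sketch under stated assumptions, not the documented procedure: the account names (clusterfiles01, clusterinit01), the region (eastus), the SKU, and the file system names (staging, logs) are hypothetical, and modeling each "location" as a Data Lake Storage Gen2 file system is an assumption. Only the resource group name storage_resource_group comes from the text. Because Azure creates a storage account inside a resource group, the sketch creates the group first.

```python
import re
from typing import List, Optional

RESOURCE_GROUP = "storage_resource_group"  # name taken from this step
LOCATION = "eastus"                        # assumed region, adjust as needed


def validate_account_name(name: str) -> str:
    """Check Azure's naming rule: 3-24 characters, lowercase letters and digits."""
    if not re.fullmatch(r"[a-z0-9]{3,24}", name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return name


def group_create_command(resource_group: str, location: str) -> List[str]:
    """Build the command that creates the resource group."""
    return ["az", "group", "create",
            "--name", resource_group, "--location", location]


def account_create_command(account: str, resource_group: str,
                           location: str) -> List[str]:
    """Build the command that creates a storage account with a hierarchical namespace."""
    return ["az", "storage", "account", "create",
            "--name", validate_account_name(account),
            "--resource-group", resource_group,
            "--location", location,
            "--kind", "StorageV2",
            "--sku", "Standard_LRS",  # assumed SKU
            "--enable-hierarchical-namespace", "true"]


def filesystem_create_command(account: str, filesystem: str) -> List[str]:
    """Build the command that creates a file system (assumed to model one 'location')."""
    return ["az", "storage", "fs", "create",
            "--name", filesystem,
            "--account-name", account,
            "--auth-mode", "login"]


def build_plan(cluster_account: str,
               init_account: Optional[str] = None) -> List[List[str]]:
    """Return the ordered list of commands; nothing is executed here."""
    plan = [group_create_command(RESOURCE_GROUP, LOCATION),
            account_create_command(cluster_account, RESOURCE_GROUP, LOCATION)]
    for filesystem in ("staging", "logs"):  # hypothetical location names
        plan.append(filesystem_create_command(cluster_account, filesystem))
    if init_account is not None:  # the optional initialization script account
        plan.append(account_create_command(init_account, RESOURCE_GROUP, LOCATION))
    return plan


if __name__ == "__main__":
    for command in build_plan("clusterfiles01", "clusterinit01"):
        print(" ".join(command))
```

To apply the plan, you might review the printed commands and then run each one, for example with subprocess.run(command, check=True) after signing in with az login. Keeping command construction separate from execution lets you check names and options before anything is created.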
The staging location stores temporary data, such as artifacts that the cluster distributes across cluster nodes and data that you preview in a mapping. Because an error might prevent a mapping from clearing preview data in the staging location, make sure that the users who have access to the staging location are permitted to view source data.
If you create any initialization scripts, add them to the storage account that you created for initialization scripts.
