You can store data using Azure Data Lake Storage Gen2.
In Azure, create the following storage accounts with a hierarchical namespace enabled:
A storage account with the following locations:
A location that the cluster will use to store staging files at run time
A location that the cluster will use to store log files for the advanced jobs that run on the cluster
Optionally, a storage account where you can store initialization scripts that cluster nodes will run to install additional software on the cluster
Then, add these storage accounts to a resource group named storage_resource_group.
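The account setup described above can be sketched as a small Python script that composes Azure CLI commands without running them. This is an illustration under stated assumptions, not a prescribed procedure: the account names examplestaginglogs and exampleinitscripts, the eastus region, and the Standard_LRS SKU are hypothetical placeholders, and only the resource group name comes from the text. Because Azure requires a resource group when a storage account is created, the sketch creates the group first.

```python
import shlex
from typing import List, Optional

RESOURCE_GROUP = "storage_resource_group"  # name given in the text above
LOCATION = "eastus"                         # assumption: substitute your own region


def storage_account_command(account_name: str) -> List[str]:
    """Build az CLI arguments for one storage account with a hierarchical namespace."""
    return [
        "az", "storage", "account", "create",
        "--name", account_name,
        "--resource-group", RESOURCE_GROUP,
        "--location", LOCATION,
        "--kind", "StorageV2",
        "--sku", "Standard_LRS",  # assumption: choose the redundancy you need
        "--enable-hierarchical-namespace", "true",
    ]


def plan_commands(cluster_account: str,
                  init_script_account: Optional[str] = None) -> List[List[str]]:
    """Return the commands in order: resource group, cluster account, optional script account."""
    commands = [["az", "group", "create",
                 "--name", RESOURCE_GROUP, "--location", LOCATION]]
    commands.append(storage_account_command(cluster_account))
    if init_script_account is not None:
        commands.append(storage_account_command(init_script_account))
    return commands


if __name__ == "__main__":
    # Hypothetical account names; storage account names must be globally unique.
    for cmd in plan_commands("examplestaginglogs", "exampleinitscripts"):
        print(shlex.join(cmd))
```

The script only prints the commands so that you can review them before running them yourself. It does not create the staging and log locations inside the first account; create those separately after the account exists.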
The staging location stores temporary data, such as artifacts that the cluster distributes across cluster nodes and data that you preview in a mapping. Because an error might prevent a mapping from clearing preview data in the staging location, make sure that the users who have access to the staging location are permitted to view source data.
If you create any initialization scripts, add the scripts to the appropriate location.