Advanced Clusters

Running a job

To run a job, the Secure Agent and the worker nodes access sources and targets, as well as the staging and log locations. The worker nodes and the Azure disks auto-scale according to resource requirements.
The following image shows the process that the Secure Agent and worker nodes use to run the job:
The following steps describe the process that the Secure Agent and worker nodes use to run the job:
  1. The worker nodes use the connection properties to access source and target data.
    Based on the connection properties, the worker nodes access the data using either a storage account key or a managed identity. To use a managed identity, the identity must be assigned to the Secure Agent, and the agent role must have permissions to detect all user-assigned managed identities that are assigned to the Secure Agent machine and to assign those identities to all cluster nodes.
  2. The Secure Agent authenticates with the managed identity to store job dependencies in the staging location.
  3. The worker nodes get job dependencies and stage temporary data in the staging location using the storage account key that the Secure Agent fetched through the managed identity. The Secure Agent also passes the key to the Spark job so that the Spark driver and Spark executors can use the same key to access the staging location.
  4. The worker nodes and the Azure disks auto-scale using the service principal.
  5. The worker nodes store logs in the log location after fetching the storage account key through the managed identity.
  6. The Secure Agent authenticates with the managed identity to upload the agent job log to the log location.
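The access pattern in these steps can be summarized as data. The following Python sketch is a hypothetical model, not Informatica's implementation. The credential labels, the `can_use_managed_identity` check, and the `spark_staging_key_conf` helper are illustrative names. The Spark property in the last helper assumes that the staging location is ADLS Gen2 accessed through the Hadoop ABFS connector, which this topic does not state.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative credential labels, not Informatica or Azure identifiers.
MANAGED_IDENTITY = "managed identity"
STORAGE_KEY_VIA_MI = "storage account key fetched through the managed identity"
SERVICE_PRINCIPAL = "service principal"
CONNECTION = "connection properties (storage account key or managed identity)"


@dataclass(frozen=True)
class Access:
    step: int
    actor: str
    target: str
    credential: str


# One entry per step in the list above.
FLOW = [
    Access(1, "worker nodes", "sources and targets", CONNECTION),
    Access(2, "Secure Agent", "staging location", MANAGED_IDENTITY),
    Access(3, "worker nodes", "staging location", STORAGE_KEY_VIA_MI),
    Access(4, "worker nodes and Azure disks", "auto-scaling", SERVICE_PRINCIPAL),
    Access(5, "worker nodes", "log location", STORAGE_KEY_VIA_MI),
    Access(6, "Secure Agent", "log location", MANAGED_IDENTITY),
]


def credentials_for(actor: str) -> set[str]:
    """Return every credential type that the given actor uses across the flow."""
    return {a.credential for a in FLOW if a.actor == actor}


def can_use_managed_identity(identity: str,
                             agent_identities: set[str],
                             detectable: set[str],
                             assignable: set[str]) -> bool:
    """Hypothetical step 1 precondition: the identity is assigned to the Secure
    Agent machine, and the agent role can detect every user-assigned identity
    on that machine and assign each of them to the cluster nodes."""
    return (identity in agent_identities
            and agent_identities <= detectable
            and agent_identities <= assignable)


def spark_staging_key_conf(account: str, key: str) -> dict[str, str]:
    """Hypothetical step 3 sketch: pass one storage account key to the Spark job.
    Assumes ADLS Gen2 through hadoop-azure (ABFS). The spark.hadoop. prefix
    copies the property into the Hadoop configuration that the driver and
    executors read, so both use the same key."""
    return {f"spark.hadoop.fs.azure.account.key.{account}.dfs.core.windows.net": key}
```

Modeling the flow as data makes the split of responsibilities easy to query: in these steps the Secure Agent only authenticates with the managed identity, while the worker nodes use either the connection credentials or a storage account key fetched through the managed identity.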
