Running a job with direct access to Amazon data sources
When you run a job that uses a connector with direct access to Amazon data sources, the cluster accesses Amazon resources using either role-based security or credential-based security.
The following image shows the process that the Secure Agent and cluster nodes use to run the job:
The following steps describe the process that the Secure Agent and cluster nodes use to run the job:
The Secure Agent assumes the cluster operator role to store job dependencies in the staging location.
The worker nodes use the connection-level role, the worker role, or connection-level AWS credentials to access source data based on the job security type. If you use role-based security, the worker nodes use the connection-level role or the worker role. If you use credential-based security, the worker nodes use the connection-level credentials. The authentication configured at the connection level takes precedence.
The worker nodes use the connection-level role, worker role, or connection-level credentials to access the staging location to get job dependencies and stage temporary data.
The worker nodes use the worker role to auto-scale EBS volumes if the job requires more storage space.
The master node uses the master role to scale cluster nodes based on resource requirements.
The worker nodes use the worker role to store logs in the log location.
The master node uses the master role to store logs in the log location.
The Secure Agent uses the Secure Agent role to upload the agent job log to the log location.
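The precedence that the worker nodes apply when they access source data can be sketched as a small lookup. This is only an illustration of the rules described in the steps above, not the product's implementation. The Connection class, its field names, and the data_access_identity function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Connection:
    """Hypothetical stand-in for the AWS authentication set on a connection."""
    name: str
    access_key_id: Optional[str] = None  # connection-level AWS credentials
    role_arn: Optional[str] = None       # connection-level role


def data_access_identity(conn: Connection, worker_role_arn: str) -> Tuple[str, str]:
    """Return (kind, value) for the identity a worker node would use for conn.

    Assumed order, taken from the text: connection-level credentials first,
    then the connection-level role, then the worker role as the fallback.
    """
    if conn.access_key_id:
        return ("connection-level credentials", conn.access_key_id)
    if conn.role_arn:
        return ("connection-level role", conn.role_arn)
    return ("worker role", worker_role_arn)


if __name__ == "__main__":
    # Placeholder ARNs for illustration only.
    worker = "arn:aws:iam::111122223333:role/example-worker-role"
    conn = Connection("s3_source", role_arn="arn:aws:iam::111122223333:role/example-conn-role")
    print(data_access_identity(conn, worker))
```

In this sketch, a connection with no AWS authentication of its own falls through to the worker role, which matches the role-based case in which no connection-level role is configured.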
Security types
Worker nodes access Amazon resources in the following ways based on the security type:
Credential-based security
If you set up credential-based security, worker nodes use connection-level AWS credentials to access Amazon resources, including Amazon data sources and the staging location. The worker nodes use the worker role to access the log location.
Credential-based security overrides role-based security. If any source or target in the job provides AWS credentials, the worker nodes reuse the credentials to access the staging location. For example, if a job uses a JDBC V2 source and an Amazon S3 V2 target, the worker nodes use the AWS credentials that access the S3 target to access the staging location for the job.
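The credential reuse for the staging location can be illustrated with a short sketch. It assumes that each connection is represented as a dictionary with a hypothetical aws_access_key_id key. It also assumes that the first connection with credentials is the one reused, because the text does not say which connection is chosen when several provide credentials. The staging_identity function and the fallback_role parameter are names invented for this example.

```python
from typing import Dict, Optional, Sequence, Tuple


def staging_identity(
    connections: Sequence[Dict[str, Optional[str]]],
    fallback_role: str,
) -> Tuple[str, str]:
    """Return (kind, owner) for the identity used to access the staging location.

    If any source or target provides AWS credentials, those credentials are
    reused. Otherwise the role-based path applies, represented here by
    fallback_role (an assumption made to keep the sketch short).
    """
    for conn in connections:
        if conn.get("aws_access_key_id"):
            # Assumption: the first connection that has credentials wins.
            return ("connection-level credentials", str(conn["name"]))
    return ("role", fallback_role)


if __name__ == "__main__":
    job_connections = [
        {"name": "JDBC V2 source"},  # provides no AWS credentials
        {"name": "Amazon S3 V2 target", "aws_access_key_id": "AKIAEXAMPLE"},
    ]
    print(staging_identity(job_connections, "worker role"))
    # prints ('connection-level credentials', 'Amazon S3 V2 target')
```

The example mirrors the JDBC V2 source and Amazon S3 V2 target case: only the S3 target carries AWS credentials, so those credentials are the ones used for the staging location.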
Role-based security
If you set up role-based security, worker nodes use either the connection-level role or the worker role to access Amazon resources, including Amazon data sources, the staging location, and the log location. The role configured at the connection level takes precedence over the worker role.
If you use the default master and worker roles, the policies that are attached to the Secure Agent role are passed to the worker role. These policies can grant the worker role access to Amazon resources.