Configure S3 and Redshift Authentication and Encryption on AWS

If the Databricks cluster uses the AWS S3 service to access sources or targets stored in S3 buckets or in the Redshift data warehouse, the configuration depends on the authentication and encryption options that you choose.

Authentication Options

Configure the following authentication options:
Key Pair Authentication
If the data resources use AWS key pair authentication, add the access key and the secret key in the Spark tab of the Advanced Configuration section of the Databricks cluster configuration page. Enter each property on its own line, and separate the property name from its value with a space.
The image below shows an example of the keys in the text entry pane of the Spark tab:
The image shows a text entry pane under the Spark tab. Under "Spark Config," two key pairs have been entered.
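As a minimal sketch of what those two lines contain, the following Python helper formats them for pasting into the Spark tab. The function name `key_pair_spark_config` is hypothetical, the property names are the s3n ones used in the encryption examples later in this section (with the standard Hadoop spelling `awsAccessKeyId`), and the key values are placeholders.

```python
def key_pair_spark_config(access_key: str, secret_key: str) -> str:
    """Return the two Spark config lines for key pair authentication.

    Hypothetical helper for illustration only; it just formats text and
    does not talk to Databricks or AWS.
    """
    for name, value in (("access key", access_key), ("secret key", secret_key)):
        # A space separates the property name from its value, so a value
        # that is empty or contains whitespace would produce a broken line.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{name} must be non-empty and contain no whitespace")
    lines = [
        f"spark.hadoop.fs.s3n.awsAccessKeyId {access_key}",
        f"spark.hadoop.fs.s3n.awsSecretAccessKey {secret_key}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    print(key_pair_spark_config("YYYYXXXX", "XYYYYSSSSSSS"))
```

Rejecting whitespace inside a value is a design choice of this sketch, made so that the single separating space stays unambiguous.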
IAM Role Authentication
If the data resources use IAM role authentication, verify that the configuration meets one of the following requirements:
  • The S3 bucket belongs to the AWS account in which the Databricks cluster resides.
  • The S3 bucket belongs to a different AWS account than the one in which the Databricks cluster resides, and you enabled a cross-account policy to allow the cluster to access the bucket.
For more information about using IAM roles with a Databricks cluster, see the AWS documentation.

Encryption Options

Choose from the following types of encryption, and configure the corresponding properties in the Spark configuration tab of the cluster. Each type can be combined with either of the two options for authentication.
Server-Side S3 Encryption (SSE-S3)
Configure the following property to enable SSE-S3 encryption:
spark.hadoop.fs.s3a.server-side-encryption-algorithm AES256
If you use both key pair authentication and SSE-S3 encryption, then add this property in the Spark configuration tab after the first two lines for key pair authentication. For example:
spark.hadoop.fs.s3n.awsAccessKeyId YYYYXXXX
spark.hadoop.fs.s3n.awsSecretAccessKey XYYYYSSSSSSS
spark.hadoop.fs.s3a.server-side-encryption-algorithm AES256
Server-Side Encryption with KMS (SSE-KMS)
Configure the following properties to enable SSE-KMS encryption:
spark.hadoop.fs.s3a.server-side-encryption-kms-master-key-id arn:aws:kms:us-west-XXXXX:key/XXXYYYYYYYYYYYYYYYYY
spark.hadoop.fs.s3a.server-side-encryption-algorithm aws:kms
spark.hadoop.fs.s3a.impl com.databricks.s3a.S3AFileSystem
If you use both key pair authentication and SSE-KMS encryption, then add these properties in the Spark configuration tab after the first two lines for key pair authentication. For example:
spark.hadoop.fs.s3n.awsAccessKeyId YYYYXXXX
spark.hadoop.fs.s3n.awsSecretAccessKey XYYYYSSSSSSS
spark.hadoop.fs.s3a.server-side-encryption-kms-master-key-id arn:aws:kms:us-west-XXXXX:key/XXXYYYYYYYYYYYYYYYYY
spark.hadoop.fs.s3a.server-side-encryption-algorithm aws:kms
spark.hadoop.fs.s3a.impl com.databricks.s3a.S3AFileSystem
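Because the SSE-S3 and SSE-KMS settings share the same algorithm property, it can help to check a Spark configuration block before saving it. The Python sketch below parses the text into a dictionary and reports which encryption type it describes. The function names `parse_spark_config` and `detect_encryption` are hypothetical, and the rule that `aws:kms` requires a KMS master key ID is an assumption drawn from the SSE-KMS property list above, not a documented validation step.

```python
ALGORITHM = "spark.hadoop.fs.s3a.server-side-encryption-algorithm"
KMS_KEY_ID = "spark.hadoop.fs.s3a.server-side-encryption-kms-master-key-id"


def parse_spark_config(text: str) -> dict:
    """Parse 'name value' lines, one property per line, into a dict."""
    config = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


def detect_encryption(config: dict) -> str:
    """Return 'none', 'SSE-S3', or 'SSE-KMS' for a parsed configuration.

    Raises ValueError when the settings look inconsistent (assumed rules).
    """
    algorithm = config.get(ALGORITHM)
    if algorithm is None:
        return "none"
    if algorithm == "AES256":
        return "SSE-S3"
    if algorithm == "aws:kms":
        if not config.get(KMS_KEY_ID):
            raise ValueError("aws:kms is set but the KMS master key ID is missing")
        return "SSE-KMS"
    raise ValueError(f"unexpected encryption algorithm: {algorithm}")


if __name__ == "__main__":
    # Placeholder values copied from the SSE-KMS example above.
    sample = """\
spark.hadoop.fs.s3n.awsAccessKeyId YYYYXXXX
spark.hadoop.fs.s3n.awsSecretAccessKey XYYYYSSSSSSS
spark.hadoop.fs.s3a.server-side-encryption-kms-master-key-id arn:aws:kms:us-west-XXXXX:key/XXXYYYYYYYYYYYYYYYYY
spark.hadoop.fs.s3a.server-side-encryption-algorithm aws:kms
spark.hadoop.fs.s3a.impl com.databricks.s3a.S3AFileSystem
"""
    print(detect_encryption(parse_spark_config(sample)))
```

The check only inspects the text. It does not confirm that the key exists in KMS or that the cluster can use it.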
