Table of Contents

  1. Preface
  2. Introduction to PowerExchange for Microsoft Azure Data Lake Storage Gen2
  3. PowerExchange for Microsoft Azure Data Lake Storage Gen2 Configuration
  4. Microsoft Azure Data Lake Storage Gen2 Connections
  5. PowerExchange for Microsoft Azure Data Lake Storage Gen2 Data Objects
  6. Microsoft Azure Data Lake Storage Gen2 Mappings
  7. Appendix A: Microsoft Azure Data Lake Storage Gen2 Datatype Reference

PowerExchange for Microsoft Azure Data Lake Storage Gen2 User Guide

Configure Azure Databricks Cluster to Access Microsoft Azure Data Lake Storage Gen2

Set the following Hadoop credential configuration options under Spark Config in your Databricks cluster configuration when you use service principal authentication to access the Microsoft Azure Data Lake Storage Gen2 account:
    spark.hadoop.fs.azure.account.auth.type OAuth
    spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    spark.hadoop.fs.azure.account.oauth2.client.id <your-service-client-id>
    spark.hadoop.fs.azure.account.oauth2.client.secret <your-service-client-secret-key>
    spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<directory-ID-of-Azure-AD>/oauth2/token
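If you prefer to scope the same credentials to a single notebook session instead of the whole cluster, an equivalent configuration can be set with spark.conf.set in a Databricks notebook. The following is only a sketch: the storage account name, client ID, client secret, and directory ID are placeholders, and spark refers to the SparkSession that Databricks notebooks provide. For real workloads, consider storing the client secret in a secret scope rather than hard-coding it.
    # Minimal sketch: session-scoped equivalent of the cluster-level Spark Config above.
    # The per-account suffix limits these settings to one storage account; all values
    # in angle brackets are placeholders.
    account_suffix = "<storage-account>.dfs.core.windows.net"
    spark.conf.set(f"fs.azure.account.auth.type.{account_suffix}", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{account_suffix}",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{account_suffix}", "<your-service-client-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account_suffix}", "<your-service-client-secret-key>")
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account_suffix}",
                   "https://login.microsoftonline.com/<directory-ID-of-Azure-AD>/oauth2/token")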
Set the following Hadoop credential configuration option when you use shared key authentication to access the Microsoft Azure Data Lake Storage Gen2 account:
    spark.hadoop.fs.azure.account.key.<account-name>.dfs.core.windows.net <account-key>
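After either authentication option is configured, you can verify access from a notebook with a short test read against the abfss endpoint. This is only a sketch: the file system (container), account, and file path are placeholders, and CSV is used purely for illustration.
    # Connectivity check against the Gen2 account (placeholder container, account, and path).
    path = "abfss://<file-system>@<account-name>.dfs.core.windows.net/<path-to-sample-file>"
    # Read a small sample file and show a few rows to confirm the credentials work.
    df = spark.read.format("csv").option("header", "true").load(path)
    df.show(5)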
When you run a Microsoft Azure Data Lake Storage Gen2 mapping on the Databricks Spark engine to read partitioned files in Avro, Parquet, or ORC format and write them to a target created at runtime, the Databricks Spark engine generates _success, _committed, _started, or Parquet summary files along with the partitioned files. If you then import the created target, the import might fail because of these files.
To disable these files, configure the following properties under Spark Config in your Databricks cluster configuration:
  • To disable commit or start files, configure the following property:
    spark.sql.sources.commitProtocolClass org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
  • To disable Parquet summary files, configure the following property:
    parquet.enable.summary-metadata false
  • To disable success files, configure the following property:
    mapreduce.fileoutputcommitter.marksuccessfuljobs false
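To confirm that the properties took effect, you can list the target directory from a notebook after the mapping runs and check that no marker files remain. The sketch below assumes a placeholder target path and uses the dbutils utility that Databricks notebooks expose.
    # List the runtime-created target directory and flag any leftover marker files.
    # The path is a placeholder; point it at the target that the mapping created.
    target_path = "abfss://<file-system>@<account-name>.dfs.core.windows.net/<target-directory>"
    marker_prefixes = ("_success", "_committed", "_started")
    leftovers = [f.name for f in dbutils.fs.ls(target_path)
                 if f.name.lower().startswith(marker_prefixes)]
    print("Marker files found:", leftovers if leftovers else "none")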
