Table of Contents

  1. Preface
  2. Introduction to PowerExchange for Microsoft Azure Data Lake Storage Gen2
  3. PowerExchange for Microsoft Azure Data Lake Storage Gen2 Configuration
  4. Microsoft Azure Data Lake Storage Gen2 Connections
  5. PowerExchange for Microsoft Azure Data Lake Storage Gen2 Data Objects
  6. Microsoft Azure Data Lake Storage Gen2 Mappings
  7. Appendix A: Microsoft Azure Data Lake Storage Gen2 Datatype Reference

PowerExchange for Microsoft Azure Data Lake Storage Gen2 User Guide

Configure Azure Databricks Cluster to Access Microsoft Azure Data Lake Storage Gen2

Set the following Hadoop credential configuration options under Spark Config in your Databricks cluster configuration when you use service principal authentication to access the Microsoft Azure Data Lake Storage Gen2 account:
    spark.hadoop.fs.azure.account.auth.type OAuth
    spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    spark.hadoop.fs.azure.account.oauth2.client.id <your-service-client-id>
    spark.hadoop.fs.azure.account.oauth2.client.secret <your-service-client-secret-key>
    spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<directory-ID-of-Azure-AD>/oauth2/token
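If you prefer to scope the same credentials to a single notebook session instead of the whole cluster, an equivalent configuration can be set with spark.conf.set in a Databricks notebook. The following is only a sketch: the storage account name, client ID, client secret, and directory ID are placeholders, and spark refers to the SparkSession that Databricks notebooks provide. For real workloads, consider storing the client secret in a secret scope rather than hard-coding it.
    # Minimal sketch: session-scoped equivalent of the cluster-level Spark Config above.
    # The per-account suffix limits these settings to one storage account; all values
    # in angle brackets are placeholders.
    account_suffix = "<storage-account>.dfs.core.windows.net"
    spark.conf.set(f"fs.azure.account.auth.type.{account_suffix}", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{account_suffix}",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{account_suffix}", "<your-service-client-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account_suffix}", "<your-service-client-secret-key>")
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account_suffix}",
                   "https://login.microsoftonline.com/<directory-ID-of-Azure-AD>/oauth2/token")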
Set the following Hadoop credential configuration option when you use shared key authentication to access the Microsoft Azure Data Lake Storage Gen2 account:
    spark.hadoop.fs.azure.account.key.<account-name>.dfs.core.windows.net <account-key>
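After either authentication option is configured, you can verify access from a notebook with a short test read against the abfss endpoint. This is only a sketch: the file system (container), account, and file path are placeholders, and CSV is used purely for illustration.
    # Connectivity check against the Gen2 account (placeholder container, account, and path).
    path = "abfss://<file-system>@<account-name>.dfs.core.windows.net/<path-to-sample-file>"
    # Read a small sample file and show a few rows to confirm the credentials work.
    df = spark.read.format("csv").option("header", "true").load(path)
    df.show(5)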
When you run a Microsoft Azure Data Lake Storage Gen2 mapping on the Databricks Spark engine to read partitioned files in Avro, Parquet, or ORC format and write them to a target created at runtime, the Databricks Spark engine generates _success, _committed, _started, or Parquet summary files along with the partitioned files. If you then import the created target, the import might fail because of these files.
To disable these files, configure the following properties under Spark Config in your Databricks cluster configuration:
  • To disable commit or start files, configure the following property:
    spark.sql.sources.commitProtocolClass org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
  • To disable Parquet summary files, configure the following property:
    parquet.enable.summary-metadata false
  • To disable success files, configure the following property:
    mapreduce.fileoutputcommitter.marksuccessfuljobs false
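To confirm that the properties took effect, you can list the target directory from a notebook after the mapping runs and check that no marker files remain. The sketch below assumes a placeholder target path and uses the dbutils utility that Databricks notebooks expose.
    # List the runtime-created target directory and flag any leftover marker files.
    # The path is a placeholder; point it at the target that the mapping created.
    target_path = "abfss://<file-system>@<account-name>.dfs.core.windows.net/<target-directory>"
    marker_prefixes = ("_success", "_committed", "_started")
    leftovers = [f.name for f in dbutils.fs.ls(target_path)
                 if f.name.lower().startswith(marker_prefixes)]
    print("Marker files found:", leftovers if leftovers else "none")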
