Table of Contents

  1. Preface
  2. Introduction to PowerExchange for Microsoft Azure Blob Storage
  3. PowerExchange for Microsoft Azure Blob Storage Configuration
  4. Microsoft Azure Blob Storage Connections
  5. Microsoft Azure Blob Storage Data Objects
  6. Microsoft Azure Blob Storage Mappings
  7. Data Type Reference

PowerExchange for Microsoft Azure Blob Storage User Guide

Data Compression in Microsoft Azure Blob Storage Sources and Targets

You can decompress data when you read from Microsoft Azure Blob Storage and compress data when you write to Microsoft Azure Blob Storage. Configure the compression format in the Compression Format option under the advanced source and target properties.
For the Flat resource type in the native environment, select only the Gzip compression format. The following table lists the compression formats that the Avro, flat, JSON, and Parquet resource types support for a read operation:
Compression format    Avro File    Flat File    JSON File    Parquet File
None                  Yes          Yes          No           Yes
Deflate*              Yes          N/A          Yes          No
Gzip                  No           Yes          Yes          Yes
Bzip2                 N/A          N/A          Yes          N/A
Lzo                   N/A          N/A          No           Yes
Snappy*               Yes          N/A          Yes          Yes
*To read files in the Deflate and Snappy formats, select None.
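For Avro and Parquet sources, the footnote matches how those container formats behave: the compression codec is recorded in the file metadata, so the reader detects Deflate or Snappy on its own rather than from a selected format. As a generic PySpark sketch, assuming the spark-avro package is available and using a hypothetical account, container, and path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # The Avro reader picks up the Deflate or Snappy codec from the
    # file metadata, so no compression format is specified on read.
    df = spark.read.format("avro").load(
        "wasbs://container@account.blob.core.windows.net/events/"  # hypothetical
    )
    df.show()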
The following table lists the compression formats that the Avro, flat, JSON, and Parquet resource types support for a write operation:
Compression format    Avro File    Flat File    JSON File    Parquet File
None                  Yes          Yes          No           Yes
Deflate               Yes          N/A          Yes          No
Gzip                  No           Yes          Yes          Yes
Bzip2                 N/A          N/A          Yes          N/A
Lzo                   N/A          N/A          No           Yes
Snappy                Yes          N/A          Yes          Yes
To read a compressed file from Microsoft Azure Blob Storage, the compressed file must have a specific file name extension. If the extension is not valid, the Integration Service does not process the file. The following table lists the file name extension that corresponds to each compression format:
Compression format    File Name Extension
Deflate               .deflate
Gzip                  .GZ
Bzip2                 .BZ2
Lzo                   .LZO
Snappy                .snappy
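As an informal illustration of this rule, the following Python sketch maps each compression format to its extension and checks a file name before processing. The mapping mirrors the table above; the helper name and the case-insensitive comparison are assumptions for illustration, not documented product behavior.

    # Illustrative only: extension required for each compression format,
    # mirroring the table above.
    REQUIRED_EXTENSIONS = {
        "deflate": ".deflate",
        "gzip": ".gz",
        "bzip2": ".bz2",
        "lzo": ".lzo",
        "snappy": ".snappy",
    }

    def has_valid_extension(file_name: str, compression_format: str) -> bool:
        # Case-insensitive match is an assumption; the table mixes
        # .GZ, .BZ2, and .LZO with .deflate and .snappy.
        required = REQUIRED_EXTENSIONS[compression_format]
        return file_name.lower().endswith(required)

    # A Gzip source file without the .gz extension is not processed.
    print(has_valid_extension("sales.csv.gz", "gzip"))  # True
    print(has_valid_extension("sales.csv", "gzip"))     # False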

Rules and guidelines for data compression

Consider the following guidelines when you read or write compressed files:
  • To read multiple compressed files from a container, the compressed files must have the same schema.
  • You can read and write only primitive data types in the native environment.
  • You can read and write both primitive and hierarchical data types in the non-native environment.
  • You must enable the Compressed Newly Created Blob property to write a compressed file.
  • You cannot append a compressed file to a target.
  • To read and write Avro files compressed using Deflate to an Azure Blob Storage target, configure the following properties under Spark Config in your Databricks 5.1 cluster configuration, as shown in the sketch after this list:
    • spark.hadoop.avro.mapred.ignore.inputs.without.extension false
    • spark.sql.avro.compression.codec deflate
    • spark.sql.avro.deflate.level 5
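The three properties above are entered one per line in the cluster's Spark Config field. As a minimal PySpark sketch, assuming the same settings were applied programmatically rather than through the Databricks UI, the session configuration would look like the following; cluster-level Spark Config remains the safer place for spark.hadoop.* properties, which generally need to be in place when the cluster starts.

    from pyspark.sql import SparkSession

    # Sketch only: the guide sets these values in the Databricks cluster
    # Spark Config field; the session builder is an assumed equivalent.
    spark = (
        SparkSession.builder
        # Do not skip Avro input files that lack the .avro extension.
        .config("spark.hadoop.avro.mapred.ignore.inputs.without.extension", "false")
        # Compress Avro output with the Deflate codec.
        .config("spark.sql.avro.compression.codec", "deflate")
        # Deflate compression level 5, as specified above.
        .config("spark.sql.avro.deflate.level", "5")
        .getOrCreate()
    )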
