Table of Contents

  1. Preface
  2. Introduction to PowerExchange for Amazon S3
  3. PowerExchange for Amazon S3 Configuration Overview
  4. Amazon S3 Connections
  5. PowerExchange for Amazon S3 Data Objects
  6. PowerExchange for Amazon S3 Mappings
  7. PowerExchange for Amazon S3 Lookups
  8. Appendix A: Amazon S3 Data Type Reference
  9. Appendix B: Troubleshooting

PowerExchange for Amazon S3 User Guide

Data Compression in Amazon S3 Sources and Targets

You can decompress data when you read from Amazon S3 or compress data when you write to Amazon S3. Data compression is applicable when you run a mapping in the native environment or on the Spark and Databricks Spark engines.
Configure the compression format in the Compression Format option under the advanced properties of an Amazon S3 data object read or write operation. The source or target file in Amazon S3 has the same extension as the compression format that you select in the Compression Format option.
When you perform a read operation, the Data Integration Service reads the data from the Amazon S3 bucket and decompresses it. When you perform a write operation, the Data Integration Service compresses the data and then writes it to the Amazon S3 bucket.
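The Data Integration Service performs the compression and decompression for you when you set the Compression Format option. The following Python sketch is only an outside-the-mapping illustration of what compressed data at rest in Amazon S3 looks like; the bucket name, object keys, and file name are placeholders, and it assumes boto3 is installed and configured with valid AWS credentials.

    # Illustrative only: gzip-compress a local file, upload it to S3, then
    # download and decompress it to verify the round trip.
    import gzip
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-example-bucket"   # placeholder bucket name

    # Compress and upload: the object stored in S3 holds gzip-compressed bytes.
    with open("customers.csv", "rb") as src:
        compressed = gzip.compress(src.read())
    s3.put_object(Bucket=bucket, Key="input/customers.csv.GZ", Body=compressed)

    # Download and decompress: reverses the compression applied at write time.
    obj = s3.get_object(Bucket=bucket, Key="input/customers.csv.GZ")
    original = gzip.decompress(obj["Body"].read())
    print(original[:100])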
The following table lists the compression formats that are supported for read and write operations and for each file format in the native environment and on the Spark and Databricks Spark engines:
Compression format | Read | Write | Avro File | JSON File | ORC File | Parquet File
------------------ | ---- | ----- | --------- | --------- | -------- | ------------
None               | Yes  | Yes   | Yes       | No        | Yes      | Yes
Bzip2              | No   | No    | No        | Yes       | No       | No
Deflate            | Yes  | Yes   | Yes       | Yes       | No       | No
Gzip               | Yes  | Yes   | No        | Yes       | No       | Yes
Lzo                | Yes  | Yes   | No        | No        | No       | Yes
Snappy             | Yes  | Yes   | Yes       | Yes       | Yes      | Yes
Zlib               | Yes  | Yes   | No        | No        | Yes      | No
When you read files that use the deflate, snappy, or zlib compression format, decompression is implicit. You must select None to read files that use these compression formats. For example, to read a Parquet file that uses snappy compression, select None.
You can compress and decompress a binary file that uses gzip compression.
You can compress or decompress a flat file that uses the none, deflate, gzip, snappy, or zlib compression format when you run a mapping in the native environment. You can compress or decompress a flat file that uses the none, gzip, bzip2, or lzo compression format when you run a mapping on the Spark engine.
When you run a mapping on the Spark or Databricks Spark engine to write multiple Avro files of different compression formats, the Data Integration Service does not write the data to the target properly. You must ensure that you use the same compression format for all the Avro files.
In the native environment, when you create a mapping to read or write an ORC file and select Lzo as the compression format, the mapping fails.
To read a compressed file from Amazon S3 on the Spark engine, the compressed file must have a valid extension for its compression format. If the file does not have a valid extension, the Data Integration Service does not process the file.
The following table describes the extensions that are appended based on the compression format that you use:
Compression Format | File Name Extension
------------------ | -------------------
Gzip               | .GZ
Deflate            | .deflate
Bzip2              | .BZ2
Lzo                | .LZO
Snappy             | .snappy
Zlib               | .zlib
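If a mapping on the Spark engine skips source files, a missing or unexpected extension is a common cause. The following sketch, which assumes boto3 with valid credentials and placeholder bucket and prefix names, lists the object keys under a prefix that do not end with the extension associated with a compression format. Matching the extension case-insensitively is an assumption made for the example, not documented product behavior.

    # Illustrative only: find S3 object keys that lack the extension expected
    # for a given compression format. The mapping of formats to extensions
    # mirrors the table above.
    import boto3

    EXPECTED_EXTENSIONS = {
        "gzip": ".gz",
        "deflate": ".deflate",
        "bzip2": ".bz2",
        "lzo": ".lzo",
        "snappy": ".snappy",
        "zlib": ".zlib",
    }

    def keys_missing_extension(bucket, prefix, compression):
        """Return object keys under the prefix that lack the expected extension."""
        suffix = EXPECTED_EXTENSIONS[compression]
        s3 = boto3.client("s3")
        paginator = s3.get_paginator("list_objects_v2")
        bad_keys = []
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                if not obj["Key"].lower().endswith(suffix):
                    bad_keys.append(obj["Key"])
        return bad_keys

    # Example: report gzip files under input/ that do not end with .GZ or .gz.
    print(keys_missing_extension("my-example-bucket", "input/", "gzip"))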
