Data Compression in Amazon S3 Sources and Targets

You can decompress data when you read it from Amazon S3 or compress data when you write it to Amazon S3. Configure the compression format in the Compression Format option under the advanced properties of an Amazon S3 data object read or write operation. The source or target file in Amazon S3 must have the extension that matches the format you select in the Compression Format option. When you perform a read operation, the Data Integration Service reads the data from the Amazon S3 bucket and then decompresses it. When you perform a write operation, the Data Integration Service compresses the data before it writes the data to the Amazon S3 bucket.
Data compression applies when you run a mapping in the native environment or on the Spark engine.
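For illustration only, the following sketch shows the same compress-on-write and decompress-on-read behavior outside the Data Integration Service, using the boto3 and gzip Python libraries. The bucket and key names are hypothetical placeholders.

    # Standalone sketch of compress-on-write and decompress-on-read against
    # Amazon S3. This is not the Data Integration Service API; it calls
    # boto3 directly. Bucket and key names are hypothetical placeholders.
    import gzip
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-bucket"  # hypothetical bucket name

    # Write operation: compress the data, then upload it with a .gz
    # extension that matches the compression format.
    data = b"id,name\n1,alpha\n2,beta\n"
    s3.put_object(Bucket=BUCKET, Key="out/data.csv.gz", Body=gzip.compress(data))

    # Read operation: download the object, then decompress it before use.
    obj = s3.get_object(Bucket=BUCKET, Key="out/data.csv.gz")
    plain = gzip.decompress(obj["Body"].read())
    print(plain.decode())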
The following table lists the compression formats that are supported for read and write operations and for Avro and Parquet files in the native environment and on the Spark engine:
Compression Format   Read   Write   Avro File   Parquet File
None                 Yes    Yes     Yes         Yes
Deflate              No     Yes     Yes         No
Gzip                 Yes    Yes     No          Yes
Bzip2                Yes    Yes     No          No
Lzo                  Yes    Yes     No          Yes
Snappy               No     Yes     Yes         Yes
After you compress Avro and Parquet files, you can read the files back without selecting a compression format, because these file formats record the compression codec in their own metadata.
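As an illustration of this behavior, the following standalone PySpark sketch writes a Snappy-compressed Parquet file and reads it back without naming a codec; the codec is recovered from the Parquet metadata. The s3a path is a hypothetical placeholder, and this is not the Data Integration Service API.

    # Standalone PySpark illustration; not the Data Integration Service API.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-compression-demo").getOrCreate()
    df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"])

    # Write a Parquet file compressed with Snappy (supported per the table
    # above). The s3a path is a placeholder.
    df.write.option("compression", "snappy").parquet("s3a://example-bucket/out/parquet/")

    # Read it back without naming a codec; Parquet stores the codec in its
    # file metadata.
    spark.read.parquet("s3a://example-bucket/out/parquet/").show()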
You can compress a flat file with the None and Gzip compression formats when you run a mapping in the native environment. You can compress a flat file with the None, Gzip, Bzip2, and Lzo compression formats when you run a mapping on the Spark engine.
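Flat file compression on the Spark engine can be illustrated the same way. The following sketch, again in standalone PySpark with a hypothetical path, writes a Gzip-compressed CSV flat file:

    # Standalone PySpark illustration; not the Data Integration Service API.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-flat-file-demo").getOrCreate()
    df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"])

    # Write a Gzip-compressed flat file, one of the codecs listed above for
    # flat files on the Spark engine. The path is a placeholder.
    df.write.option("compression", "gzip").csv("s3a://example-bucket/out/csv/", header=True)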
To read a compressed file from Amazon S3 on the Spark engine, the compressed file must have a specific extension. If the compressed file does not have a valid extension, the Data Integration Service does not process the file. The following table describes the extension that is appended for each compression format:
Compression Format   File Name Extension
Gzip                 .GZ
Deflate              .deflate
Bzip2                .BZ2
Lzo                  .LZO
Snappy               .snappy
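To make the extension rule concrete, the following small Python helper, with an illustrative mapping and function name that are not part of any product API, checks whether a file name carries the extension that matches its compression format:

    # Illustrative helper only; the mapping and function name are not part
    # of any product API. Comparison is case-insensitive for convenience.
    EXPECTED_EXTENSION = {
        "gzip": ".gz",
        "deflate": ".deflate",
        "bzip2": ".bz2",
        "lzo": ".lzo",
        "snappy": ".snappy",
    }

    def has_valid_extension(file_name: str, compression_format: str) -> bool:
        ext = EXPECTED_EXTENSION[compression_format.lower()]
        return file_name.lower().endswith(ext)

    print(has_valid_extension("sales/data.csv.GZ", "Gzip"))  # True
    print(has_valid_extension("sales/data.csv", "Gzip"))     # False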


Updated July 30, 2020