PowerExchange for HDFS User Guide

Compression and Decompression for Complex File Sources and Targets

You can read and write compressed complex files, specify compression formats, and decompress files. You can use a compression format such as Bzip2 or Lzo, or specify a custom compression format. The compressed files must be in binary format.
You can compress sequence files at a record level or at a block level.
For information about how Hadoop processes compressed and uncompressed files, see the Hadoop documentation.
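
As an illustration of the record-level versus block-level distinction, the following is a minimal sketch that uses Hadoop's SequenceFile API to write a block-compressed sequence file with Bzip2. The class name, output path, and sample records are hypothetical; this shows the underlying Hadoop behavior, not the PowerExchange configuration itself.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.BZip2Codec;

    public class BlockCompressedSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path out = new Path("/tmp/records.seq");  // hypothetical output path

            // CompressionType.BLOCK compresses batches of records together,
            // which usually yields a better ratio; CompressionType.RECORD
            // compresses each value individually.
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(out),
                    SequenceFile.Writer.keyClass(LongWritable.class),
                    SequenceFile.Writer.valueClass(Text.class),
                    SequenceFile.Writer.compression(
                            SequenceFile.CompressionType.BLOCK, new BZip2Codec()))) {
                writer.append(new LongWritable(1L), new Text("first record"));
                writer.append(new LongWritable(2L), new Text("second record"));
            }
        }
    }
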
The following compression options are available for binary complex files:

None: The file is not compressed.
Auto: The Data Integration Service detects the compression format of the file based on the file extension (see the detection sketch after this list).
DEFLATE: The DEFLATE compression format, which uses a combination of the LZ77 algorithm and Huffman coding.
Gzip: The GNU zip compression format, which uses the DEFLATE algorithm.
Bzip2: The Bzip2 compression format, which uses the Burrows–Wheeler algorithm.
Lzo: The Lzo compression format, which uses the Lempel-Ziv-Oberhumer algorithm.
Snappy: An LZ77-type compression format with a fixed, byte-oriented encoding. Snappy is the default compression format on the Spark engine.
Custom: A custom compression format. If you select this option, you must specify the fully qualified name of a class that implements the CompressionCodec interface in the Custom Compression Codec field (see the codec sketch after this list).
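
The Auto option's extension-based detection is the same kind of lookup that Hadoop exposes through CompressionCodecFactory, which maps an extension such as .bz2 or .gz to a codec class. The following is a minimal sketch of that lookup, with a hypothetical input path; it illustrates the mechanism, not the Data Integration Service's internal implementation.

    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class ExtensionBasedDecompression {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path file = new Path("/data/in/events.bz2");  // hypothetical input path
            FileSystem fs = file.getFileSystem(conf);

            // Map the file extension to a codec: .bz2 -> BZip2Codec,
            // .gz -> GzipCodec, and so on. Returns null for an
            // unrecognized extension, that is, an uncompressed file.
            CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);

            InputStream in = fs.open(file);
            if (codec != null) {
                in = codec.createInputStream(in);  // decompress transparently
            }
            try (BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in))) {
                System.out.println(reader.readLine());
            }
        }
    }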
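
For the Custom option, the class that you name must implement the org.apache.hadoop.io.compress.CompressionCodec interface and be available on the classpath at run time. The following minimal sketch shows the shape of such a class: the hypothetical com.example.codec.MyDeflateCodec reuses Hadoop's built-in DefaultCodec, which already implements CompressionCodec, and overrides only the file extension. A real custom codec would also provide its own compressor and decompressor streams.

    package com.example.codec;  // hypothetical package

    import org.apache.hadoop.io.compress.DefaultCodec;

    // A minimal custom codec: inherits working DEFLATE compressor and
    // decompressor streams from DefaultCodec and changes only the file
    // extension that identifies the format. The fully qualified name,
    // com.example.codec.MyDeflateCodec, is what you would enter in the
    // Custom Compression Codec field.
    public class MyDeflateCodec extends DefaultCodec {
        @Override
        public String getDefaultExtension() {
            return ".mydef";  // hypothetical extension
        }
    }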
