PowerExchange for HDFS User Guide

Compression and Decompression for Flat File Sources and Targets

File compression can speed up data transfer and reduce the space required for data storage.
You can read and write compressed flat files. You can compress and decompress files in formats such as Gzip, Bzip2, and Lzo, or specify a custom compression format.
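To make reading and writing compressed flat files concrete, the following Python sketch compresses a small delimited file with the standard-library gzip and bz2 modules, reads it back, and compares sizes. It is a local illustration only and does not show how the Data Integration Service or Hadoop performs compression. The file names and row data are hypothetical, and LZO is left out because the Python standard library has no LZO module.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical delimited flat-file content; repetitive rows compress well.
content = "".join(f"{i},customer_{i % 50},NY\n" for i in range(2000)).encode("utf-8")

# gzip.open writes the GNU zip (DEFLATE) format; bz2.open writes Bzip2.
openers = {"orders.csv.gz": gzip.open, "orders.csv.bz2": bz2.open}

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, opener in openers.items():
        path = os.path.join(tmp, name)
        with opener(path, "wb") as f:  # compress while writing
            f.write(content)
        with opener(path, "rb") as f:  # decompress while reading
            assert f.read() == content  # the round trip is lossless
        sizes[name] = os.path.getsize(path)

print(f"uncompressed: {len(content)} bytes")
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Both formats are lossless, so the decompressed bytes always match the original; only the on-disk size differs.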
You can specify a single file or a directory of files. When the Data Integration Service reads from a directory, it reads only the files of the specified format and ignores files of other formats.
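The directory behavior can be pictured as a simple filter. The Python sketch below assumes that the specified format is recognized by file extension. The helper files_of_format, the file names, and the extension tuple are hypothetical, and the sketch does not reproduce how the Data Integration Service actually selects files.

```python
import tempfile
from pathlib import Path


def files_of_format(directory: Path, extensions: tuple[str, ...]) -> list[str]:
    """Return the sorted names of plain files in `directory` whose names end
    with one of `extensions`; all other entries are ignored (hypothetical)."""
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.name.lower().endswith(extensions)
    )


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical directory contents: two Gzip files, one Bzip2 file, one text file.
    for name in ("part-0000.gz", "part-0001.gz", "part-0002.bz2", "README.txt"):
        (folder / name).touch()
    # Read only the files of the specified format (Gzip, assumed to be ".gz").
    selected = files_of_format(folder, (".gz",))

print(selected)  # ['part-0000.gz', 'part-0001.gz']
```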
For information about how Hadoop processes compressed and uncompressed files, see the Hadoop documentation.
The following table describes the compression options:

Compression Option | Description
None | The file is not compressed.
Auto | The Data Integration Service detects the compression format of the file based on the file extension.
Gzip | The GNU zip compression format that uses the DEFLATE algorithm.
Bzip2 | The Bzip2 compression format that uses the Burrows–Wheeler algorithm.
Lzo | The LZO compression format that uses the Lempel-Ziv-Oberhumer algorithm. The JAR files for LZO compression are not available with the default Hadoop installation. You must place the JAR files for the LZO compression format in the lib folder of the distribution directory and verify the distribution directory properties.
Custom | A custom compression format. If you select this option, you must specify, in the Compression Codec field, the fully qualified name of a class that implements the Hadoop CompressionCodec interface.
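As a rough model of the Auto option, the following Python sketch maps a file extension to a compression option. The extension mapping (.gz, .bz2, .lzo) is an assumption based on extensions commonly used by Hadoop codecs, the detect_compression helper and the HDFS paths are hypothetical, and treating an unrecognized extension as None is a choice made for this sketch rather than documented product behavior.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to compression option. The extensions are
# ones Hadoop codecs commonly use; the product's actual rules may differ.
EXTENSION_TO_OPTION = {
    ".gz": "Gzip",
    ".bz2": "Bzip2",
    ".lzo": "Lzo",
}


def detect_compression(path: str) -> str:
    """Return the compression option implied by the last extension of `path`.

    Unrecognized extensions map to "None" (uncompressed) in this sketch.
    """
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_TO_OPTION.get(suffix, "None")


for hdfs_path in (
    "/user/etl/orders.csv.gz",
    "/user/etl/orders.csv.bz2",
    "/user/etl/orders.csv.lzo",
    "/user/etl/orders.csv",
):
    print(f"{hdfs_path} -> {detect_compression(hdfs_path)}")
```

In Hadoop, each CompressionCodec implementation reports its own default file extension, so a fuller model would build the mapping from the configured codecs rather than hard-coding it.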
