PowerExchange for HDFS User Guide

Advanced Properties

The Developer tool displays the advanced properties for complex file targets in the Input transformation in the Write view.
The following table describes the advanced properties that you configure for complex file targets:
Property
Description
File Directory
The directory location of the complex file target.
If the directory is in HDFS, enter the path without the node URI. For example, /user/lib/testdir specifies the location of a directory in HDFS. The path must not contain more than 512 characters.
If the directory is in the local system, enter the fully qualified path. For example, /user/testdir specifies the location of a directory in the local system.
The Data Integration Service ignores any subdirectories and their contents.
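The two rules for an HDFS directory (no node URI, at most 512 characters) can be checked before the value is entered. The following is a minimal sketch, not part of PowerExchange for HDFS: the function name check_hdfs_directory and the sample host namenode:8020 are hypothetical.

```python
from urllib.parse import urlparse

MAX_PATH_LENGTH = 512  # limit stated for HDFS directory paths


def check_hdfs_directory(path: str) -> list:
    """Return a list of problems found in a File Directory value for HDFS."""
    problems = []
    parsed = urlparse(path)
    # A node URI such as hdfs://namenode:8020/user/lib/testdir carries a scheme
    # or a host; the property expects only the path portion.
    if parsed.scheme or parsed.netloc:
        problems.append("path includes a node URI; enter only the path")
    if len(path) > MAX_PATH_LENGTH:
        problems.append(
            f"path has {len(path)} characters; the limit is {MAX_PATH_LENGTH}"
        )
    return problems
```

An empty list means that the value passes both checks. The sketch does not verify that the directory exists.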
File Name
The name of the output file. PowerExchange for HDFS appends a unique identifier to the file name before it writes the file to HDFS.
In Spark mode, PowerExchange for HDFS appends the .avro extension to the file name.
Overwrite Target
Indicates whether the Data Integration Service must first delete the target data before writing data.
If you select the Overwrite Target option, the Data Integration Service deletes the target data before writing data. If you do not select this option, the Data Integration Service creates a new file in the target and writes the data to the file.
This option is applicable when you run a mapping in the native environment or on the Spark engine to write data to complex files.
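The two behaviors can be pictured with a local directory standing in for the target. This is an illustrative sketch only, not the Data Integration Service implementation. It assumes that "target data" means the files directly inside the target directory. The function write_target is hypothetical, and uuid4 is a placeholder for the unique identifier, whose real format is not described here.

```python
import tempfile
import uuid
from pathlib import Path


def write_target(directory: Path, file_name: str, data: bytes, overwrite: bool) -> Path:
    """Write data to a new file in directory, optionally clearing it first."""
    directory.mkdir(parents=True, exist_ok=True)
    if overwrite:
        # Overwrite Target selected: remove the existing target files first.
        for existing in directory.iterdir():
            if existing.is_file():
                existing.unlink()
    # Each run writes its own file; the uuid4 suffix is only a placeholder
    # for the unique identifier that is appended to the file name.
    out = directory / f"{file_name}-{uuid.uuid4().hex}"
    out.write_bytes(data)
    return out
```

With overwrite set to False, repeated runs accumulate files in the directory. With overwrite set to True, only the file from the latest run remains.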
File Format
The file format. Select one of the following file formats:
  • Binary. Select Binary to write any file format.
  • Sequence. Select Sequence File Format for target files of a Hadoop-specific binary format that contain key and value pairs.
  • Custom Output. Select Output Format to specify a custom output format. You must specify the class name implementing the OutputFormat interface in the Output Format field.
  • Assign Parameter. Select Assign Parameter to parameterize the file format.
Default is Binary.
Output Format
The class name for files of the output format. If you select Output Format in the File Format field, you must specify the fully qualified class name implementing the OutputFormat interface.
Output Key Class
The class name for the output key. If you select Output Format in the File Format field, you must specify the fully qualified class name for the output key.
You can specify one of the following output key classes:
  • BytesWritable
  • Text
  • LongWritable
  • IntWritable
PowerExchange for HDFS generates the key in ascending order.
Output Value Class
The class name for the output value. If you select Output Format in the File Format field, you must specify the fully qualified class name for the output value.
You can use any custom writable class that Hadoop supports. Determine the output value class based on the type of data that you want to write.
When you use custom output formats, the value part of the data that is streamed to the complex file data object write operation must be in a serialized form.
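To make the key and value descriptions concrete, each record that reaches a custom output format can be pictured as a pair: a generated key in ascending order and a value that is already serialized bytes. The sketch below uses plain Python rather than Hadoop classes. It assumes that keys start at 0 and increase by 1 (the text above states only that they ascend) and uses JSON as a stand-in serialization. The function to_key_value_pairs is hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def to_key_value_pairs(rows: Iterable[dict]) -> Iterator[Tuple[int, bytes]]:
    """Pair each serialized row with a generated key in ascending order."""
    for key, row in enumerate(rows):
        # The value is serialized to bytes before it is handed to the writer.
        yield key, json.dumps(row, sort_keys=True).encode("utf-8")
```

An integer key corresponds most closely to the LongWritable or IntWritable key classes. The serialization actually required depends on the output value class that you choose.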
Compression Format
Optional. The compression format for binary files. Select one of the following options:
  • None
  • Auto
  • DEFLATE
  • gzip
  • bzip2
  • LZO
  • Snappy
  • Custom
  • Assign Parameter...
Custom Compression Codec
Required for custom compression. Specify the fully qualified class name implementing the CompressionCodec interface.
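As an illustration of what a fully qualified codec class name looks like, the sketch below maps several of the compression options to codec classes that exist in Hadoop (org.apache.hadoop.io.compress) or, for LZO, in the separately distributed hadoop-lzo library. Which class PowerExchange for HDFS uses for each option is an assumption here. The function resolve_codec and the class com.example.MyCodec are hypothetical, and the Auto and Assign Parameter options are not modelled.

```python
import re

# Codec classes commonly used with Hadoop. The mapping from option name to
# class is an assumption, not taken from the product.
HADOOP_CODECS = {
    "DEFLATE": "org.apache.hadoop.io.compress.DefaultCodec",
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "LZO": "com.hadoop.compression.lzo.LzoCodec",  # from hadoop-lzo
    "Snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}

# package.segments.ClassName: at least one dot, each part a Java identifier.
_FQCN = re.compile(r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)+$")


def resolve_codec(compression_format, custom_codec=None):
    """Return the codec class name for a compression option, or None."""
    if compression_format == "None":
        return None
    if compression_format == "Custom":
        # Custom compression needs a fully qualified class name.
        if not custom_codec or not _FQCN.match(custom_codec):
            raise ValueError("Custom requires a fully qualified class name")
        return custom_codec
    if compression_format not in HADOOP_CODECS:
        raise ValueError(f"option not modelled in this sketch: {compression_format}")
    return HADOOP_CODECS[compression_format]
```

The check is syntactic only. It cannot confirm that the class is on the classpath or that it implements CompressionCodec.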
Sequence File Compression Type
Optional. The compression format for sequence files. Select one of the following options:
  • None
  • Record
  • Block
  • Assign Parameter...
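The Record and Block options correspond to the compression types that Hadoop defines for sequence files: record compression compresses each value on its own, while block compression compresses batches of records together. The sketch below uses zlib from the Python standard library to show why block compression usually yields smaller output for many small, similar records. It does not produce the sequence file on-disk format, and the records_per_block parameter and the sample records are hypothetical.

```python
import zlib


def record_compressed_size(values):
    """Compress every value on its own, as in record-level compression."""
    return sum(len(zlib.compress(v)) for v in values)


def block_compressed_size(values, records_per_block=100):
    """Compress values in batches, as in block-level compression."""
    total = 0
    for start in range(0, len(values), records_per_block):
        block = b"".join(values[start:start + records_per_block])
        total += len(zlib.compress(block))
    return total


# Sample data: many short records that share most of their content.
values = [f"id={i};status=ACTIVE;region=EMEA".encode() for i in range(1000)]
```

Comparing record_compressed_size(values) with block_compressed_size(values) shows the difference. Compressing short records one at a time leaves little redundancy for the compressor to exploit, whereas a block exposes the repetition across records.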
