PowerExchange for HDFS User Guide

Advanced Properties

The Developer tool displays the advanced properties for complex file sources in the Output transformation in the Read view.
The following table describes the advanced properties that you configure for complex file sources:
Property
Description
Allow Wildcard Characters
Indicates whether you want to use wildcard characters for the source directory name or the source file name.
If you select this option, you can use the wildcard characters ? and * for the source directory name or the source file name in the File path field.
The question mark character (?) matches exactly one occurrence of any character. The asterisk character (*) matches zero or more occurrences of any character.
This option is applicable when you run a mapping in the native environment or on the Spark engine.
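The wildcard semantics described above can be sketched in a few lines of Python. This is only an illustration of how ? and * behave against a single directory or file name, assuming the conventional reading in which * also matches an empty run of characters. It is not how the Data Integration Service implements matching, and the function names and sample file names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern that uses ? and * into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one occurrence of any character
        elif ch == "*":
            parts.append(".*")           # zero or more occurrences of any character
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(pattern: str, name: str) -> bool:
    """Return True when the whole name matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical file names, used only to show the difference between ? and *.
    for candidate in ("sales_1.dat", "sales_10.dat", "sales_.dat"):
        print(candidate,
              wildcard_matches("sales_?.dat", candidate),
              wildcard_matches("sales_*.dat", candidate))
```

In this sketch, `sales_?.dat` accepts only names with a single character in the wildcard position, while `sales_*.dat` accepts any of the three sample names.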
File Format
The file format. Select one of the following file formats:
  • Binary. Select Binary to read any file format.
  • Sequence. Select Sequence File Format for source files of a Hadoop-specific binary format that contain key and value pairs.
  • Custom Input. Select Input File Format to specify a custom input format. You must specify the class name implementing the InputFormat interface in the Input Format field.
  • Assign Parameter. Select Assign Parameter to parameterize the file format.
Default is Binary.
Input Format
The class name for files of the input file format. If you select Input File Format in the File Format field, you must specify the fully qualified class name implementing the InputFormat interface.
To read files that use the Avro format, use the following input format:
com.informatica.avro.AvroToXML
To read files that use the Parquet format, use the following input format:
com.informatica.parquet.ParquetToXML
You can use any class derived from org.apache.hadoop.mapreduce.InputFormat.
Input Format Parameters
Parameters for the input format class. Enter name-value pairs separated with a semicolon. Enclose the parameter name and value within double quotes.
For example, use the following syntax:
"param1"="value1";"param2"="value2"
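If you generate this property value from a script, a small helper along the following lines can assemble the string. It is a minimal sketch: the function name is hypothetical, and because the syntax above does not say how to escape a double quote or semicolon inside a name or value, the helper simply rejects such input instead of guessing.

```python
def format_input_params(params: dict) -> str:
    """Build a value such as "param1"="value1";"param2"="value2"."""
    pairs = []
    for name, value in params.items():
        for text in (name, value):
            # Escaping rules are not documented, so refuse ambiguous characters.
            if '"' in text or ";" in text:
                raise ValueError(f"unsupported character in {text!r}")
        # Enclose the parameter name and the value in double quotes.
        pairs.append(f'"{name}"="{value}"')
    # Separate the name-value pairs with a semicolon.
    return ";".join(pairs)


if __name__ == "__main__":
    print(format_input_params({"param1": "value1", "param2": "value2"}))
```

Rejecting quotes and semicolons keeps the output unambiguous. Adjust that check if your input format class documents its own escaping convention.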
Compression Format
Optional. The compression format for binary files. Select one of the following options:
  • None
  • Auto
  • DEFLATE
  • gzip
  • bzip2
  • Lzo
  • Snappy
  • Custom
Custom Compression Codec
Required for custom compression. Specify the fully qualified class name implementing the CompressionCodec interface.
File path
The location of the file or directory. If the path is a directory, all the files in the directory must have the same file format.
If the file or directory is in HDFS, enter the path without the node URI. For example, /user/lib/testdir specifies the location of a directory in HDFS. The path must not contain more than 512 characters.
If the file or directory is in the local system, enter the fully qualified path. For example, /user/testdir specifies the location of a directory in the local system.
The Data Integration Service ignores any subdirectories and their contents.
If you select the Allow Wildcard Characters option, you can use the wildcard characters ? and * for the source directory name or the source file name.
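The two constraints on an HDFS file path, no node URI and at most 512 characters, are easy to check before you paste a value into the File path field. The following is a rough sketch under stated assumptions: the function name is hypothetical, the node URI is detected as any leading scheme://, and the host name hdfs://namenode:8020 in the usage lines is invented for illustration.

```python
import re

MAX_PATH_LENGTH = 512  # limit stated for HDFS paths
# Assumption: a node URI shows up as a leading "scheme://" prefix.
_URI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def check_hdfs_file_path(path: str) -> list:
    """Return a list of problems found in an HDFS file path (empty if none)."""
    problems = []
    if _URI_SCHEME.match(path):
        problems.append("path includes a node URI; enter only the path portion")
    if len(path) > MAX_PATH_LENGTH:
        problems.append(
            f"path has {len(path)} characters; the limit is {MAX_PATH_LENGTH}"
        )
    return problems


if __name__ == "__main__":
    print(check_hdfs_file_path("/user/lib/testdir"))
    # Hypothetical node URI, shown only to trigger the first check.
    print(check_hdfs_file_path("hdfs://namenode:8020/user/lib/testdir"))
```

The sketch validates only what the property description states. It does not confirm that the path exists or that every file in a directory shares one file format.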
