Advanced Properties


The Developer tool displays the advanced properties for complex file sources in the Output transformation in the Read view.
The following table describes the advanced properties that you configure for complex file sources:
Property
Description
File path
The location of the file or directory. If the path is a directory, all the files in the directory must have the same file format.
If the file or directory is in HDFS, enter the path without the node URI. For example, /user/lib/testdir specifies the location of a directory in HDFS. The path must not contain more than 512 characters.
If the file or directory is in the local system, enter the fully qualified path. For example, /user/testdir specifies the location of a directory in the local system.
The Data Integration Service ignores any subdirectories and their contents.
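The HDFS path rules above (no node URI, at most 512 characters) can be checked before the value is entered in the Developer tool. The following is a minimal sketch, not part of the product: the class name ComplexFilePathCheck, the method checkHdfsPath, and the sample URI hdfs://namenode:8020 are hypothetical, and the sketch assumes that a node URI can be recognized by a "://" scheme separator.

```java
// Hypothetical helper, not an Informatica API: pre-checks an HDFS file path
// against the two rules stated for the File path property.
final class ComplexFilePathCheck {
    // Limit taken from the property description.
    static final int MAX_PATH_LENGTH = 512;

    /** Returns null when the path passes the checks, otherwise a short reason. */
    static String checkHdfsPath(String path) {
        if (path == null || path.isEmpty()) {
            return "path is empty";
        }
        // Assumption: a node URI such as hdfs://namenode:8020/... contains "://".
        if (path.contains("://")) {
            return "path includes a node URI";
        }
        if (path.length() > MAX_PATH_LENGTH) {
            return "path exceeds " + MAX_PATH_LENGTH + " characters";
        }
        return null;
    }
}
```

With this sketch, checkHdfsPath("/user/lib/testdir") returns null, while a value that starts with hdfs://namenode:8020 is reported as including a node URI.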
File Format
The file format. Select one of the following file formats:
  • Binary. Select Binary to read any file format.
  • Sequence. Select Sequence File Format for source files of a Hadoop-specific binary format that contain key and value pairs.
  • Custom Input. Select Input File Format to specify a custom input format. You must specify the class name implementing the InputFormat interface in the Input Format field.
Default is Binary.
Input Format
The class name for files of the input file format. If you select Input File Format in the File Format field, you must specify the fully qualified class name implementing the InputFormat interface.
To read files that use the Avro format, use the following input format:
com.informatica.avro.AvroToXML
To read files that use the Parquet format, use the following input format:
com.informatica.parquet.ParquetToXML
You can use any class derived from org.apache.hadoop.mapreduce.InputFormat.
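As an illustration of the two input format class names listed above, the following sketch picks the value for the Input Format field from a file name. The class InputFormatLookup and the extension-based selection (.avro, .parquet) are assumptions made for this example only. The product does not choose the input format this way, and files may use other extensions.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical lookup, not an Informatica API: maps a file name to one of the
// input format class names given in the Input Format property description.
final class InputFormatLookup {
    static final String AVRO_INPUT_FORMAT = "com.informatica.avro.AvroToXML";
    static final String PARQUET_INPUT_FORMAT = "com.informatica.parquet.ParquetToXML";

    /** Returns the input format class name, or empty when the extension is not recognized. */
    static Optional<String> inputFormatFor(String fileName) {
        // Assumption: the file extension reflects the file format.
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".avro")) {
            return Optional.of(AVRO_INPUT_FORMAT);
        }
        if (lower.endsWith(".parquet")) {
            return Optional.of(PARQUET_INPUT_FORMAT);
        }
        return Optional.empty();
    }
}
```

For any other file, the result is empty and the class name of a custom implementation derived from org.apache.hadoop.mapreduce.InputFormat would have to be supplied by hand.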
Compression Format
Optional. The compression format for binary files. Select one of the following options:
  • None
  • Auto
  • DEFLATE
  • gzip
  • bzip2
  • Lzo
  • Snappy
  • Custom
Not applicable to Avro and Parquet formats.
Custom Compression Codec
Required for custom compression. Specify the fully qualified class name implementing the CompressionCodec interface.
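The relationship between the Compression Format and Custom Compression Codec properties can be sketched as a small consistency check. This is an illustration under stated assumptions, not product code: the class CompressionSettingsCheck is hypothetical, the option names are compared exactly as they appear in the list above, and com.example.MyCodec in the usage note is a made-up class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical check, not an Informatica API: verifies that the two
// compression properties are consistent with each other.
final class CompressionSettingsCheck {
    // Option names copied from the Compression Format property description.
    static final List<String> OPTIONS = Arrays.asList(
            "None", "Auto", "DEFLATE", "gzip", "bzip2", "Lzo", "Snappy", "Custom");

    /** Returns a list of problems; the list is empty when the settings are consistent. */
    static List<String> check(String compressionFormat, String customCodecClass) {
        List<String> problems = new ArrayList<>();
        // The compression format is optional, so a missing value is accepted.
        if (compressionFormat == null || compressionFormat.isEmpty()) {
            return problems;
        }
        if (!OPTIONS.contains(compressionFormat)) {
            problems.add("unknown compression format: " + compressionFormat);
            return problems;
        }
        boolean codecGiven = customCodecClass != null && !customCodecClass.trim().isEmpty();
        if ("Custom".equals(compressionFormat) && !codecGiven) {
            problems.add("Custom compression requires a codec class name");
        }
        return problems;
    }
}
```

For example, check("Custom", "") reports one problem, while check("Custom", "com.example.MyCodec") and check("gzip", null) report none. The sketch does not verify that the named class exists or implements the CompressionCodec interface.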


Updated December 13, 2018