Big Data Management User Guide

Complex File Sources

A mapping that runs in the Hadoop environment can process complex files.
You can read files from the local file system or from HDFS. To read large volumes of data, you can connect a complex file source to read data from a directory of files that have the same format and properties. You can read compressed binary files.
A mapping that runs on the Blaze engine or the Hive engine can contain a Data Processor transformation. To read complex files that are flat files, you can use a complex file data object without a Data Processor transformation. If the complex file is a hierarchical file, you must connect the complex file data object to a Data Processor transformation.
A mapping that runs on the Spark engine can process hierarchical data through complex data types. Use a complex file data object that represents the complex files in the Hadoop Distributed File System. If the complex file contains hierarchical data, you must enable the read operation to project columns as complex data types.
The following table shows the complex files that a mapping can process in the Hadoop environment:
| File Type | Format       | Blaze Engine  | Spark Engine  | Hive Engine   |
|-----------|--------------|---------------|---------------|---------------|
| Avro      | Flat         | Supported     | Supported     | Supported     |
| Avro      | Hierarchical | Supported*    | Supported**   | Supported*    |
| JSON      | Flat         | Supported*    | Supported     | Supported*    |
| JSON      | Hierarchical | Supported*    | Supported**   | Supported*    |
| ORC       | Flat         | Not supported | Supported     | Not supported |
| ORC       | Hierarchical | Not supported | Not supported | Not supported |
| Parquet   | Flat         | Supported     | Supported     | Supported     |
| Parquet   | Hierarchical | Supported*    | Supported**   | Supported*    |
| XML       | Flat         | Supported*    | Not supported | Supported*    |
| XML       | Hierarchical | Supported*    | Not supported | Supported*    |

* The complex file data object must be connected to a Data Processor transformation.
** The complex file read operation must be enabled to project columns as complex data types.
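
When planning a mapping, it can help to check the support table above programmatically. The following is a minimal Python sketch that transcribes the table into a lookup. The names `SUPPORT`, `ENGINES`, and `complex_file_support` are hypothetical helpers made up for illustration; they are not part of any Informatica API.

```python
# Hypothetical helper, not part of any Informatica API.
# All values are transcribed from the support table in this topic.
YES = "supported"
NO = "not supported"
DP = "supported with a Data Processor transformation"  # table footnote *
CDT = "supported with columns projected as complex data types"  # table footnote **

ENGINES = ("blaze", "spark", "hive")

# (file type, format) -> support on (Blaze, Spark, Hive)
SUPPORT = {
    ("avro", "flat"): (YES, YES, YES),
    ("avro", "hierarchical"): (DP, CDT, DP),
    ("json", "flat"): (DP, YES, DP),
    ("json", "hierarchical"): (DP, CDT, DP),
    ("orc", "flat"): (NO, YES, NO),
    ("orc", "hierarchical"): (NO, NO, NO),
    ("parquet", "flat"): (YES, YES, YES),
    ("parquet", "hierarchical"): (DP, CDT, DP),
    ("xml", "flat"): (DP, NO, DP),
    ("xml", "hierarchical"): (DP, NO, DP),
}


def complex_file_support(file_type, fmt, engine):
    """Return the support note for a file type, format, and engine.

    Raises KeyError for a file type or format that is not in the table,
    and ValueError for an unknown engine name.
    """
    row = SUPPORT[(file_type.lower(), fmt.lower())]
    return row[ENGINES.index(engine.lower())]


if __name__ == "__main__":
    print(complex_file_support("Parquet", "Hierarchical", "Spark"))
    print(complex_file_support("XML", "Flat", "Spark"))
```

A lookup keyed on file type and format keeps the sketch aligned with the table rows, so a change in the documented support matrix maps to a one-line change in `SUPPORT`.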
