Big Data Management User Guide

Processing Hierarchical Data on the Spark Engine Overview

You can use complex data types, such as array, struct, and map, in mappings that run on the Spark engine. With complex data types, the Spark engine directly reads, processes, and writes hierarchical data in complex files.
The Spark engine can process hierarchical data in Avro, JSON, and Parquet complex files. The Spark engine uses complex data types to represent the native data types for hierarchical data in complex files. For example, hierarchical data of type record in an Avro file is represented as a struct data type on the Spark engine.
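As a rough illustration of the three complex data types, the following plain-Python sketch models one hierarchical record. It does not use any Informatica or Spark API. The `Customer` and `Address` classes and all field names are hypothetical, and the comparison of Python types to struct, array, and map is only an analogy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Address:
    # Plays the role of a struct: a fixed set of named, typed fields.
    city: str
    zip_code: str


@dataclass
class Customer:
    # The outer record is also struct-like and nests other complex values.
    name: str                                        # primitive field
    address: Address                                 # struct nested in a struct
    phones: List[str] = field(default_factory=list)  # array: ordered elements of one type
    attributes: Dict[str, str] = field(default_factory=dict)  # map: key-value pairs


# Hypothetical sample data for illustration only.
customer = Customer(
    name="Ann",
    address=Address(city="Austin", zip_code="78701"),
    phones=["555-0100", "555-0101"],
    attributes={"tier": "gold"},
)
```

In this analogy, an Avro record such as the customer above would surface as a struct whose `address` field is itself a struct.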
You can develop mappings for the following hierarchical data processing scenarios:
  • To generate and modify hierarchical data.
  • To transform relational data to hierarchical data.
  • To transform hierarchical data to relational data.
  • To convert data from one complex file format to another. For example, read hierarchical data from an Avro source and write to a JSON target.
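The scenarios above can be sketched in plain Python to show what each conversion does to the shape of the data. This is only a conceptual sketch, not how a mapping is built in the product: the `orders` sample, the function names `to_relational` and `to_hierarchical`, and the column names are all hypothetical, and standard-library JSON stands in for a complex file format.

```python
import json
from itertools import groupby
from operator import itemgetter

# Hypothetical hierarchical records: a nested struct and an array of structs.
orders = [
    {"order_id": 1,
     "customer": {"name": "Ann", "city": "Austin"},
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"order_id": 2,
     "customer": {"name": "Raj", "city": "Pune"},
     "items": [{"sku": "C3", "qty": 5}]},
]


def to_relational(records):
    """Hierarchical to relational: one flat row per array element."""
    rows = []
    for rec in records:
        for item in rec["items"]:
            rows.append({
                "order_id": rec["order_id"],
                "customer_name": rec["customer"]["name"],
                "customer_city": rec["customer"]["city"],
                "sku": item["sku"],
                "qty": item["qty"],
            })
    return rows


def to_hierarchical(rows):
    """Relational to hierarchical: group flat rows back into nested records."""
    records = []
    # sorted() is stable, so items keep their original order within an order.
    ordered = sorted(rows, key=itemgetter("order_id"))
    for order_id, group in groupby(ordered, key=itemgetter("order_id")):
        group = list(group)
        first = group[0]
        records.append({
            "order_id": order_id,
            "customer": {"name": first["customer_name"],
                         "city": first["customer_city"]},
            "items": [{"sku": r["sku"], "qty": r["qty"]} for r in group],
        })
    return records


rows = to_relational(orders)
rebuilt = to_hierarchical(rows)

# Format conversion stand-in: serialize the hierarchical records as JSON text.
json_text = json.dumps(rebuilt)
```

Flattening repeats the parent columns on every row, which is why grouping on `order_id` is enough to rebuild the nested form in this sketch.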
To read from and write to complex files, you create complex file data objects. Configure the read and write operations for the complex file data object to project columns as complex data types. Read and Write transformations based on these complex file data objects can read and write hierarchical data.
Configure the following objects and transformation properties in a mapping to process hierarchical data:
  • Complex ports. To pass hierarchical data in a mapping, create complex ports. You create complex ports by assigning complex data types to ports.
  • Complex data type definitions. To process hierarchical data of type struct, create or import complex data type definitions that represent the schema of struct data.
  • Type configuration. To define the properties of a complex port, specify or change the type configuration.
  • Complex operators and functions. To generate or modify hierarchical data, create expressions using complex operators and functions.
You can also use hierarchical conversion wizards to simplify some of the mapping development tasks.
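To make the idea of a complex data type definition more concrete, the sketch below describes the schema of struct data as a nested Python structure and checks whether a value matches it. This is an analogy only: the names `ADDRESS_DEF`, `CUSTOMER_DEF`, and `conforms` are hypothetical and do not correspond to any product object or API.

```python
# Hypothetical type definitions: a dict describes a struct (field name -> type),
# and a one-element list describes an array of that element type.
ADDRESS_DEF = {"city": str, "zip_code": str}
CUSTOMER_DEF = {
    "name": str,
    "address": ADDRESS_DEF,  # struct field that reuses another definition
    "phones": [str],         # array of strings
}


def conforms(value, type_def):
    """Return True if value matches the shape described by type_def."""
    if isinstance(type_def, dict):
        # Struct: same field names, and every field matches its declared type.
        return (isinstance(value, dict)
                and value.keys() == type_def.keys()
                and all(conforms(value[k], t) for k, t in type_def.items()))
    if isinstance(type_def, list):
        # Array: every element matches the single declared element type.
        return (isinstance(value, list)
                and all(conforms(v, type_def[0]) for v in value))
    # Primitive: a plain isinstance check.
    return isinstance(value, type_def)


good = {"name": "Ann",
        "address": {"city": "Austin", "zip_code": "78701"},
        "phones": ["555-0100"]}
bad = {"name": "Ann",
       "address": {"city": "Austin"},  # missing zip_code
       "phones": ["555-0100"]}
```

Here `conforms(good, CUSTOMER_DEF)` accepts the record, while `conforms(bad, CUSTOMER_DEF)` rejects it because the nested struct is missing a field.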
