You can use complex data types, such as array, struct, and map, in mappings that run on the Spark engine. With complex data types, the Spark engine directly reads, processes, and writes hierarchical data in complex files.
The Spark engine can process hierarchical data in Avro, JSON, and Parquet complex files. The Spark engine uses complex data types to represent the native hierarchical data types in complex files. For example, hierarchical data of type record in an Avro file is represented as a struct data type on the Spark engine.
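For illustration only, the correspondence between native Avro hierarchical types and the complex data types used on the Spark engine can be sketched in plain Python. This is a hypothetical lookup, not product code, and it covers only the three hierarchical Avro types named in this discussion:

```python
# Hypothetical sketch: how native hierarchical types in an Avro file
# correspond to complex data types on the Spark engine.
AVRO_TO_COMPLEX = {
    "record": "struct",  # named fields of mixed types -> struct
    "array": "array",    # ordered elements of one type -> array
    "map": "map",        # string keys mapped to values of one type -> map
}

def complex_type_for(avro_type: str) -> str:
    """Return the complex data type that represents the given Avro type."""
    return AVRO_TO_COMPLEX.get(avro_type, avro_type)

print(complex_type_for("record"))  # struct
```

JSON and Parquet sources follow the same idea: nested objects or groups surface as structs, and repeated elements surface as arrays.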
You can develop mappings for the following hierarchical data processing scenarios:
Generate and modify hierarchical data.
Transform relational data to hierarchical data.
Transform hierarchical data to relational data.
Convert data from one complex file format to another. For example, read hierarchical data from an Avro source and write it to a JSON target.
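The two transformation scenarios above can be illustrated outside the product with a small Python sketch. This is illustrative only; in a mapping, the Spark engine performs these transformations on the data it reads from the complex file. The record layout and field names here are invented for the example:

```python
import json

# A hierarchical record as it might be read from a JSON or Avro source
# (hypothetical layout for illustration).
hierarchical = {"id": 1, "address": {"city": "Berlin", "zip": "10115"}}

# Hierarchical -> relational: flatten the nested struct into flat columns.
relational = {
    "id": hierarchical["id"],
    "city": hierarchical["address"]["city"],
    "zip": hierarchical["address"]["zip"],
}

# Relational -> hierarchical: rebuild the nested struct from flat columns.
rebuilt = {
    "id": relational["id"],
    "address": {"city": relational["city"], "zip": relational["zip"]},
}

assert rebuilt == hierarchical
print(json.dumps(rebuilt))
```

Format conversion is the same round trip at file level: the hierarchical structure is preserved while the serialization changes, for example from Avro on read to JSON on write.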
To read from and write to complex files, you create complex file data objects. Configure the read and write operations for the complex file data object to project columns as complex data types. Read and Write transformations based on these complex file data objects can read and write hierarchical data.
Configure the following objects and transformation properties in a mapping to process hierarchical data:
Complex ports. To pass hierarchical data in a mapping, create complex ports by assigning complex data types to ports.
Complex data type definitions. To process hierarchical data of type struct, create or import complex data type definitions that represent the schema of struct data.
Type configuration. To define the properties of a complex port, specify or change the type configuration.
Complex operators and functions. To generate or modify hierarchical data, create expressions using complex operators and functions.
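To show what expressions on complex ports operate on, the following Python analogue sketches how struct, array, and map values behave: a struct exposes named fields, an array exposes indexed elements, and a map exposes key-value pairs. The record and field names are hypothetical, and the dictionary/list accesses stand in for the complex operators used in mapping expressions:

```python
# Illustrative analogue of a row on a complex port (hypothetical layout):
order = {
    "customer": {"name": "Ana", "tier": "gold"},   # struct-like: named fields
    "items": ["book", "pen"],                      # array-like: indexed elements
    "attrs": {"priority": "high"},                 # map-like: key-value pairs
}

# Accessing a struct field, an array element, and a map value.
name = order["customer"]["name"]     # struct field access
first_item = order["items"][0]       # array element access by index
priority = order["attrs"]["priority"]  # map value lookup by key

print(name, first_item, priority)
```

In a mapping, complex operators perform these accesses and complex functions generate or modify the hierarchical values themselves.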
You can also use hierarchical conversion wizards to simplify some of the mapping development tasks.