The Spark and Databricks Spark engines can process mappings that use complex data types, such as array, struct, and map. With complex data types, the run-time engine directly reads, processes, and writes hierarchical data in complex files.
The Spark and Databricks Spark engines can process hierarchical data in Avro, JSON, ORC, and Parquet complex files. They use complex data types to represent the native data types for hierarchical data in complex files. For example, the run-time engine represents hierarchical data of type record in an Avro file as a struct data type.
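As a rough illustration of the three complex data types, the following plain Python sketch models a struct, an array, and a map. It is an analogy only and does not use the run-time engine or any product API. The `Customer` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical struct: a fixed set of named, typed fields,
# comparable to a record in an Avro file.
@dataclass
class Customer:
    name: str
    # Array: an ordered collection of elements of the same type.
    phones: List[str] = field(default_factory=list)
    # Map: an unordered collection of key-value pairs.
    attributes: Dict[str, str] = field(default_factory=dict)


customer = Customer(
    name="Ada",
    phones=["555-0100", "555-0101"],
    attributes={"tier": "gold"},
)

first_phone = customer.phones[0]     # array element, accessed by index
tier = customer.attributes["tier"]   # map value, accessed by key
name = customer.name                 # struct field, accessed by name
```

The sketch nests an array and a map inside a struct, which is the kind of nesting that hierarchical data in complex files typically contains.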
Hierarchical Data Processing Scenarios
You can develop mappings for the following hierarchical data processing scenarios:
To generate and modify hierarchical data.
To transform relational data to hierarchical data.
To transform hierarchical data to relational data.
To convert data from one complex file format to another. For example, read hierarchical data from an Avro source and write to a JSON target.
To generate struct target data after parsing hierarchical JSON or XML data midstream in a mapping.
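The relational-to-hierarchical and hierarchical-to-relational scenarios above can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not how a mapping implements the conversion. The function names and the column names (`customer_id`, `name`, `order_id`, `amount`) are hypothetical.

```python
from collections import defaultdict


def rows_to_hierarchy(rows):
    """Group flat rows into one nested record per customer.

    Each record is a struct that holds an array of order structs.
    """
    grouped = defaultdict(lambda: {"name": None, "orders": []})
    for row in rows:
        customer = grouped[row["customer_id"]]
        customer["name"] = row["name"]
        customer["orders"].append(
            {"order_id": row["order_id"], "amount": row["amount"]}
        )
    return [
        {"customer_id": cid, "name": c["name"], "orders": c["orders"]}
        for cid, c in grouped.items()
    ]


def hierarchy_to_rows(records):
    """Flatten nested records back into one row per order."""
    rows = []
    for record in records:
        for order in record["orders"]:
            rows.append(
                {
                    "customer_id": record["customer_id"],
                    "name": record["name"],
                    "order_id": order["order_id"],
                    "amount": order["amount"],
                }
            )
    return rows


flat_rows = [
    {"customer_id": 1, "name": "Ada", "order_id": 100, "amount": 25.0},
    {"customer_id": 1, "name": "Ada", "order_id": 101, "amount": 40.0},
    {"customer_id": 2, "name": "Lin", "order_id": 102, "amount": 15.5},
]

nested = rows_to_hierarchy(flat_rows)
round_trip = hierarchy_to_rows(nested)
```

In this sketch, a customer with no orders would produce no rows when flattened, so the round trip preserves the input only when every record has at least one array element.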
Hierarchical Data Processing Configuration
You create a connection to access complex files and create a data object to represent data in the complex file. Then, configure the data object read and write operations to project columns as complex data types. To read and write hierarchical data in complex files, you create a mapping, add a Read transformation based on the read operation, and add a Write transformation based on the write operation.
To process hierarchical data, configure the following objects and transformation properties in a mapping:
Complex ports. To pass hierarchical data in a mapping, create complex ports. You create complex ports by assigning complex data types to ports.
Complex data type definitions. To process hierarchical data of type struct, create or import complex data type definitions that represent the schema of struct data.
Type configuration. To define the properties of a complex port, specify or change the type configuration.
Complex operators and functions. To generate or modify hierarchical data, create expressions using complex operators and functions.
You can also use hierarchical conversion wizards to simplify some of the mapping development tasks.
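To make the relationship between complex ports, complex data type definitions, and type configuration more concrete, the following Python sketch models them as dictionaries and checks whether a value matches a port's configuration. Every name here (`ADDRESS_DEFINITION`, `PORTS`, `conforms`) is hypothetical, and the dictionary layout is an assumption for illustration, not the product's metadata format.

```python
# Hypothetical complex data type definition: the schema of a struct.
ADDRESS_DEFINITION = {"street": str, "city": str, "zip": str}

# Hypothetical type configuration for three complex ports.
PORTS = {
    "home_address": {"type": "struct", "definition": ADDRESS_DEFINITION},
    "phone_numbers": {"type": "array", "element": str},
    "preferences": {"type": "map", "key": str, "value": str},
}


def conforms(value, config):
    """Return True if the value matches the port's type configuration."""
    kind = config["type"]
    if kind == "struct":
        definition = config["definition"]
        return (
            isinstance(value, dict)
            and value.keys() == definition.keys()
            and all(isinstance(value[k], t) for k, t in definition.items())
        )
    if kind == "array":
        return isinstance(value, list) and all(
            isinstance(v, config["element"]) for v in value
        )
    if kind == "map":
        return isinstance(value, dict) and all(
            isinstance(k, config["key"]) and isinstance(v, config["value"])
            for k, v in value.items()
        )
    raise ValueError(f"unknown complex type: {kind}")


address = {"street": "1 Main St", "city": "Austin", "zip": "78701"}
address_ok = conforms(address, PORTS["home_address"])
```

The struct port refers to a separate definition for its schema, while the array and map ports carry their element, key, and value types directly in the type configuration. This mirrors the distinction drawn in the list above, where only struct data requires a complex data type definition.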