How to Develop a Mapping to Process Hierarchical Data
Develop a mapping with complex ports, operators, and functions to process hierarchical data on the Spark or Databricks Spark engine.
The tasks that you perform, and their order, depend on the mapping scenario.
The following outline lists the high-level tasks to develop and run a mapping that reads, writes, and processes hierarchical data in complex files.
Create a connection.
Create a connection to access data in complex files based on the file storage.
Create a data object.
Create a data object to represent the complex files as sources or targets.
Configure the data object properties.
In the read and write operations, enable the column file properties to project columns in the complex files as complex data types.
Create a mapping and add mapping objects.
Create a mapping, and add Read and Write transformations.
To read from and write to a complex file, add Read and Write transformations based on the data object read and write operations.
To write to an Avro or Parquet file, you can also create a complex file target from an existing transformation in the mapping.
Based on the mapping logic, add other transformations that are supported on the run-time engine.
Generate struct data.
Based on the mapping scenario, use one of the hierarchical conversion wizards to generate struct data. You can also perform the following steps manually:
Create or import complex data type definitions for struct ports.
Create or import complex data type definitions that represent the schema of the struct data.
The complex data type definitions are stored in the type definition library, which is a Model repository object. The default name of the type definition library is m_Type_Definition_Library.
If a mapping uses one or more mapplets, rename the type definition libraries in the mapping and the mapplets to ensure that the names are unique.
Create and configure struct ports in transformations.
Create ports in transformations and assign the struct data type.
Specify the type configuration for the struct ports.
You must reference a complex data type definition for the struct port.
Create expressions with complex functions to generate struct data.
Modify struct data.
You can convert struct data to relational or hierarchical data. If the struct data contains elements of primitive data types, you can extract the elements as relational data. If the struct data contains elements of complex data types, you can extract the elements as hierarchical data. Based on the mapping scenario, use one of the hierarchical conversion wizards to modify struct data. You can also perform the following steps manually:
Create output ports with port properties that match the element of the struct data that you want to extract.
Create expressions with complex operators or complex functions to modify the struct data.
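The two struct tasks above can be pictured with a small conceptual sketch. It is written in plain Python, not in the transformation expression language, and it only illustrates the data shapes involved. The `Address` and `Customer` classes stand in for complex data type definitions, and every name in the block is hypothetical.

```python
from dataclasses import dataclass, asdict


# Hypothetical type definitions, loosely analogous to complex data type
# definitions that describe the schema of a struct port.
@dataclass
class Address:
    city: str
    zip_code: str


@dataclass
class Customer:
    name: str          # element of a primitive type
    address: Address   # element of a complex type


def generate_struct(name, city, zip_code):
    """Combine flat (relational) values into one nested struct value."""
    return Customer(name=name, address=Address(city=city, zip_code=zip_code))


def extract_primitive(customer):
    """Extract a primitive element; the result is relational data."""
    return customer.name


def extract_complex(customer):
    """Extract a complex element; the result is still hierarchical data."""
    return customer.address


cust = generate_struct("Ana", "Lisbon", "1000-001")
```

In this sketch, `extract_primitive(cust)` yields a plain string, while `extract_complex(cust)` yields a nested `Address` value, which mirrors the distinction between extracting relational and hierarchical data from a struct.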
Generate array data.
Create ports in transformations and assign the array data type.
Specify the type configuration for the array ports.
Create expressions with complex functions to generate array data.
Modify array data.
You can convert array data to relational or hierarchical data. If the array data contains elements of primitive data types, you can extract the elements as relational data. If the array data contains elements of complex data types, you can extract the elements as hierarchical data. Based on the mapping scenario, use one of the hierarchical conversion wizards to modify array data. You can also perform the following steps manually:
Create output ports with port properties that match the element of the array data that you want to extract.
Create expressions with complex operators or complex functions to modify the array data.
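As a rough illustration of the array tasks above, the following plain Python sketch groups flat rows into one array per key and then expands the arrays back into relational rows. It is a conceptual analogy under assumed sample data, not the syntax of any complex function; the function names and the order data are hypothetical.

```python
from collections import defaultdict


def generate_arrays(rows):
    """Group flat (key, value) rows into one array of values per key."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return dict(grouped)


def flatten_arrays(arrays):
    """Expand each array into one relational row per element."""
    return [(key, value) for key, values in arrays.items() for value in values]


# Hypothetical sample data: one row per order line.
rows = [("order1", "pen"), ("order1", "ink"), ("order2", "paper")]
arrays = generate_arrays(rows)
```

Because the array elements here are primitive values, flattening them produces relational rows. If the elements were nested values instead, the extracted elements would remain hierarchical.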
Generate map data.
Create ports in transformations and assign the map data type.
Specify the type configuration for the map ports.
Create expressions with complex functions to generate map data.
Modify map data.
You can convert map data to relational or hierarchical data. If the map data contains elements of primitive data types, you can extract the elements as relational data. If the map data contains elements of complex data types, you can extract the elements as hierarchical data. Based on the mapping scenario, use one of the hierarchical conversion wizards to modify map data. You can also perform the following steps manually:
Create output ports with port properties that match the element of the map data that you want to extract.
Create expressions with complex operators or complex functions to modify the map data.
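The map tasks above follow the same pattern. The sketch below, again in plain Python and purely conceptual, builds a map from a list of keys and a list of values, looks up one value by key, and converts the map into key-value rows. All function names and sample values are hypothetical and do not correspond to a specific complex function.

```python
def generate_map(keys, values):
    """Pair each key with the value at the same position to build a map."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    return dict(zip(keys, values))


def extract_value(mapping, key, default=None):
    """Look up one element by key; a missing key returns the default."""
    return mapping.get(key, default)


def map_to_rows(mapping):
    """Convert the map into relational (key, value) rows."""
    return list(mapping.items())


# Hypothetical sample data: contact details keyed by contact type.
contacts = generate_map(["email", "phone"], ["ana@example.com", "555-0100"])
```

A lookup such as `extract_value(contacts, "fax")` returns `None` in this sketch, which is one possible way to handle a missing key; the actual behavior on a run-time engine may differ.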
Configure the transformations.
Link the ports and configure the transformation properties based on the mapping logic.
Configure the mapping properties.
Configure the following mapping run-time properties:
To run on the Spark engine, choose the Spark validation environment and Hadoop as the execution environment.
To run on the Databricks Spark engine, choose the Databricks Spark validation environment and Databricks as the execution environment.
Validate and run the mapping.
Validate the mapping to identify and correct any errors.
Optionally, view the engine execution plan to debug the logic.