How to Develop a Mapping to Process Hierarchical Data
Develop a mapping with complex ports, operators, and functions to process hierarchical data on the Spark engine.
The tasks that you perform, and the order in which you perform them, depend on the mapping scenario.
The following list outlines the high-level tasks to develop and run a mapping to read, write, and process hierarchical data in complex files.
1. Create an HDFS connection.
   Create a Hadoop Distributed File System (HDFS) connection to access data in complex files that are stored in HDFS. You can create and manage an HDFS connection in the Administrator tool or the Developer tool.
2. Create a complex file data object.
   Create a complex file data object to represent the complex files in HDFS as sources or targets. The Developer tool creates the read and write operations when you create the complex file data object.
3. Configure the complex file data object properties.
   In the read and write operations, enable the column file properties to project columns in the complex files as complex data types.
4. Create a mapping and add mapping objects.
   Create a mapping, and add Read and Write transformations. To read from and write to a complex file, add Read and Write transformations based on the complex file data object. To write to an Avro or Parquet file, you can also create a complex file target from an existing transformation in the mapping. Based on the mapping logic, add other transformations that are supported on the Spark engine.
5. Generate struct data.
   Based on the mapping scenario, use one of the hierarchical conversion wizards to generate struct data. You can also perform the following steps manually:
   a. Create or import complex data type definitions for struct ports.
      Create or import complex data type definitions that represent the schema of the struct data. The complex data type definitions are stored in the type definition library, which is a Model repository object. The default name of the type definition library is Type_Definition_Library.
      If a mapping uses one or more mapplets, rename the type definition libraries in the mapping and the mapplets to ensure that the names are unique.
   b. Create and configure struct ports in transformations.
      Create ports in transformations and assign the struct complex data type. Specify the type configuration for the struct ports. You must reference a complex data type definition for the struct port.
   c. Create expressions with complex functions to generate struct data, as shown in the example below.
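   For example, the following expression is a minimal sketch of struct generation with the STRUCT_AS complex function. It assumes a complex data type definition named typedef_address in the default Type_Definition_Library and an incoming struct port named address; both names are illustrative, not part of this procedure:

      -- Generate a struct whose schema is based on typedef_address
      -- (typedef_address and address are assumed, illustrative names)
      STRUCT_AS(:Type.Type_Definition_Library.typedef_address, address)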
6. Modify struct data.
   You can convert struct data to relational or hierarchical data. If the struct data contains elements of primitive data types, you can extract the elements as relational data. If the struct data contains elements of complex data types, you can extract the elements as hierarchical data. Based on the mapping scenario, use one of the hierarchical conversion wizards to modify struct data. You can also perform the following steps manually:
   a. Create output ports with port properties that match the element of the struct data that you want to extract.
   b. Create expressions with complex operators or complex functions to modify the struct data, as shown in the example below.
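   For example, the following expressions are a minimal sketch of struct modification with the dot operator. They assume a struct port named address_struct with string elements street and city; each expression maps to its own string output port:

      -- Extract elements of primitive data types from the struct port
      -- (address_struct, street, and city are assumed, illustrative names)
      address_struct.street
      address_struct.city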
7. Generate array data.
   a. Create ports in transformations and assign the array complex data type.
   b. Specify the type configuration for the array ports.
   c. Create expressions with complex functions to generate array data, as shown in the example below.
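   For example, the following expression is a minimal sketch of array generation with the ARRAY complex function. It assumes three string input ports named phone1, phone2, and phone3, and an output port of type array with the string element type:

      -- Build a string array from three string ports
      -- (phone1, phone2, and phone3 are assumed, illustrative names)
      ARRAY(phone1, phone2, phone3)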
8. Modify array data.
   You can convert array data to relational or hierarchical data. If the array data contains elements of primitive data types, you can extract the elements as relational data. If the array data contains elements of complex data types, you can extract the elements as hierarchical data. Based on the mapping scenario, use one of the hierarchical conversion wizards to modify array data. You can also perform the following steps manually:
   a. Create output ports with port properties that match the element of the array data that you want to extract.
   b. Create expressions with complex operators or complex functions to modify the array data, as shown in the example below.
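   For example, the following expressions are a minimal sketch of array modification with the subscript operator and the SIZE complex function. They assume an array port named phones with the string element type:

      -- Return the first element of the array port
      -- (phones is an assumed, illustrative name)
      phones[0]
      -- Return the number of elements in the array port
      SIZE(phones)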
9. Configure the transformations.
   Link the ports and configure the transformation properties based on the mapping logic.
10. Configure the mapping to run on the Spark engine.
    Configure the following mapping run-time properties:
    - Select Hadoop as the validation environment and Spark as the engine.
    - Select Hadoop as the execution environment and select a Hadoop connection.
11. Validate and run the mapping on the Spark engine.
    Validate the mapping to fix any errors. Optionally, view the Spark engine execution plan to debug the logic. Then run the mapping.