How to Use a Midstream Mapping to Parse Hierarchical Data
Develop a midstream mapping to parse hierarchical data in a source string on the Spark engine.
The following high-level tasks describe how to develop and run a mapping that parses hierarchical data in a source string:
Create a connection.
Create a connection to access data in complex files. Choose the connection type based on the file storage.
Create data objects.
Create a data object to represent the source file with the source string that contains hierarchical data in JSON or XML format.
Create a data object to represent the target that will include the hierarchical data in a struct.
Configure the data objects' properties.
Create or import a complex data type definition.
Intelligent Structure Discovery parses sample data and discovers the schema for hierarchical data in a source string. The following alternatives are available:
Create a complex data type definition from a representative sample file with Intelligent Structure Discovery in Informatica Intelligent Cloud Services.
Import an existing complex data type definition from an Intelligent Structure Discovery .amodel file.
Create a complex data type definition using a representative sample file in the Developer tool.
The complex data type definitions are stored in the type definition library, which is a Model repository object. The default name of the type definition library is m_Type_Definition_Library.
You cannot manually create a complex data type definition for a midstream mapping.
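Intelligent Structure Discovery performs the schema discovery for you, and its algorithm is not described here. As a rough, hypothetical illustration of the idea, the following Python sketch walks one representative JSON sample and derives a nested type description from it: objects become structs, arrays become arrays of the first element's type, and scalars map to type names. The function names and the type notation (bigint, double, and so on) are assumptions chosen for illustration, not the format of an Informatica complex data type definition.

```python
import json
from typing import Any

def infer_type(value: Any) -> Any:
    """Map one parsed JSON value to a type description (hypothetical notation)."""
    if isinstance(value, dict):
        # A JSON object becomes a struct with one entry per field.
        return {"struct": {key: infer_type(val) for key, val in value.items()}}
    if isinstance(value, list):
        # Assume a homogeneous array: infer the element type from the first element.
        return {"array": infer_type(value[0]) if value else "string"}
    if isinstance(value, bool):  # test bool before int, because bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"  # strings, and nulls by default, map to string

def infer_schema(sample: str) -> Any:
    """Infer a type description from a single representative JSON sample string."""
    return infer_type(json.loads(sample))

sample = '{"id": 7, "name": "Ann", "address": {"city": "Oslo", "zip": "0150"}, "tags": ["a", "b"]}'
print(json.dumps(infer_schema(sample), indent=2))
```

Because the sketch inspects a single sample, fields that are absent from that sample never appear in the result, which is one reason to choose a representative sample file.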
Create a mapping and add mapping objects.
Create a mapping and add Read and Write transformations.
Create a Read transformation to read the hierarchical data from the source string.
Create a Write transformation to write the hierarchical data to a target struct.
Create an Expression transformation to call the PARSE_JSON or PARSE_XML function.
Based on the mapping logic, add other transformations that are supported on the run-time engine.
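The data flow that these transformations form can be pictured as a chain of stages. The following Python sketch models each transformation as a generator over rows. It is a conceptual illustration only, because the Spark engine actually runs the mapping. The port names record_id, payload, and parsed are hypothetical, and json.loads stands in for PARSE_JSON without any type definition.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]

def read_transformation(lines: Iterable[str]) -> Iterator[Row]:
    # Read stage (hypothetical): one row per source record; "payload" holds the raw JSON string.
    for record_id, line in enumerate(lines, start=1):
        yield {"record_id": record_id, "payload": line}

def expression_transformation(rows: Iterable[Row]) -> Iterator[Row]:
    # Expression stage: "payload" passes through (like an input/output port) and
    # "parsed" is added (like an output port); json.loads stands in for PARSE_JSON.
    for row in rows:
        yield {**row, "parsed": json.loads(row["payload"])}

def write_transformation(rows: Iterable[Row]) -> List[Row]:
    # Write stage: keep only the ports that are linked to the target.
    return [{"record_id": row["record_id"], "parsed": row["parsed"]} for row in rows]

source_lines = ['{"id": 1, "city": "Oslo"}', '{"id": 2, "city": "Rome"}']
target_rows = write_transformation(expression_transformation(read_transformation(source_lines)))
for row in target_rows:
    print(row)
```

Generators keep each stage lazy, so rows stream through the chain one at a time. This loosely mirrors how linked ports pass data between transformations, but it says nothing about how Spark partitions or schedules the work.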
Create and configure ports in transformations.
Create the Read transformation ports, including a string port that contains the hierarchical data.
Create the Write transformation ports, including a struct port that contains the parsed hierarchical data.
Create the Expression ports:
Configure the input string port as an input and output port.
Configure the output struct port as an output port. Set its Type Definition to reference the complex data type definition that you created or imported. In the port expression, call the PARSE_JSON or PARSE_XML function.
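To make the role of the type definition more concrete, the following Python sketch approximates what PARSE_JSON and PARSE_XML do with it: parse the string, then keep only the fields that the definition declares and cast scalars to the declared types. The type definition format, the function names, and the choice to return None for malformed input or mismatched values are assumptions of this sketch, not documented Informatica behavior.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict, Optional

# Hypothetical type definition format: field name -> scalar type name, or a nested dict for a struct.
TypeDef = Dict[str, Any]
CASTS = {"string": str, "integer": int, "double": float}

def conform(value: Any, type_def: Any) -> Any:
    """Shape a parsed value to the type definition; missing or mismatched data becomes None."""
    if value is None:
        return None
    if isinstance(type_def, dict):
        if not isinstance(value, dict):
            return None
        # Only fields declared in the type definition survive; undeclared fields are dropped.
        return {name: conform(value.get(name), sub_def) for name, sub_def in type_def.items()}
    if isinstance(value, (dict, list)):
        return None  # a nested value cannot fill a scalar field
    try:
        return CASTS[type_def](value)
    except (TypeError, ValueError):
        return None

def parse_json_like(text: str, type_def: TypeDef) -> Optional[Dict[str, Any]]:
    """Rough analog of PARSE_JSON: JSON string -> struct that conforms to type_def."""
    try:
        return conform(json.loads(text), type_def)
    except json.JSONDecodeError:
        return None

def element_to_dict(elem: ET.Element) -> Any:
    """Convert an XML element to nested dicts keyed by child tag; leaf elements become their text."""
    children = list(elem)
    if not children:
        return elem.text
    # Attributes are ignored, and a repeated child tag keeps only its last occurrence.
    return {child.tag: element_to_dict(child) for child in children}

def parse_xml_like(text: str, type_def: TypeDef) -> Optional[Dict[str, Any]]:
    """Rough analog of PARSE_XML: XML string -> struct that conforms to type_def."""
    try:
        return conform(element_to_dict(ET.fromstring(text)), type_def)
    except ET.ParseError:
        return None

person_def = {"id": "integer", "name": "string", "address": {"city": "string"}}
print(parse_json_like('{"id": "7", "name": "Ann", "extra": 1, "address": {"city": "Oslo"}}', person_def))
print(parse_xml_like("<person><id>7</id><name>Ann</name><address><city>Oslo</city></address></person>", person_def))
```

The XML helper ignores attributes and keeps only the last of any repeated child elements, so it handles only simple documents, and the sketch does not model arrays at all.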
Configure the transformations.
Link the ports and configure the transformation properties based on the mapping logic.
Configure the mapping properties.
Configure the mapping run-time properties: select Spark as the validation environment and Hadoop as the execution environment.
Validate and run the mapping.
Validate the mapping to identify and correct any errors.
Optionally, view the engine execution plan to debug the logic.