How to Use a Midstream Mapping to Parse Hierarchical Data

Develop a midstream mapping to parse hierarchical data in a source string on the Spark engine.
The following high-level tasks describe how to develop and run a mapping that parses hierarchical data in a source string:
Create a connection.
Create a connection to access the data in complex files. Choose the connection type based on the file storage system where the files reside.
Create data objects.
  1. Create a data object to represent the source file with the source string that contains hierarchical data in JSON or XML format.
  2. Create a data object to represent the target that will include the hierarchical data in a struct.
  3. Configure the data objects' properties.
Create or import a complex data type definition.
Intelligent Structure Discovery parses sample data and discovers the schema for hierarchical data in a source string. The following alternatives are available:
  • Create a complex data type definition using a representative sample file in Intelligent Structure Discovery in Informatica Intelligent Cloud Services.
  • Import an existing Intelligent Structure Discovery .amodel complex data type definition.
  • Create a complex data type definition using a representative sample file in the Developer tool.
The complex data type definitions are stored in the type definition library, which is a Model repository object. The default name of the type definition library is m_Type_Definition_Library.
You cannot manually create a complex data type definition for a midstream mapping.
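This guide does not describe how Intelligent Structure Discovery works internally. As a rough conceptual analog only, the following Python sketch infers a nested type description from one representative JSON sample, which is the kind of information a complex data type definition records. The function name `infer_type` and the type labels (`bigint`, `double`, `string`, `boolean`) are illustrative assumptions; they are not Informatica identifiers and do not reflect the .amodel format.

```python
import json
from typing import Any


def infer_type(value: Any) -> Any:
    """Return a nested type description for a parsed JSON value.

    Conceptual sketch only; the type labels are hypothetical, not Informatica types.
    """
    if isinstance(value, dict):
        # A JSON object becomes a struct with one entry per field.
        return {key: infer_type(val) for key, val in value.items()}
    if isinstance(value, list):
        # An array becomes a one-element list holding the element type,
        # inferred from the first element only (a simplification).
        return [infer_type(value[0])] if value else ["string"]
    if isinstance(value, bool):
        # Check bool before int, because bool is a subclass of int in Python.
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    # Strings and nulls default to string in this sketch.
    return "string"


sample = (
    '{"id": 7, "name": "Ada", "address": {"city": "London", "zip": "N1"},'
    ' "tags": ["x", "y"], "score": 9.5, "active": true}'
)
type_definition = infer_type(json.loads(sample))
print(type_definition)
```

A real discovery step would reconcile many records and handle conflicting or optional fields; inferring from a single sample is why the sample file should be representative.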
Create a mapping and add mapping objects.
  1. Create a mapping and add Read and Write transformations.
  2. Create a Read transformation to read the hierarchical data from the source string.
  3. Create a Write transformation to write the hierarchical data to a target struct.
  4. Create an Expression transformation for the PARSE_JSON or PARSE_XML function.
  5. Based on the mapping logic, add other transformations that are supported on the run-time engine.
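The Read, Expression, and Write flow described above can be pictured as a chain of row-processing stages. The following Python sketch is a hypothetical, engine-free analog, not Informatica or Spark code: each function stands in for one transformation, `json.loads` stands in for PARSE_JSON, and `keep_city` stands in for an additional transformation chosen for the mapping logic. All function and port names are assumptions for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, List


def read(lines: Iterable[str]) -> Iterator[Dict]:
    """Read transformation analog: emit one row per source string."""
    for line in lines:
        yield {"payload": line}


def parse(rows: Iterable[Dict]) -> Iterator[Dict]:
    """Expression analog: keep the string port and add a parsed struct port."""
    for row in rows:
        yield {**row, "parsed": json.loads(row["payload"])}


def keep_city(rows: Iterable[Dict], city: str) -> Iterator[Dict]:
    """Hypothetical extra transformation: filter rows on a nested struct field."""
    for row in rows:
        if row["parsed"].get("address", {}).get("city") == city:
            yield row


def write(rows: Iterable[Dict]) -> List[Dict]:
    """Write transformation analog: collect only the struct port."""
    return [row["parsed"] for row in rows]


source = [
    '{"id": 1, "address": {"city": "London"}}',
    '{"id": 2, "address": {"city": "Paris"}}',
]
target = write(keep_city(parse(read(source)), "London"))
print(target)  # [{'id': 1, 'address': {'city': 'London'}}]
```

Because each stage is a generator, rows flow through the chain one at a time, loosely mirroring how linked transformations pass rows downstream.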
Create and configure ports in transformations.
  1. Create the Read ports, including a port of the string type that contains the hierarchical data.
  2. Create the Write ports, including a port of the struct type that contains the parsed hierarchical data.
  3. Create the Expression ports:
    Configure the port for the input string as both an input and an output port.
    Configure the port for the output struct as an output port. Its Type Definition must reference the complex data type definition that you created or imported. Configure the PARSE_JSON or PARSE_XML function as the expression for this port.
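To illustrate what the Expression ports hold, the following Python sketch passes the source string through unchanged and emits a struct that conforms to a type definition. It is a conceptual analog only, built on a hypothetical `TYPE_DEF` and hypothetical helpers (`conform`, `xml_to_dict`, `expression`); it does not reproduce the actual behavior of PARSE_JSON or PARSE_XML, such as their handling of arrays, XML attributes, repeated elements, or conversion errors.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict, Tuple

# Hypothetical type definition (an assumption, not an .amodel file):
# a struct with two scalar fields and a nested struct.
TYPE_DEF = {"id": "bigint", "name": "string", "address": {"city": "string"}}


def conform(value: Any, type_def: Any) -> Any:
    """Project a parsed value onto a type definition."""
    if isinstance(type_def, dict):
        # Struct: keep only declared fields; missing fields become None.
        fields = value if isinstance(value, dict) else {}
        return {key: conform(fields.get(key), sub) for key, sub in type_def.items()}
    if value is None:
        return None
    # Scalars: XML text arrives as strings, so convert to the declared type.
    return int(value) if type_def == "bigint" else str(value)


def xml_to_dict(elem: ET.Element) -> Any:
    """Map child elements to fields; a leaf element yields its text."""
    children = list(elem)
    if not children:
        return elem.text
    # Repeated child tags overwrite one another in this simplified mapping.
    return {child.tag: xml_to_dict(child) for child in children}


def expression(input_string: str, fmt: str) -> Tuple[str, Dict[str, Any]]:
    """Return (pass-through string port, output struct port)."""
    if fmt == "json":
        parsed = json.loads(input_string)
    elif fmt == "xml":
        parsed = xml_to_dict(ET.fromstring(input_string))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return input_string, conform(parsed, TYPE_DEF)


json_in = '{"id": 7, "name": "Ada", "address": {"city": "London", "zip": "N1"}}'
xml_in = "<person><id>7</id><name>Ada</name><address><city>London</city></address></person>"

# Both calls print {'id': 7, 'name': 'Ada', 'address': {'city': 'London'}}
print(expression(json_in, "json")[1])
print(expression(xml_in, "xml")[1])
```

In this sketch, fields that the type definition does not declare (such as `zip`) are dropped, declared fields that are missing from the input become None, and the children of the XML root element become the struct fields.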
Configure the transformations.
Link the ports and configure the transformation properties based on the mapping logic.
Configure the mapping properties.
Configure the mapping run-time properties: choose the Spark validation environment and Hadoop as the execution environment.
Validate and run the mapping.
  1. Validate the mapping to identify and correct any errors.
  2. Optionally, view the engine execution plan to debug the logic.
  3. Run the mapping.