How to Develop and Run a Mapping to Process Data with an Intelligent Structure Model

You can create a mapping with a data object that incorporates an intelligent structure model to parse data. You run the mapping on the Spark engine to process the data.
The tasks and the order in which you perform the tasks to develop the mapping depend on the mapping scenario.
The following list outlines the high-level tasks to develop and run a mapping to read and process data in files of any type that an intelligent structure model can process, and then write the data to a target.
In Data Integration, a service in Informatica Intelligent Cloud Services, create an intelligent structure model.
Create an intelligent structure model using a representative file. Export the model to an .amodel file. After you save the file locally, you can copy it to the relevant file storage system.
For more information, see Intelligent structure models in the Data Integration help.
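As one illustration of the copy step, the following sketch builds an `hdfs dfs -put` command for an exported model file. It assumes that HDFS is the relevant file storage system and that the `hdfs` command-line client is available. The function name, the file name `orders.amodel`, and the directory `/data/models` are hypothetical and do not come from the product.

```python
import shlex
from pathlib import PurePosixPath


def build_hdfs_put_command(local_model: str, hdfs_dir: str) -> list[str]:
    """Return the argument list that copies an exported model file to HDFS."""
    if not local_model.endswith(".amodel"):
        raise ValueError("expected an exported .amodel file")
    # Keep the original file name under the (hypothetical) target directory.
    target = PurePosixPath(hdfs_dir) / PurePosixPath(local_model).name
    # -f overwrites an existing copy of the model at the target path.
    return ["hdfs", "dfs", "-put", "-f", local_model, str(target)]


if __name__ == "__main__":
    cmd = build_hdfs_put_command("orders.amodel", "/data/models")
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(cmd))
```

For Amazon S3 or Microsoft Azure storage, the equivalent copy would use that system's own tooling instead.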
In Big Data Management, create a connection.
Create a connection to access data in files that are stored in the relevant system. The following connection types work with the data objects that can incorporate an intelligent structure model:
  • Hadoop Distributed File System
  • Amazon S3
  • Microsoft Azure Blob
  • Microsoft Azure Data Lake Store
Create a data object with an intelligent structure model to read and parse source data.
  1. Create a data object with an intelligent structure model to represent the files stored as sources. You can create the following types of data objects with an intelligent structure model:
    • Complex file
    • Amazon S3
    • Microsoft Azure Blob
    • Microsoft Azure Data Lake Store
  2. Configure the data object properties.
  3. In the data object read operation, configure columns to project hierarchical data as a complex data type. Enable the Project Column as Complex Data Type property in the Column Projection properties.
You must have the relevant Informatica Intelligent Cloud Services license to import an intelligent structure model as a data object. For more information, see Before You Begin.
Create a data object to write data to a target.
  1. Create a data object to write the data to target storage.
  2. Configure the data object properties.
Do not associate an intelligent structure model with a data object write operation. If you use a write operation that is associated with an intelligent structure model in a mapping, the mapping is not valid.
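The rule above can be expressed as a simple check. The following is a minimal sketch that models operations as plain Python objects. The `DataObjectOperation` class and its fields are hypothetical stand-ins for illustration, not part of any Informatica API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataObjectOperation:
    """Hypothetical stand-in for a data object read or write operation."""
    name: str
    kind: str  # "read" or "write"
    structure_model: Optional[str] = None  # for example, the name of an .amodel file


def invalid_write_operations(operations):
    """Return the names of write operations that carry an intelligent structure model."""
    return [
        op.name
        for op in operations
        if op.kind == "write" and op.structure_model is not None
    ]
```

A mapping plan that yields a non-empty list from this check corresponds to a mapping that would not be valid.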
Create a mapping and add mapping objects.
  1. Create a mapping.
  2. Add a Read transformation based on the data object with the intelligent structure model.
  3. Based on the mapping logic, add other transformations that are supported on the Spark engine. Link the ports and configure the transformation properties based on the mapping logic.
  4. Add a Write transformation based on the data object that passes the data to the target storage or output. Link the ports and configure the transformation properties based on the mapping logic.
Configure the mapping to run on the Spark engine.
Configure the following mapping run-time properties:
  1. Select Hadoop as the validation environment and Spark as the engine.
  2. Select Hadoop as the execution environment and select a Hadoop connection.
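The run-time settings above can be summarized as a small checklist. The following sketch compares a plain dictionary against those settings. The dictionary keys and the function name are hypothetical labels chosen for illustration; they are not property names from the Developer tool.

```python
# Settings taken from the steps above; the key names are hypothetical labels.
REQUIRED_RUNTIME = {
    "validation_environment": "Hadoop",
    "engine": "Spark",
    "execution_environment": "Hadoop",
}


def runtime_problems(props: dict) -> list[str]:
    """Return a list of run-time settings that differ from the required values."""
    problems = []
    for key, expected in REQUIRED_RUNTIME.items():
        actual = props.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    # The execution environment also needs a Hadoop connection to be selected.
    if not props.get("hadoop_connection"):
        problems.append("hadoop_connection: no Hadoop connection selected")
    return problems
```

An empty result means that the recorded settings match the steps above; it does not replace validating the mapping itself.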
Validate and run the mapping on the Spark engine.
  1. Validate the mapping and fix any errors.
  2. Optionally, view the Spark engine execution plan to debug the logic.
  3. Run the mapping.


Updated July 10, 2020