How to Develop and Run a Mapping to Process Data with an Intelligent Structure Model

You can create a mapping with a data object that incorporates an intelligent structure model to parse data. You run the mapping on the Spark engine to process the data.
The tasks and the order in which you perform the tasks to develop the mapping depend on the mapping scenario.
The following list outlines the high-level tasks to develop and run a mapping to read and process data in files of any type that an intelligent structure model can process, and then write the data to a target.
Optionally, in Data Integration, a service in Informatica Intelligent Cloud Services, create an intelligent structure model.
Create an intelligent structure model using a representative file. Export the model to an .amodel file. After you save the file locally, you can copy it to the relevant file storage system.
Alternatively, you can select an XML, JSON, ORC, Avro, or Parquet sample file when you create a data object. Intelligent Structure Discovery creates an intelligent structure model based on the sample file that you select.
You cannot edit or refine a model that Intelligent Structure Discovery creates automatically. To edit and refine a model, create, edit, and export it in Cloud Data Integration.
In Data Engineering Integration, create a connection.
Create a connection to access data in files that are stored in the relevant system. You can create the following types of connections, which work with the data objects that can incorporate an intelligent structure model:
  • Hadoop Distributed File System
  • Amazon S3
  • Microsoft Azure Blob
  • Microsoft Azure Data Lake Store
Create a data object with an intelligent structure model to read and parse source data.
  1. Create a data object with an intelligent structure model to represent the files stored as sources. You can create the following types of data objects with an intelligent structure model:
    • Complex file
    • Amazon S3
    • Microsoft Azure Blob
    • Microsoft Azure Data Lake Store
  2. Configure the data object properties. Note the following guidelines:
    • In Resource Format, select Intelligent Structure Model or Sample File.
    • Select an intelligent structure model that you exported from Data Integration or a sample file to base the model on.
      To import an intelligent structure model as a data object, you must have the relevant Informatica Intelligent Cloud Services license. For more information, see Before You Begin.
  3. In the data object read operation, configure columns to project hierarchical data as a complex data type. Enable the Project Column as Complex Data Type property in the Column Projection properties.
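To illustrate what projecting hierarchical data as a complex data type means, the following sketch contrasts a nested projection with a flattened one in plain Python. It is a conceptual illustration only, not how the product or the Spark engine implements column projection. The record, its field names, and the helper functions `project_as_complex` and `flatten` are all hypothetical.

```python
import json

# Hypothetical parsed record; the field names are illustrative only.
record = json.loads("""
{
  "order_id": 1001,
  "customer": {"name": "Ada", "city": "London"},
  "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}]
}
""")


def project_as_complex(rec):
    """Keep nested structures intact: one column per top-level field."""
    return dict(rec)


def flatten(rec, prefix=""):
    """Contrast case: expand nested structures into scalar columns."""
    flat = {}
    for key, value in rec.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            for i, element in enumerate(value):
                if isinstance(element, dict):
                    flat.update(flatten(element, prefix=f"{name}[{i}]."))
                else:
                    flat[f"{name}[{i}]"] = element
        else:
            flat[name] = value
    return flat


complex_row = project_as_complex(record)
flat_row = flatten(record)

print(sorted(complex_row))  # column names when nesting is preserved
print(sorted(flat_row))     # column names when nesting is expanded
```

With the complex projection, downstream transformations receive `customer` and `items` as single nested values (comparable to a struct and an array), so the hierarchy survives until a later step chooses to unpack it.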
Create a data object to write data to a target.
  1. Create a data object to write the data to target storage.
  2. Configure the data object properties.
Do not associate an intelligent structure model with a data object write operation. If you use a write operation that is associated with an intelligent structure model in a mapping, the mapping is not valid.
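The rule above can be expressed as a simple pre-flight check. This is a minimal sketch under stated assumptions: `DataObjectOperation` and `find_invalid_writes` are hypothetical names invented for illustration and are not part of any Informatica API. The sketch only models the rule that a write operation must not carry an intelligent structure model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataObjectOperation:
    """Hypothetical stand-in for a data object operation used in a mapping."""
    name: str
    kind: str                          # "read" or "write"
    model_file: Optional[str] = None   # associated .amodel file, if any


def find_invalid_writes(operations: List[DataObjectOperation]) -> List[str]:
    """Return the names of write operations that have a model associated."""
    return [
        op.name
        for op in operations
        if op.kind == "write" and op.model_file is not None
    ]


operations = [
    DataObjectOperation("read_logs", "read", model_file="logs.amodel"),
    DataObjectOperation("write_clean", "write"),
    DataObjectOperation("write_bad", "write", model_file="logs.amodel"),
]

invalid = find_invalid_writes(operations)
print(invalid)
```

In this sketch only `write_bad` is reported, because it is the only write operation with a model attached. The read operation is allowed to use one.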
Create a mapping and add mapping objects.
  1. Create a mapping.
  2. Add a Read transformation based on the data object with the intelligent structure model.
  3. Based on the mapping logic, add other transformations that are supported on the Spark engine. Link the ports and configure the transformation properties based on the mapping logic.
  4. Add a Write transformation based on the data object that passes the data to the target storage or output. Link the ports and configure the transformation properties based on the mapping logic.
Configure the mapping to run on the Spark engine.
Configure the following mapping run-time properties:
  1. Select Hadoop as the validation environment and Spark as the engine.
  2. Select Hadoop as the execution environment and select a Hadoop connection.
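The run-time settings above can be summarized as a small checklist. The sketch below is an assumption-laden illustration: the property keys (`validation_environment`, `engine`, `execution_environment`, `hadoop_connection`), the connection name, and the function `check_runtime` are hypothetical and do not correspond to actual product property names.

```python
# Hypothetical property names; the values come from the steps above.
REQUIRED = {
    "validation_environment": "Hadoop",
    "engine": "Spark",
    "execution_environment": "Hadoop",
}


def check_runtime(props: dict) -> list:
    """Return a list of problems with the mapping run-time properties."""
    problems = [
        f"{key} should be {expected!r}, found {props.get(key)!r}"
        for key, expected in REQUIRED.items()
        if props.get(key) != expected
    ]
    if not props.get("hadoop_connection"):
        problems.append("a Hadoop connection must be selected")
    return problems


good = {
    "validation_environment": "Hadoop",
    "engine": "Spark",
    "execution_environment": "Hadoop",
    "hadoop_connection": "my_hadoop_conn",  # placeholder connection name
}
bad = {"validation_environment": "Native", "engine": "Spark"}

print(check_runtime(good))
print(check_runtime(bad))
```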
Validate and run the mapping on the Spark engine.
  1. Validate the mapping and fix any errors.
  2. Optionally, view the Spark engine execution plan to debug the logic.
  3. Run the mapping.


Updated September 28, 2020