Mapping Example

Your organization needs to analyze purchase order details such as customer ID, item codes, and item quantity. The purchase order details are stored in Microsoft Excel spreadsheets in HDFS. The data must be converted to text files for storage, and private customer data must be removed. Create a mapping that uses a data object with an intelligent structure to read all the purchase records from the file in HDFS. The mapping must parse the data and write it to a storage target.
You can use the extracted data for auditing.
The following figure shows the example mapping:
HDFS mapping example shows complex file input, a data processor transformation and a relational output.
You can use the following objects in the HDFS mapping:
HDFS Input
The input object, Read_transformation_with_intelligent_structure_model, is a Read transformation that processes a Microsoft Excel file stored in HDFS and creates field output.
Amazon S3 Output
The output object, Write_transformation, is a Write transformation that represents an Amazon S3 bucket.
When you run the mapping, the Data Integration Service reads the file in a binary stream and passes it to the Read transformation. The Read transformation extracts the relevant data according to the intelligent structure and passes the data to the Write transformation. The Write transformation writes the data to the storage target.
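The data flow described above can be pictured with a small conceptual sketch. This is not Informatica code and does not use any Informatica API. It is plain Python that assumes the spreadsheet rows are already parsed into dictionaries. The field names, the sample rows, and the choice of `customer_name` as the private field are hypothetical and serve only as illustration.

```python
import csv
import io

# Hypothetical field names. The example only mentions customer ID,
# item codes, and item quantity as the details to analyze.
PASSED_FIELDS = ["customer_id", "item_code", "item_quantity"]


def extract_records(rows, passed_fields=PASSED_FIELDS):
    """Keep only the fields that the structure passes and drop the rest."""
    return [{name: row[name] for name in passed_fields} for row in rows]


def write_text(records, passed_fields=PASSED_FIELDS):
    """Serialize the records as comma-delimited text, one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=passed_fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Illustrative rows standing in for data parsed from the spreadsheet.
# customer_name plays the role of private data that must not reach the target.
source_rows = [
    {"customer_id": "C001", "customer_name": "Example Name",
     "item_code": "A-17", "item_quantity": 3},
    {"customer_id": "C002", "customer_name": "Another Name",
     "item_code": "B-02", "item_quantity": 1},
]

output_text = write_text(extract_records(source_rows))
print(output_text)
```

In the actual mapping, the intelligent structure determines which fields the Read transformation outputs, and the Write transformation handles serialization to the target. The sketch only mirrors that division of work between the two steps.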
You can configure the mapping to run in a Hadoop run-time environment.
Complete the following tasks to configure the mapping:
  1. Create an HDFS connection to read files from the Hadoop cluster.
  2. Create a complex file data object read operation. Specify the following parameters:
    • The intelligent structure as the resource in the data object. The intelligent structure is configured so that it does not pass sensitive data.
    • The HDFS file location.
    • The input file folder location in the read data object operation.
  3. Drag and drop the complex file data object read operation into a mapping.
  4. Create an Amazon S3 connection.
  5. Create a Write transformation for the Amazon S3 data object and add it to the mapping.

Updated October 23, 2019