Table of Contents

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Connections
  4. Mappings in a Hadoop Environment
  5. Mapping Objects in a Hadoop Environment
  6. Mappings in the Native Environment
  7. Profiles
  8. Native Environment Optimization
  9. Data Type Reference
  10. Function Reference
  11. Parameter Reference

Mappings in a Hadoop Environment Overview

Configure the Hadoop run-time environment in the Developer tool to optimize mapping performance and to process data sets larger than 10 terabytes. In the Hadoop environment, the Data Integration Service pushes the processing to nodes on a Hadoop cluster. When you select the Hadoop environment, you can also select the engine that pushes the mapping logic to the Hadoop cluster.
You can run standalone mappings and mappings that are part of a workflow in the Hadoop environment.
Based on the mapping logic, the Hadoop environment can use the following engines to push processing to nodes on a Hadoop cluster:
  • Informatica Blaze engine. An Informatica proprietary engine for distributed processing on Hadoop.
  • Spark engine. A high-performance engine for batch processing that can run on a Hadoop cluster or on a standalone Spark cluster.
  • Hive engine. A batch processing engine that uses Hadoop technology such as MapReduce or Tez.
You can select which engines the Data Integration Service can use. If you select more than one engine, the Data Integration Service determines which engine to run the mapping with during validation.
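The actual selection logic is internal to the Data Integration Service, but the behavior described above can be pictured as a fallback over the engines enabled on the connection. The sketch below is purely illustrative: the preference order and function names are assumptions, not Informatica internals.

```python
# Illustrative sketch only: models picking one engine from those enabled
# on the Hadoop connection during mapping validation. The preference
# order below is an assumption for illustration, not documented behavior.
PREFERENCE = ["Blaze", "Spark", "Hive"]

def choose_engine(enabled_engines, supported_by_mapping):
    """Return the first preferred engine that is both enabled on the
    connection and able to run the mapping logic."""
    for engine in PREFERENCE:
        if engine in enabled_engines and engine in supported_by_mapping:
            return engine
    raise ValueError("No enabled engine can run this mapping")

# If the mapping logic rules out Blaze, validation falls through to Spark.
print(choose_engine({"Blaze", "Spark", "Hive"}, {"Spark", "Hive"}))
```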
When you run a mapping in the Hadoop environment, you must configure a Hadoop connection for the mapping. When you edit the Hadoop connection, you can set the run-time properties for the Hadoop environment and the properties for the engine that runs the mapping.
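Conceptually, a Hadoop connection bundles environment-wide run-time properties with per-engine settings. The following sketch shows one plausible shape of such a definition; every property name here is hypothetical and chosen for illustration, not taken from Informatica's actual connection attributes.

```python
# Hypothetical shape of a Hadoop connection definition. All keys are
# illustrative placeholders, not Informatica's real attribute names.
hadoop_connection = {
    "name": "HadoopDev",
    "environment": {
        "impersonation_user": "etl_user",      # user the jobs run as
        "temp_working_directory": "/tmp/infa", # scratch space on the cluster
    },
    "engines": {
        "Blaze": {"enabled": True},
        "Spark": {"enabled": True},
        "Hive": {"enabled": False},
    },
}

# The engines the Data Integration Service may consider for this connection.
enabled = [name for name, cfg in hadoop_connection["engines"].items()
           if cfg["enabled"]]
print(enabled)
```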
You can view the execution plan for a mapping that runs in the Hadoop environment. View the execution plan for the engine that the Data Integration Service selects to run the mapping.
You can monitor Hive queries and the Hadoop jobs in the Monitoring tool. Monitor the jobs on a Hadoop cluster with the YARN Web User Interface or the Blaze Job Monitor web application.
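Jobs that run on the cluster are also visible through the YARN ResourceManager REST API (`/ws/v1/cluster/apps` on the ResourceManager host), which returns application status as JSON. The sketch below parses a trimmed sample of that payload rather than querying a live cluster; the application names are invented for illustration.

```python
import json

# Trimmed sample of the JSON shape returned by the YARN ResourceManager
# REST API at http://<resource-manager>:8088/ws/v1/cluster/apps.
# In practice you would fetch this over HTTP; the app names are made up.
sample_response = """
{"apps": {"app": [
  {"id": "application_1530000000000_0001", "name": "spark_mapping", "state": "RUNNING"},
  {"id": "application_1530000000000_0002", "name": "hive_query", "state": "FINISHED"}
]}}
"""

def running_apps(payload):
    """Return the ids of applications still in the RUNNING state."""
    apps = json.loads(payload)["apps"]["app"]
    return [a["id"] for a in apps if a["state"] == "RUNNING"]

print(running_apps(sample_response))
```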
The Data Integration Service logs messages from the DTM, the Blaze engine, the Spark engine, and the Hive engine in the run-time log files.


Updated July 03, 2018