Use the Hadoop run-time environment in the Developer tool to optimize mapping performance and process data that is greater than 10 terabytes. In the Hadoop environment, the Data Integration Service pushes the processing to nodes on a Hadoop cluster. When you select the Hadoop environment, you can also select the engine to push the mapping logic to the Hadoop cluster.
You can run standalone mappings and mappings that are part of a workflow in the Hadoop environment.
Based on the mapping logic, the Hadoop environment can use the following engines to push processing to nodes on a Hadoop cluster:
Hive engine. The Hive engine uses Hadoop technology such as MapReduce or Tez for processing batch data.
Informatica Blaze engine. The Blaze engine is an Informatica proprietary engine for distributed processing on Hadoop.
You can select which engine the Data Integration Service uses.
When you run a mapping in the Hadoop environment, you must configure a Hadoop connection for the mapping. When you edit the Hadoop connection, you can view or configure run-time properties for the Hadoop environment. You can configure the Hive and Blaze engine properties in the Hadoop connection. You can also use parameters to represent properties in the Hadoop environment. A parameter holds a constant value for the duration of a mapping run, and you can change that value between mapping runs.
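As an illustration of changing parameter values between mapping runs, the following Python sketch assembles a command line that runs a deployed mapping with a parameter file. It is a minimal sketch, not a definitive procedure. It assumes the `infacmd ms RunMapping` command and its `-dn`, `-sn`, `-un`, `-a`, `-m`, and `-pf` options as described in the Informatica Command Reference, so verify them against your product version. The domain, service, user, application, mapping, and file names are hypothetical placeholders, and the password option is deliberately left out of the sketch.

```python
from typing import List, Optional


def build_run_mapping_command(
    domain: str,
    service: str,
    user: str,
    application: str,
    mapping: str,
    parameter_file: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `infacmd ms RunMapping`.

    Nothing is executed here. The option names are assumptions taken from
    the Informatica Command Reference; check them for your version.
    """
    cmd = [
        "infacmd.sh", "ms", "RunMapping",
        "-dn", domain,        # domain name
        "-sn", service,       # Data Integration Service name
        "-un", user,          # user name (password handling omitted)
        "-a", application,    # deployed application that contains the mapping
        "-m", mapping,        # mapping to run
    ]
    if parameter_file is not None:
        # A different parameter file supplies different property values
        # for this run without any change to the mapping itself.
        cmd += ["-pf", parameter_file]
    return cmd


# Hypothetical names: the same mapping, run with a specific parameter file.
nightly = build_run_mapping_command(
    "MyDomain", "MyDIS", "dev_user", "SalesApp", "m_LoadOrders",
    parameter_file="/tmp/nightly_params.xml",
)
print(" ".join(nightly))
```

Passing a different parameter file on the next run changes the property values that the parameters represent, while the mapping and the Hadoop connection stay untouched.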
You can view the execution plan for a mapping in the Hadoop environment. Viewing the execution plan can help you tune the mapping to improve performance. The Hadoop execution plan shows the plan for the engine that the Data Integration Service selects to run the mapping.
When you run the mapping, the Data Integration Service converts the mapping to a Hive or Blaze engine execution plan that runs on a Hadoop cluster. You can view the Hive or Blaze engine execution plan using the Developer tool or the Administrator tool.
You can monitor Hive queries and the Hadoop jobs associated with a query for a Hive engine mapping in the Monitoring tool. You can also monitor Blaze engine mapping jobs in the Monitoring tool, or monitor the jobs on a Hadoop cluster with the Blaze Job Monitor web application.
The Data Integration Service logs messages from the DTM, Blaze engine, and Hive engine in the run-time log files.