Table of Contents


  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Connections
  4. Mappings in the Hadoop Environment
  5. Mapping Objects in the Hadoop Environment
  6. Processing Hierarchical Data on the Spark Engine
  7. Stateful Computing on the Spark Engine
  8. Monitoring Mappings in the Hadoop Environment
  9. Mappings in the Native Environment
  10. Profiles
  11. Native Environment Optimization
  12. Data Type Reference
  13. Complex File Data Object Properties
  14. Function Reference
  15. Parameter Reference

Hive Engine Architecture

Hive Engine Architecture

The Data Integration Service can use the Hive engine to run Model repository mappings or profiles on a Hadoop cluster.
To run a mapping or profile with the Hive engine, the Data Integration Service creates HiveQL queries based on the transformation or profiling logic. The Data Integration Service submits the HiveQL queries to the Hive driver. The Hive driver converts the HiveQL queries to MapReduce jobs, and then sends the jobs to the Hadoop cluster.
The following diagram shows the architecture of how a Hadoop cluster processes MapReduce jobs sent from the Hive driver:
The following events occur when the Hive driver sends jobs to the Hadoop cluster:
  1. The Hive driver sends the MapReduce jobs to the Resource Manager in the Hadoop cluster.
  2. The Resource Manager sends the jobs request to the Node Manager that retrieves a list of data nodes that can process the MapReduce jobs.
  3. The Node Manager assigns MapReduce jobs to the data nodes.
  4. The Hive driver also connects to the Hive metadata database through the Hive metastore to determine where to create temporary tables. The Hive driver uses temporary tables to process the data. The Hive driver removes temporary tables after completing the task.

Updated December 13, 2018