Table of Contents


  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Connections
  4. Mappings in a Hadoop Environment
  5. Mapping Objects in a Hadoop Environment
  6. Mappings in the Native Environment
  7. Profiles
  8. Native Environment Optimization
  9. Data Type Reference
  10. Function Reference
  11. Parameter Reference

Hadoop Environment

You can run profiles and scorecards in the Hadoop environment on the Hive engine or Blaze engine.
The Hive engine is a batch processing engine that uses Hadoop technologies such as MapReduce or Tez. The Blaze engine is an Informatica proprietary engine for distributed processing on Hadoop.
When you run a profile in the Hadoop environment, the Analyst tool or the Developer tool submits the profile job to the Profiling Service Module, which breaks the job down into a set of mappings. The Data Integration Service pushes the mappings to the Hadoop environment through a Hive connection or a Hadoop connection. The Hive engine or the Blaze engine processes the mappings, and the Data Integration Service writes the profile results to the profiling warehouse.
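The flow above can be sketched as a small simulation. This is purely illustrative: the function and names below are hypothetical stand-ins for the components described in the text, not part of any Informatica API.

```python
# Hypothetical sketch of the profile-run flow: tool -> Profiling Service
# Module -> mappings -> engine -> profiling warehouse. Names are invented
# for illustration only.

def run_profile(profile_job: str, engine: str = "Blaze") -> dict:
    """Simulate running a profile job in the Hadoop environment."""
    # The Profiling Service Module breaks the profile job into a set of
    # mappings (three here, chosen arbitrarily for the sketch).
    mappings = [f"{profile_job}_mapping_{i}" for i in range(3)]

    # The Data Integration Service pushes each mapping to the chosen
    # engine (Hive or Blaze), which processes it.
    results = {m: f"processed by {engine} engine" for m in mappings}

    # The Data Integration Service writes the results to the profiling
    # warehouse (modeled here as the returned dictionary).
    return results

warehouse = run_profile("column_profile", engine="Hive")
```

The sketch only captures the sequence of responsibilities; in the product, each step is handled by a separate service rather than a single function call.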
In the Developer tool, you can run single object profiles and multiple object profiles on the Hive engine and Blaze engine. You can run enterprise discovery profiles on the Blaze engine.
In the Analyst tool, you can run column profiles on the Hive engine and Blaze engine. You can run enterprise discovery profiles and scorecards on the Blaze engine.
You can use native or Hadoop data sources to create and run profiles in the Hadoop environment. A Hadoop data source is a Hive, HDFS, or Sqoop source. A Sqoop data source can be Aurora, Greenplum, IBM DB2, IBM DB2 for z/OS, Microsoft SQL Server, Netezza, Oracle, or Teradata. Choose to run a native data source in the Hadoop run-time environment when the data source contains a large volume of data or when you want to process the data faster. You can also run a profile on a mapping specification or a logical data source with a Hive or HDFS data source in the Hadoop environment.

Updated July 03, 2018