Profiles Overview

You can run a profile on HDFS and Hive data sources in the Hadoop environment when you use the Hive engine. Running a profile in the Hadoop environment can improve performance. The run-time environment, whether the native Data Integration Service or Hadoop, does not affect the profile results.
You can run a column profile, rule profile, and data domain discovery on a single data object profile in the Hadoop environment. You can use these profiling capabilities on both native and Hadoop data sources. A native data source is a non-Hadoop source, such as a flat file, relational source, or mainframe source. A Hadoop data source can be either a Hive or HDFS source.
If you use Informatica Developer or Informatica Analyst, you can choose either the native or the Hadoop run-time environment to run a profile. If you choose the Hadoop environment, Informatica Developer or Informatica Analyst sets the run-time environment in the profile definition.
When you run a profile in the Hadoop environment from the Developer tool, you validate the data source before you run the profile. To validate the data source, you must select a Hive connection. You can then choose to run the profile in either the native or the Hadoop run-time environment.
You can view the Hive query plan in the Administrator tool. The Hive query plan consists of one or more scripts that the Data Integration Service generates based on the logic defined in the profile. Each script contains Hive queries that run against the Hive database. One query contains details about the MapReduce job. The remaining queries perform other actions such as creating and dropping tables in the Hive database.
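The structure of a query plan script can be sketched roughly as follows. This is a minimal illustration in Python, not the Data Integration Service's own logic: the script text, the table and column names, and the helper functions `split_statements`, `classify`, and `summarize` are all hypothetical. It only shows the idea of separating the statements that create and drop tables from the statement that does the profiling work.

```python
import re

# Hypothetical script text. These statements are illustrative only and are
# NOT actual output of the Data Integration Service.
SCRIPT = """
CREATE TABLE tmp_profile_stage (col_value STRING, value_count BIGINT);
INSERT OVERWRITE TABLE tmp_profile_stage
  SELECT customer_name, COUNT(*) FROM customers GROUP BY customer_name;
DROP TABLE tmp_profile_stage;
"""


def split_statements(script):
    """Split a script on semicolons and collapse whitespace.

    Naive on purpose: it does not handle semicolons inside string literals.
    """
    parts = (" ".join(part.split()) for part in script.split(";"))
    return [part for part in parts if part]


def classify(statement):
    """Label a statement as table housekeeping or as a data query."""
    if re.match(r"(?i)^(create|drop)\s+table\b", statement):
        return "housekeeping"
    return "query"


def summarize(script):
    """Group the statements of one script by their label."""
    summary = {"housekeeping": [], "query": []}
    for statement in split_statements(script):
        summary[classify(statement)].append(statement)
    return summary


if __name__ == "__main__":
    for label, statements in summarize(SCRIPT).items():
        print(label, len(statements))
```

A grouping like this can make a long query plan easier to scan, because the single data query stands apart from the surrounding table setup and cleanup.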
You can use the monitoring tab of the Administrator tool to monitor a profile and the Hive statements running on Hadoop. You can expand a profile job to view the Hive queries generated for the profile. You can also view the run-time log for each profile. The log shows run-time details, such as the time each task runs, the Hive queries that run on Hadoop, and any errors that occur.
The tab contains the following views:
Properties view
The Properties view shows properties about the selected profile.
Hive Query Plan view
The Hive Query Plan view shows the Hive query plan for the selected profile.

Updated July 03, 2018