Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2

Back Next

Filter Optimization for Hive Sources

To optimize mappings that read from a partitioned Hive source, you typically add filter conditions on a relational data object to remove rows at the source. The filter limits the data that flows through the mapping pipeline which gives a performance benefit. However, when the Blaze engine reads from a Hive source with a filter condition, the engine interprets the filters as SQL overrides, translates into multiple grid tasks, and adds a performance overhead.

The Blaze engine translates the filters into the following grid tasks:

A grid task to create a HiveServer2 job for the filter override to stage the intermediate data.

A grid task to operate against the staged data and to apply the downstream mapping logic.

For example, the following image shows the Blaze engine execution plan with two grid tasks for a passthrough mapping with a filter:

To avoid the performance overhead, set the following custom flag as an advanced property of the mapping:

Hive.SourceFilterAsInfaExpression = true

If multiple mappings require the flag, add the custom flag at the Data Integration Service level.

The following image shows the custom flag that you added as an execution parameter for the mapping:

The following image shows the Blaze engine execution plan with a single grid task after you set the custom flag:

Use a valid Informatica expression as a filter condition. The filter condition must not refer to the table name. For example, instead of

LineItem.l_tax > $dataasofdate

, use

l_tax > $dataasofdate

The following image shows the

Query

view of the relational data object where you define the filter condition:

Rename Saved Search

Table of Contents

Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2

Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2

Filter Optimization for Hive Sources

Filter Optimization for Hive Sources