Table of Contents

Search

  1. Preface
  2. Mappings
  3. Mapplets
  4. Mapping Parameters
  5. Mapping Outputs
  6. Generate a Mapping from an SQL Query
  7. Dynamic Mappings
  8. How to Develop and Run a Dynamic Mapping
  9. Dynamic Mapping Use Cases
  10. Mapping Administration
  11. Export to PowerCenter
  12. Import From PowerCenter
  13. Performance Tuning
  14. Pushdown Optimization
  15. Partitioned Mappings
  16. Developer Tool Naming Conventions

Developer Mapping Guide

Developer Mapping Guide

Cost-Based Optimization Method

Cost-Based Optimization Method

With cost-based optimization, the Data Integration Service evaluates a mapping, generates semantically equivalent mappings, and runs the mapping with the best possible performance. Cost-based optimization reduces run time for mappings that perform adjacent inner-join, and full-outer join operations.
Semantically equivalent mappings are mappings that perform identical functions and produce the same results. To generate semantically equivalent mappings, the Data Integration Service divides the original mapping into fragments. The Data Integration Service then determines which mapping fragments it can optimize.
During optimization, the Data Integration Service might add, remove, or reorder transformations within a fragment. The Data Integration Service verifies that the optimized fragments produce the same results as the original fragments and forms alternate mappings that use the optimized fragments.
The Data Integration Service can also apply a sorted merge join if it determines that the sorted merge join performance is better than the nested loop join performance. A sorted merge join uses sort order to arrange two data sets before performing the join. A nested loop join uses nested loops to join two data sets. The Data Integration Service might use the sorting information in the sources or create a Sorter transformation if the cost of sorting the data is less expensive then processing the nested loop join.
The Data Integration Service generates all or almost all of the mappings that are semantically equivalent to the original mapping. It uses profiling statistics or database statistics to compute the cost for the original mapping and each alternate mapping. Then, it identifies the mapping that runs most quickly. The Data Integration Service performs a validation check on the best alternate mapping to ensure that it is valid and produces the same results as the original mapping.
The Data Integration Service caches the best alternate mapping in memory. When you run a mapping, the Data Integration Service retrieves the alternate mapping and runs it instead of the original mapping.
The Developer tool does not enable this method by default.

0 COMMENTS

We’d like to hear from you!