Table of Contents

Search

  1. Preface
  2. Performance Tuning Overview
  3. Target Optimization
  4. Source Optimization
  5. Transformation Optimization
  6. Mapping Optimization
  7. Partitioned Mapping Optimization
  8. Run-time Optimization
  9. SQL Data Service Optimization
  10. Web Service Optimization
  11. Connections Optimization
  12. Data Transformation Optimization

Performance Tuning Guide

Performance Tuning Guide

Lookup Transformation Optimization

Lookup Transformation Optimization

Lookup transformations can slow performance depending on the lookup cache type and lookup conditions.
Consider the following solutions for Lookup transformation bottlenecks:
Use the optimal database driver.
The Data Integration Service can connect to a lookup table using a native database driver or an ODBC driver. Native database drivers provide better mapping performance than ODBC drivers.
Cache lookup tables for relational or flat file lookups.
To improve lookup performance for relational or flat file sources, enable lookup caching in the transformation. When you enable caching, the Data Integration Service caches the lookup table. When you run the mapping, the Data Integration Service queries the lookup cache instead of the lookup table. When this option is not enabled, the Data Integration Service queries the lookup table on a row-by-row basis.
The result of the lookup query and processing is the same, whether or not you cache the lookup table. However, using a lookup cache can increase mapping performance for smaller lookup tables. In general, you want to cache lookup tables that need less than 300 MB.
Cache lookup tables for logical data object lookups.
To improve lookup performance on a logical data object, you can enable data object caching on the Data Integration Service. When you enable data object caching, the Data Integration Service caches the logical data object. To enable data object caching, you must deploy the mapping to an application, enable caching of the logical data object, and run the mapping with the command
infacmd ms runmapping
. When you run the mapping, the Data Integration Service queries the data object cache instead of the logical data object.
If you run the mapping from the Developer tool, the Lookup transformation queries the logical data object on a row-by-row basis.
Use the appropriate cache type.
Use the following types of caches to increase performance:
  • Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.
  • Persistent cache. To save and reuse the cache files, you can configure the transformation to use a persistent cache. Use this feature when you know the lookup table does not change between mapping runs. Using a persistent cache can improve performance because the Data Integration Service builds the memory cache from the cache files instead of from the database.
Enable concurrent caches.
When the Data Integration Service processes mappings that contain Lookup transformations, the Data Integration Service builds a cache in memory when it processes the first row of data in a cached Lookup transformation. If there are multiple Lookup transformations in a mapping, the Data Integration Service creates the caches sequentially when the first row of data is processed by the Lookup transformation. This slows Lookup transformation processing.
You can enable concurrent caches to improve performance. When the number of additional concurrent pipelines is set to one or more, the Data Integration Service builds caches concurrently rather than sequentially. Performance improves greatly when the mappings contain a number of active transformations that may take time to complete, such as Aggregator, Joiner, or Sorter transformations. When you enable multiple concurrent pipelines, the Data Integration Service no longer waits for active mappings to complete before it builds the cache. Other Lookup transformations in the pipeline also build caches concurrently.
Optimize lookup condition matches.
When the Lookup transformation matches lookup cache data with the lookup condition, it sorts and orders the data to determine the first matching value and the last matching value. You can configure the transformation to return any value that matches the lookup condition. When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition. It does not index all ports as it does when you configure the transformation to return the first matching value or the last matching value.
When you use any matching value, performance can improve because the transformation does not index on all ports, which can slow performance.
Reduce the number of cached rows.
You can reduce the number of rows included in the cache to increase performance. Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement. When you add a WHERE clause to a Lookup transformation that uses a dynamic cache, use a Filter transformation before the Lookup transformation to pass rows into the dynamic cache that match the WHERE clause.
Override the ORDER BY statement.
By default, the Data Integration Service generates an ORDER BY statement for a cached lookup. The ORDER BY statement contains all lookup ports. To increase performance, suppress the default ORDER BY statement and enter an override ORDER BY with fewer columns.
The Data Integration Service always generates an ORDER BY statement, even if you enter one in the override. Place two dashes ‘--’ after the ORDER BY override to suppress the generated ORDER BY statement.
For example, a Lookup transformation uses the following lookup condition:
ITEM_ID = IN_ITEM_ID PRICE <= IN_PRICE
The Lookup transformation includes three lookup ports used in the mapping, ITEM_ID, ITEM_NAME, and PRICE. When you enter the ORDER BY statement, enter the columns in the same order as the ports in the lookup condition. You must also enclose all database reserved words in quotes.
Enter the following lookup query in the lookup SQL override:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
Use a machine with more memory.
To increase mapping performance, run the mapping on a Data Integration Service node with a large amount of memory. Increase the index and data cache sizes as high as you can without straining the machine. If the Data Integration Service node has enough memory, increase the cache so it can hold all data in memory without paging to disk.
Optimize the lookup condition.
If you include more than one lookup condition, place the conditions in the following order to optimize lookup performance:
  • Equal to (=)
  • Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
  • Not equal to (!=)
Filter lookup rows.
To improve performance, create a filter condition to reduce the number of lookup rows retrieved from the source when the lookup cache is built.
Index the lookup table.
The Data Integration Service needs to query, sort, and compare values in the lookup condition columns. The index needs to include every column used in a lookup condition.
You can improve performance for the following types of lookups:
  • Cached lookups. To improve performance, index the columns in the lookup ORDER BY statement. The mapping log file contains the ORDER BY statement.
  • Uncached lookups. To improve performance, index the columns in the lookup condition. The Data Integration Service issues a SELECT statement for each row that passes into the Lookup transformation.
Optimize multiple lookups.
If a mapping contains multiple lookups, even with caching enabled and enough heap memory, the lookups can slow performance. Tune the Lookup transformations that query the largest amounts of data to improve overall performance.
If the lookup table is on the same database as the source table in your mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation.

Back to Top

0 COMMENTS

We’d like to hear from you!