Advanced Workflow Guide

Setting Partition Types in the Pipeline
You can create different partition types at different points in the pipeline.
The following figure shows a mapping where you can create partition types to increase session performance:
The mapping contains the following elements: the Items flat file source, the SQ_Items source qualifier, the FIL_ActiveItems Filter transformation, the SRT_ItemsDescSort Sorter transformation, the AGG_AvgCostAndPrice Aggregator transformation, and the T_ITEM_PRICES Oracle target.
This mapping reads data about items and calculates average wholesale costs and prices. The mapping must read item information from three flat files of various sizes, and then filter out discontinued items. It sorts the active items by description, calculates the average prices and wholesale costs, and writes the results to a relational database in which the target tables are partitioned by key range.
You can delete the default partition point at the Aggregator transformation because hash auto-keys partitioning at the Sorter transformation sends all rows that contain items with the same description to the same partition. The Aggregator transformation therefore receives all of the data for each item description in a single partition and can calculate the average costs and prices for those items correctly.
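PowerCenter applies hash auto-keys partitioning internally when you configure the session; no code is written. Still, the guarantee the paragraph above relies on can be sketched in Python (the helper name and data are illustrative, not an Informatica API): hashing the group key decides the partition, so every row with the same description lands in the same partition and can be aggregated there without seeing other partitions.

```python
# Illustrative sketch of hash-key partitioning (not PowerCenter code).
# Rows are routed by a hash of the key column, so all rows sharing a
# key value end up in the same partition.
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition based on a hash of its key column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions

items = [
    {"description": "widget", "price": 10.0},
    {"description": "gadget", "price": 20.0},
    {"description": "widget", "price": 14.0},
]

parts = hash_partition(items, "description", 3)
# Both "widget" rows hash to the same partition, so a per-partition
# average over "widget" sees all of its rows.
```

This is why the default partition point at the Aggregator can be removed: once the Sorter's partitions already group by description, each downstream partition holds complete groups.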
When you use this mapping in a session, you can increase session performance by defining different partition types at the following partition points in the pipeline:
  • Source qualifier.
    To read data from the three flat files concurrently, you must specify three partitions at the source qualifier. Accept the default partition type, pass-through.
  • Filter transformation.
    Since the source files vary in size, each partition processes a different amount of data. Set a partition point at the Filter transformation, and choose round-robin partitioning to balance the load going into the Filter transformation.
  • Sorter transformation.
    To eliminate overlapping groups in the Sorter and Aggregator transformations, use hash auto-keys partitioning at the Sorter transformation. This causes the Integration Service to group all items with the same description into the same partition before the Sorter and Aggregator transformations process the rows. You can delete the default partition point at the Aggregator transformation.
  • Target.
    Since the target tables are partitioned by key range, specify key range partitioning at the target to optimize writing data to the target.
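The other two partition types chosen above, round-robin at the Filter and key range at the target, can be sketched the same way (hypothetical helper names, not an Informatica API): round-robin deals rows out one at a time so partition sizes stay balanced even when the source files differ in size, while key range routes each row to the partition whose range covers its key, mirroring how the target tables are partitioned.

```python
# Illustrative sketches of round-robin and key-range partitioning
# (not PowerCenter code).
from itertools import cycle

def round_robin_partition(rows, num_partitions):
    """Deal rows across partitions in turn to balance the load."""
    partitions = [[] for _ in range(num_partitions)]
    targets = cycle(range(num_partitions))
    for row in rows:
        partitions[next(targets)].append(row)
    return partitions

def key_range_partition(rows, key, ranges):
    """Route each row to the partition whose (low, high) range covers
    its key; low is inclusive, high is exclusive."""
    partitions = [[] for _ in ranges]
    for row in rows:
        for i, (lo, hi) in enumerate(ranges):
            if lo <= row[key] < hi:
                partitions[i].append(row)
                break
    return partitions
```

Round-robin ignores the row contents entirely, which is why it evens out skewed sources; key range depends on the key values, which is why its ranges should match the target tables' partitioning scheme.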
