Advanced Workflow Guide

Pass-Through Partition Type

In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions. Therefore, all rows in a single partition stay in that partition after crossing a pass-through partition point.
When you add a partition point to a pipeline, the master thread creates an additional pipeline stage. Use pass-through partitioning when you want to increase data throughput, but you do not want to increase the number of partitions.
You can specify pass-through partitioning at any valid partition point in a pipeline.
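The routing rule is simple to state in code. The following Python sketch is an illustration only, not Integration Service internals; the function names and row structure are assumptions made for the example. It contrasts pass-through routing, which leaves each row in its current partition, with hash partitioning, which may move rows between partitions:

    # Illustrative sketch only; not Integration Service code.
    def pass_through(row, current_partition, num_partitions):
        # Pass-through: the row stays in whatever partition it already occupies.
        return current_partition

    def hash_partition(row, current_partition, num_partitions):
        # For contrast: a hash partition point may assign the row to a
        # different partition based on a key, redistributing data.
        return hash(row["key"]) % num_partitions

    rows = [{"key": k, "partition": k % 2} for k in range(6)]
    for row in rows:
        # Every row keeps its partition across a pass-through partition point.
        assert pass_through(row, row["partition"], 2) == row["partition"]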
The following figure shows a mapping where pass-through partitioning can increase data throughput:
The mapping contains a source, a series of transformations, and a target. The pipeline runs as three threads, one for each stage:
  1. Reader Thread (First Stage).
  2. Transformation Thread (Second Stage).
  3. Writer Thread (Third Stage).
By default, this mapping contains partition points at the source qualifier and target instance. Since this mapping contains an XML target, you can configure only one partition at any partition point.
In this case, the master thread creates one reader thread to read data from the source, one transformation thread to process the data, and one writer thread to write data to the target. Each pipeline stage processes the rows as follows:
Source Qualifier    Transformations    Target Instance
(First Stage)       (Second Stage)     (Third Stage)
Row Set 1           -                  -
Row Set 2           Row Set 1          -
Row Set 3           Row Set 2          Row Set 1
Row Set 4           Row Set 3          Row Set 2
...                 ...                ...
Row Set n           Row Set (n-1)      Row Set (n-2)
Because the pipeline contains three stages, the Integration Service can process three sets of rows concurrently.
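The staged processing shown in the table can be pictured with ordinary threads and bounded queues. The following Python example is only an analogy for the behavior described above; the queue sizes, row-set labels, and stage functions are assumptions made for the sketch, not the Integration Service's actual implementation. Each stage holds one row set at a time, so three row sets are in flight at once:

    # Analogy only: a reader, a transformation, and a writer thread connected
    # by bounded queues, so each stage works on a different row set concurrently.
    import queue
    import threading

    SENTINEL = None  # marks the end of the data

    def reader(out_q, num_row_sets):
        for i in range(1, num_row_sets + 1):
            out_q.put(f"row set {i}")
        out_q.put(SENTINEL)

    def transformer(in_q, out_q):
        while True:
            row_set = in_q.get()
            if row_set is SENTINEL:
                out_q.put(SENTINEL)
                break
            out_q.put(row_set + " (transformed)")

    def writer(in_q):
        while True:
            row_set = in_q.get()
            if row_set is SENTINEL:
                break
            print("wrote", row_set)

    q1 = queue.Queue(maxsize=1)   # reader -> transformer
    q2 = queue.Queue(maxsize=1)   # transformer -> writer
    threads = [
        threading.Thread(target=reader, args=(q1, 4)),
        threading.Thread(target=transformer, args=(q1, q2)),
        threading.Thread(target=writer, args=(q2,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()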
If the Expression transformations are very complicated, processing the second (transformation) stage can take a long time and cause low data throughput. To improve performance, set a partition point at Expression transformation EXP_2 and set the partition type to pass-through. This creates an additional pipeline stage. The master thread creates an additional transformation thread:
The mapping now includes four stages: a reader thread, two transformation threads, and a writer thread.
  1. Reader Thread (First Stage).
  2. Transformation Thread (Second Stage).
  3. Transformation Thread (Third Stage).
  4. Writer Thread (Fourth Stage).
The Integration Service can now process four sets of rows concurrently as follows:
Source Qualifier    FIL_1 & EXP_1         EXP_2 & LKP_1         Target Instance
(First Stage)       Transformations       Transformations       (Fourth Stage)
                    (Second Stage)        (Third Stage)
Row Set 1           -                     -                     -
Row Set 2           Row Set 1             -                     -
Row Set 3           Row Set 2             Row Set 1             -
Row Set 4           Row Set 3             Row Set 2             Row Set 1
...                 ...                   ...                   ...
Row Set n           Row Set (n-1)         Row Set (n-2)         Row Set (n-3)
By adding a partition point at Expression transformation EXP_2, you replace one long-running transformation stage with two shorter-running stages. Because data throughput depends on the longest-running stage, data throughput increases in this case.
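A quick back-of-the-envelope calculation shows why. Suppose, purely as an assumption for illustration, that a row set takes 2 seconds to read, 8 seconds to pass through the transformations, and 2 seconds to write. Splitting the transformation stage at EXP_2 into two 4-second stages halves the longest stage time, and once the pipeline is full that is what limits throughput:

    # Illustrative timings only (assumed, not measured), in seconds per row set.
    before = {"reader": 2, "transformations": 8, "writer": 2}
    after = {"reader": 2, "FIL_1+EXP_1": 4, "EXP_2+LKP_1": 4, "writer": 2}

    # Once the pipeline is full, one row set completes every max(stage time) seconds.
    print("throughput before:", 1 / max(before.values()), "row sets per second")  # 0.125
    print("throughput after: ", 1 / max(after.values()), "row sets per second")   # 0.25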
