Table of Contents

Search

  1. Preface
  2. Mappings
  3. Mapplets
  4. Mapping Parameters
  5. Where to Assign Parameters
  6. Mapping Outputs
  7. Generate a Mapping from an SQL Query
  8. Dynamic Mappings
  9. How to Develop and Run a Dynamic Mapping
  10. Dynamic Mapping Use Cases
  11. Mapping Administration
  12. Export to PowerCenter
  13. Import From PowerCenter
  14. Performance Tuning
  15. Pushdown Optimization
  16. Partitioned Mappings
  17. Developer Tool Naming Conventions

Developer Mapping Guide

Developer Mapping Guide

Multiple Threads for Each Pipeline Stage

Multiple Threads for Each Pipeline Stage

When maximum parallelism is set to a value greater than 1, partitioning is enabled. The Data Integration Service separates a mapping into pipeline stages and uses multiple threads to process each stage. The number of threads in any pipeline stage equals the number of partitions in the stage.
When you maximize parallelism, the Data Integration Service dynamically performs the following tasks at run time:
Divides the data into partitions.
The Data Integration Service dynamically divides the underlying data into partitions and runs the partitions concurrently. The Data Integration Service determines the optimal number of threads for each pipeline stage. The number of threads used for a single pipeline stage cannot exceed the maximum parallelism value. The Data Integration Service can use a different number of threads for each pipeline stage.
Redistributes data across partition points.
The Data Integration Service dynamically determines the best way to redistribute data across a partition point based on the transformation requirements.
The following image shows an example mapping that distributes data across multiple partitions for each pipeline stage:
The mapping distributes the reader pipeline stage and the first transformation pipeline stage across two partitions. At the second transformation pipeline stage, the mapping redistributes the rows across three partitions. The mapping distributes the writer pipeline stage across three partitions.
In the preceding image, maximum parallelism for the Data Integration Service is three. Maximum parallelism for the mapping is Auto. The Data Integration Service separates the mapping into four pipeline stages and uses a total of 12 threads to run the mapping. The Data Integration Service performs the following tasks at each of the pipeline stages:
  • At the reader pipeline stage, the Data Integration Service queries the Oracle database system to discover that both source tables, source A and source B, have two database partitions. The Data Integration Service uses one reader thread for each database partition.
  • At the first transformation pipeline stage, the Data Integration Service redistributes the data to group rows for the join condition across two threads.
  • At the second transformation pipeline stage, the Data Integration Service determines that three threads are optimal for the Aggregator transformation. The service redistributes the data to group rows for the aggregate expression across three threads.
  • At the writer pipeline stage, the Data Integration Service does not need to redistribute the rows across the target partition point. All rows in a single partition stay in that partition after crossing the target partition point.

0 COMMENTS

We’d like to hear from you!