Multiple Threads for Each Pipeline Stage

When maximum parallelism is set to a value greater than 1, partitioning is enabled. The Data Integration Service separates a mapping into pipeline stages and uses multiple threads to process each stage.
When partitioning is enabled, the Data Integration Service dynamically performs the following tasks at run time:
Divides the data into partitions.
The Data Integration Service dynamically divides the underlying data into partitions and runs the partitions concurrently. The Data Integration Service determines the optimal number of threads for each pipeline stage. The number of threads used for a single pipeline stage cannot exceed the maximum parallelism value. The Data Integration Service can use a different number of threads for each pipeline stage.
Redistributes data across partition points.
The Data Integration Service dynamically determines the best way to redistribute data across a partition point based on the transformation requirements.
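The two run-time tasks above can be illustrated with a minimal Python sketch. It does not show how the Data Integration Service is implemented. Every name in it (`stage_threads`, `divide`, `redistribute_by_key`, `run_mapping`, and the `region` and `amount` fields) is hypothetical, and the "optimal" thread counts are hard-coded assumptions, not values the service would compute. The sketch caps each stage's thread count at maximum parallelism and divides the rows into partitions. Before an aggregation, it uses hash partitioning on the grouping key as one plausible redistribution strategy, so that all rows for a given key land in the same partition.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: none of these names come from Informatica.


def stage_threads(optimal: int, max_parallelism: int) -> int:
    """Threads for one pipeline stage, never more than maximum parallelism."""
    return max(1, min(optimal, max_parallelism))


def divide(rows, n):
    """Divide rows into n partitions (round-robin by position)."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def redistribute_by_key(parts, key, n):
    """Hash-redistribute rows across a partition point so that all rows
    with the same key value land in the same downstream partition."""
    out = [[] for _ in range(n)]
    for part in parts:
        for row in part:
            out[hash(row[key]) % n].append(row)
    return out


def aggregate(part, key, value):
    """Sum `value` per `key` within a single partition."""
    totals = defaultdict(int)
    for row in part:
        totals[row[key]] += row[value]
    return dict(totals)


def run_mapping(rows, max_parallelism):
    # Reader stage: assume 2 threads would be "optimal".
    reader_parts = divide(rows, stage_threads(2, max_parallelism))
    # Aggregator stage: assume 3 threads would be "optimal"; regroup by key first.
    n_agg = stage_threads(3, max_parallelism)
    agg_parts = redistribute_by_key(reader_parts, "region", n_agg)
    # Process the partitions concurrently, one worker thread per partition.
    with ThreadPoolExecutor(max_workers=n_agg) as pool:
        partials = list(pool.map(lambda p: aggregate(p, "region", "amount"), agg_parts))
    # A key never spans two partitions, so merging partial results is safe.
    result = {}
    for partial in partials:
        result.update(partial)
    return result, agg_parts


rows = [{"region": r, "amount": a} for r, a in
        [("east", 10), ("west", 5), ("east", 7), ("north", 2), ("west", 1)]]
totals, parts = run_mapping(rows, max_parallelism=3)
print(sorted(totals.items()))  # [('east', 17), ('north', 2), ('west', 6)]
```

With `max_parallelism=1`, every stage gets one thread, which corresponds to partitioning being disabled. Hash redistribution fits this example because an aggregation needs all rows for a key in one partition. A stage without that requirement could keep its existing partitions unchanged.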
The following image shows an example mapping that distributes data across multiple partitions for each pipeline stage:
The mapping distributes the reader pipeline stage and the first transformation pipeline stage across two partitions. At the second transformation pipeline stage, the mapping redistributes the rows across three partitions. The mapping distributes the writer pipeline stage across three partitions.
In the preceding image, maximum parallelism for the Data Integration Service is three. Maximum parallelism for the mapping is Auto. The Data Integration Service separates the mapping into four pipeline stages and uses a total of 12 threads to run the mapping. The Data Integration Service performs the following tasks at each of the pipeline stages:
  • At the reader pipeline stage, the Data Integration Service queries the Oracle database and discovers that both source tables, source A and source B, have two database partitions each. The Data Integration Service uses one reader thread for each database partition.
  • At the first transformation pipeline stage, the Data Integration Service redistributes the data across two threads so that rows are grouped for the join condition.
  • At the second transformation pipeline stage, the Data Integration Service determines that three threads are optimal for the Aggregator transformation. The service redistributes the data to group rows for the aggregate expression across three threads.
  • At the writer pipeline stage, the Data Integration Service does not need to redistribute the rows across the target partition point. All rows in a single partition stay in that partition after crossing the target partition point.
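The total of 12 threads follows from the per-stage counts in the list above. This small Python tally makes the arithmetic explicit. The reader figure of four is an inference from the first bullet: one reader thread per database partition, with two partitions in each of the two source tables. The dictionary keys are illustrative labels only.

```python
# Threads per pipeline stage in the example mapping. The reader figure is
# inferred: one reader thread per database partition, and each of the two
# Oracle source tables (A and B) has two database partitions.
threads_per_stage = {
    "reader": 2 + 2,                  # source A + source B
    "transformation 1 (join)": 2,
    "transformation 2 (Aggregator)": 3,
    "writer": 3,                      # rows keep the Aggregator's three partitions
}

total_threads = sum(threads_per_stage.values())
print(len(threads_per_stage), "stages,", total_threads, "threads")  # 4 stages, 12 threads
```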