Multiple Threads for Each Pipeline Stage

When maximum parallelism is set to a value greater than 1, partitioning is enabled. The Data Integration Service separates a mapping into pipeline stages and uses multiple threads to process each stage.
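The following Python sketch is a loose illustration of this idea, not Informatica code: a mapping is modeled as a sequence of pipeline stages, and each stage processes its partitions with a pool of threads that never exceeds the maximum parallelism value. The stage names, partition lists, and process_partition function are hypothetical placeholders.

    from concurrent.futures import ThreadPoolExecutor

    MAX_PARALLELISM = 3  # assumed value of the maximum parallelism setting

    def process_partition(stage, partition):
        # Placeholder for the real work a thread does on one partition.
        return "%s: partition %d processed" % (stage, partition)

    def run_stage(stage, partitions):
        # One pool of threads per pipeline stage, capped at the maximum
        # parallelism value.
        threads = min(len(partitions), MAX_PARALLELISM)
        with ThreadPoolExecutor(max_workers=threads) as pool:
            return list(pool.map(lambda p: process_partition(stage, p), partitions))

    # Hypothetical mapping split into pipeline stages, each with its partitions.
    pipeline = {
        "reader": [0, 1],             # two database partitions
        "transformation": [0, 1, 2],  # three partitions chosen at run time
        "writer": [0, 1, 2],
    }

    for stage, partitions in pipeline.items():
        for result in run_stage(stage, partitions):
            print(result)

The sketch runs the stages one after another for simplicity; in an actual pipeline, the stages process different row sets concurrently.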
When you maximize parallelism, the Data Integration Service dynamically performs the following tasks at run time:
  • Divides the data into partitions. The Data Integration Service dynamically divides the underlying data into partitions and runs the partitions concurrently. The Data Integration Service determines the optimal number of threads for each pipeline stage. The number of threads used for a single pipeline stage cannot exceed the maximum parallelism value. The Data Integration Service can use a different number of threads for each pipeline stage.
  • Redistributes data across partition points. The Data Integration Service dynamically determines the best way to redistribute data across a partition point based on the transformation requirements (see the sketch after this list).
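The redistribution step above can be pictured as hash partitioning on a key column, so that rows that must be processed together (for example, rows that share a join or group-by key) land in the same partition. The following sketch uses hypothetical column names and row values and is not how the Data Integration Service is implemented.

    from collections import defaultdict
    from zlib import crc32

    def redistribute(rows, key, partitions):
        # Hash each row's key column to choose a target partition, so rows
        # with the same key value always end up in the same partition.
        buckets = defaultdict(list)
        for row in rows:
            bucket = crc32(str(row[key]).encode()) % partitions
            buckets[bucket].append(row)
        return buckets

    rows = [
        {"customer_id": 101, "amount": 20.00},
        {"customer_id": 102, "amount": 35.50},
        {"customer_id": 101, "amount": 12.25},
    ]

    # Redistribute across three partitions before aggregating on customer_id.
    for partition, partition_rows in sorted(redistribute(rows, "customer_id", 3).items()):
        print(partition, partition_rows)

Because both rows with customer_id 101 hash to the same bucket, a single downstream thread sees all rows for that key.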
The following example shows a mapping that distributes data across multiple partitions for each pipeline stage. The mapping distributes the reader pipeline stage and the first transformation pipeline stage across two partitions. At the second transformation pipeline stage, the mapping redistributes the rows across three partitions. The mapping distributes the writer pipeline stage across three partitions.
In this example, maximum parallelism for the Data Integration Service is three and maximum parallelism for the mapping is Auto. The Data Integration Service separates the mapping into four pipeline stages and uses a total of 12 threads to run the mapping; the per-stage thread counts are tallied in the sketch after the following list. The Data Integration Service performs the following tasks at each of the pipeline stages:
  • At the reader pipeline stage, the Data Integration Service queries the Oracle database system to discover that both source tables, source A and source B, have two database partitions. The Data Integration Service uses one reader thread for each database partition.
  • At the first transformation pipeline stage, the Data Integration Service redistributes the data to group rows for the join condition across two threads.
  • At the second transformation pipeline stage, the Data Integration Service determines that three threads are optimal for the Aggregator transformation. The service redistributes the data to group rows for the aggregate expression across three threads.
  • At the writer pipeline stage, the Data Integration Service does not need to redistribute the rows across the target partition point. All rows in a single partition stay in that partition after crossing the target partition point.
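Adding up the per-stage thread counts from this example confirms the total of 12 threads. The short tally below simply repeats the numbers given above.

    # Threads per pipeline stage in the example above.
    threads_per_stage = {
        "reader (source A + source B)": 2 + 2,
        "first transformation (join)": 2,
        "second transformation (Aggregator)": 3,
        "writer": 3,
    }
    print(sum(threads_per_stage.values()))  # prints 12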