Table of Contents

Search

  1. Preface
  2. Analyst Service
  3. Catalog Service
  4. Content Management Service
  5. Data Integration Service
  6. Data Integration Service Architecture
  7. Data Integration Service Management
  8. Data Integration Service Grid
  9. Data Integration Service REST API
  10. Data Integration Service Applications
  11. Data Privacy Management Service
  12. Enterprise Data Preparation Service
  13. Interactive Data Preparation Service
  14. Informatica Cluster Service
  15. Mass Ingestion Service
  16. Metadata Access Service
  17. Metadata Manager Service
  18. Model Repository Service
  19. PowerCenter Integration Service
  20. PowerCenter Integration Service Architecture
  21. High Availability for the PowerCenter Integration Service
  22. PowerCenter Repository Service
  23. PowerCenter Repository Management
  24. PowerExchange Listener Service
  25. PowerExchange Logger Service
  26. SAP BW Service
  27. Search Service
  28. System Services
  29. Test Data Manager Service
  30. Test Data Warehouse Service
  31. Web Services Hub
  32. Application Service Upgrade
  33. Appendix A: Application Service Databases
  34. Appendix B: Connecting to Databases from Windows
  35. Appendix C: Connecting to Databases from UNIX or Linux
  36. Appendix D: Updating the DynamicSections Parameter of a DB2 Database

Multiple Threads for Each Pipeline Stage

Multiple Threads for Each Pipeline Stage

When maximum parallelism is set to a value greater than 1, partitioning is enabled. The Data Integration Service separates a mapping into pipeline stages and uses multiple threads to process each stage.
When you maximize parallelism, the Data Integration Service dynamically performs the following tasks at run time:
Divides the data into partitions.
The Data Integration Service dynamically divides the underlying data into partitions and runs the partitions concurrently. The Data Integration Service determines the optimal number of threads for each pipeline stage. The number of threads used for a single pipeline stage cannot exceed the maximum parallelism value. The Data Integration Service can use a different number of threads for each pipeline stage.
Redistributes data across partition points.
The Data Integration Service dynamically determines the best way to redistribute data across a partition point based on the transformation requirements.
The following image shows an example mapping that distributes data across multiple partitions for each pipeline stage:
The mapping distributes the reader pipeline stage and the first transformation pipeline stage across two partitions. At the second transformation pipeline stage, the mapping redistributes the rows across three partitions. The mapping distributes the writer pipeline stage across three partitions.
In the preceding image, maximum parallelism for the Data Integration Service is three. Maximum parallelism for the mapping is Auto. The Data Integration Service separates the mapping into four pipeline stages and uses a total of 12 threads to run the mapping. The Data Integration Service performs the following tasks at each of the pipeline stages:
  • At the reader pipeline stage, the Data Integration Service queries the Oracle database system to discover that both source tables, source A and source B, have two database partitions. The Data Integration Service uses one reader thread for each database partition.
  • At the first transformation pipeline stage, the Data Integration Service redistributes the data to group rows for the join condition across two threads.
  • At the second transformation pipeline stage, the Data Integration Service determines that three threads are optimal for the Aggregator transformation. The service redistributes the data to group rows for the aggregate expression across three threads.
  • At the writer pipeline stage, the Data Integration Service does not need to redistribute the rows across the target partition point. All rows in a single partition stay in that partition after crossing the target partition point.