Table of Contents

Search

  1. Preface
  2. Analyst Service
  3. Catalog Service
  4. Content Management Service
  5. Data Integration Service
  6. Data Integration Service Architecture
  7. Data Integration Service Management
  8. Data Integration Service Grid
  9. Data Integration Service REST API
  10. Data Integration Service Applications
  11. Enterprise Data Preparation Service
  12. Interactive Data Preparation Service
  13. Informatica Cluster Service
  14. Mass Ingestion Service
  15. Metadata Access Service
  16. Metadata Manager Service
  17. Model Repository Service
  18. PowerCenter Integration Service
  19. PowerCenter Integration Service Architecture
  20. High Availability for the PowerCenter Integration Service
  21. PowerCenter Repository Service
  22. PowerCenter Repository Management
  23. PowerExchange Listener Service
  24. PowerExchange Logger Service
  25. SAP BW Service
  26. Search Service
  27. System Services
  28. Test Data Manager Service
  29. Test Data Warehouse Service
  30. Web Services Hub
  31. Application Service Upgrade
  32. Appendix A: Application Service Databases
  33. Appendix B: Connecting to Databases from Windows
  34. Appendix C: Connecting to Databases from UNIX or Linux
  35. Appendix D: Updating the DynamicSections Parameter of a DB2 Database

Multiple Threads for Each Pipeline Stage

Multiple Threads for Each Pipeline Stage

When maximum parallelism is set to a value greater than 1, partitioning is enabled. The Data Integration Service separates a mapping into pipeline stages and uses multiple threads to process each stage.
When you maximize parallelism, the Data Integration Service dynamically performs the following tasks at run time:
Divides the data into partitions.
The Data Integration Service dynamically divides the underlying data into partitions and runs the partitions concurrently. The Data Integration Service determines the optimal number of threads for each pipeline stage. The number of threads used for a single pipeline stage cannot exceed the maximum parallelism value. The Data Integration Service can use a different number of threads for each pipeline stage.
Redistributes data across partition points.
The Data Integration Service dynamically determines the best way to redistribute data across a partition point based on the transformation requirements.
The following image shows an example mapping that distributes data across multiple partitions for each pipeline stage:
The mapping distributes the reader pipeline stage and the first transformation pipeline stage across two partitions. At the second transformation pipeline stage, the mapping redistributes the rows across three partitions. The mapping distributes the writer pipeline stage across three partitions.
In the preceding image, maximum parallelism for the Data Integration Service is three. Maximum parallelism for the mapping is Auto. The Data Integration Service separates the mapping into four pipeline stages and uses a total of 12 threads to run the mapping. The Data Integration Service performs the following tasks at each of the pipeline stages:
  • At the reader pipeline stage, the Data Integration Service queries the Oracle database system to discover that both source tables, source A and source B, have two database partitions. The Data Integration Service uses one reader thread for each database partition.
  • At the first transformation pipeline stage, the Data Integration Service redistributes the data to group rows for the join condition across two threads.
  • At the second transformation pipeline stage, the Data Integration Service determines that three threads are optimal for the Aggregator transformation. The service redistributes the data to group rows for the aggregate expression across three threads.
  • At the writer pipeline stage, the Data Integration Service does not need to redistribute the rows across the target partition point. All rows in a single partition stay in that partition after crossing the target partition point.