One Thread for Each Pipeline Stage

When maximum parallelism is set to 1, partitioning is disabled. The Data Integration Service separates a mapping into pipeline stages and uses one thread to process each stage: one reader thread, one transformation thread, and one writer thread.
Each mapping contains one or more pipelines. A pipeline consists of a Read transformation and all the transformations that receive data from that Read transformation. The Data Integration Service separates a mapping pipeline into pipeline stages and then runs the stages in parallel, so that extraction, transformation, and loading overlap.
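The pipeline definition above can be read as a reachability rule over the mapping's data flow. The following Python sketch is a simplified model, not the Data Integration Service's implementation: it represents a mapping as a hypothetical adjacency dictionary (`MappingGraph`) and collects, for each Read transformation, every transformation reachable downstream of it with a breadth-first search. The transformation names are illustrative only.

```python
from collections import deque

# Hypothetical model: each transformation maps to the transformations that
# receive its output. This is not an Informatica API.
MappingGraph = dict[str, list[str]]


def pipelines(mapping: MappingGraph, reads: list[str]) -> dict[str, set[str]]:
    """Return one pipeline per Read transformation.

    A pipeline is the Read transformation plus every transformation that
    receives data from it, directly or indirectly (breadth-first search).
    """
    result: dict[str, set[str]] = {}
    for read in reads:
        seen = {read}
        pending = deque([read])
        while pending:
            node = pending.popleft()
            for downstream in mapping.get(node, []):
                if downstream not in seen:
                    seen.add(downstream)
                    pending.append(downstream)
        result[read] = seen
    return result


# Illustrative mapping with two pipelines.
mapping = {
    "Read_Orders": ["Filter"],
    "Filter": ["Expression"],
    "Expression": ["Write_Orders"],
    "Read_Customers": ["Write_Customers"],
}
print(pipelines(mapping, ["Read_Orders", "Read_Customers"]))
```

The sketch takes the definition literally, so a transformation that receives data from two Read transformations would appear in both pipelines.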
Partition points mark the boundaries in a pipeline and divide the pipeline into stages. For every mapping pipeline, the Data Integration Service adds a partition point after the Read transformation and before the Write transformation to create multiple pipeline stages.
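To illustrate how partition points divide a pipeline into stages, the following Python sketch assumes a linear pipeline stored as an ordered list of transformation names, and treats a partition point as a hypothetical index `i` meaning "a boundary sits just before `pipeline[i]`". The helper `default_partition_points` models the two points described above: one after the Read transformation and one before the Write transformation. Both function names are illustrative, not part of any Informatica API.

```python
def split_into_stages(pipeline: list[str], partition_points: list[int]) -> list[list[str]]:
    """Split an ordered, linear pipeline into stages at the given boundaries."""
    stages = []
    start = 0
    # Sort and deduplicate boundaries; ignore points at either end of the list.
    for point in sorted(set(partition_points)):
        if 0 < point < len(pipeline):
            stages.append(pipeline[start:point])
            start = point
    stages.append(pipeline[start:])
    return stages


def default_partition_points(pipeline: list[str]) -> list[int]:
    """One point after the Read (first) and one before the Write (last)."""
    return [1, len(pipeline) - 1]


pipeline = ["Read", "Filter", "Expression", "Write"]
print(split_into_stages(pipeline, default_partition_points(pipeline)))
```

For this four-transformation pipeline, the two default points produce a reader stage, a transformation stage that holds the Filter and Expression transformations, and a writer stage. Real pipelines can branch, which this list-based sketch does not model.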
Each pipeline stage runs in one of the following threads:
  • Reader thread that controls how the Data Integration Service extracts data from the source.
  • Transformation thread that controls how the Data Integration Service processes data in the pipeline.
  • Writer thread that controls how the Data Integration Service loads data to the target.
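The three-thread arrangement can be modeled with standard Python threads connected by queues. The sketch below is a conceptual model only, not how the Data Integration Service is implemented: the bounded queues stand in for the buffers at the partition points, the queue size of 2 is an arbitrary assumption, and the transformation stage applies a hypothetical filter (keep even values) followed by a hypothetical expression (multiply by 10).

```python
import queue
import threading

_DONE = object()  # sentinel: tells the next stage that no more row sets follow


def reader(source, out_q):
    # Reader stage: extract row sets from the source one at a time.
    for row_set in source:
        out_q.put(row_set)
    out_q.put(_DONE)


def transformer(in_q, out_q):
    # Transformation stage: hypothetical filter, then hypothetical expression.
    while (row_set := in_q.get()) is not _DONE:
        filtered = [row for row in row_set if row % 2 == 0]
        out_q.put([row * 10 for row in filtered])
    out_q.put(_DONE)


def writer(in_q, target):
    # Writer stage: load each transformed row set into the target.
    while (row_set := in_q.get()) is not _DONE:
        target.extend(row_set)


def run_pipeline(source):
    target = []
    read_to_transform = queue.Queue(maxsize=2)
    transform_to_write = queue.Queue(maxsize=2)
    threads = [
        threading.Thread(target=reader, args=(source, read_to_transform)),
        threading.Thread(target=transformer, args=(read_to_transform, transform_to_write)),
        threading.Thread(target=writer, args=(transform_to_write, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target


print(run_pipeline([[1, 2, 3], [4, 5, 6]]))
```

Because each stage has exactly one thread and the queues are first-in, first-out, row sets reach the target in source order. Python's global interpreter lock limits true parallelism for CPU-bound work, so this model shows the structure of the stages rather than their performance.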
The following figure shows a mapping separated into a reader pipeline stage, a transformation pipeline stage, and a writer pipeline stage:
The source and target are partition points. The reader pipeline stage contains a source, the transformation pipeline stage contains a Filter and an Expression transformation, and the writer pipeline stage contains the target.
Because the pipeline contains three stages, the Data Integration Service can process three sets of rows concurrently, which improves mapping performance. For example, while the reader thread processes the third row set, the transformation thread processes the second row set and the writer thread processes the first row set.
The following table shows how multiple threads can concurrently process three sets of rows:
Reader Thread    Transformation Thread    Writer Thread
Row Set 1        -                        -
Row Set 2        Row Set 1                -
Row Set 3        Row Set 2                Row Set 1
Row Set 4        Row Set 3                Row Set 2
Row Set n        Row Set (n-1)            Row Set (n-2)
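The pattern in the table follows a simple rule if you assume that every stage spends exactly one time step on each row set: at step t, stage k (counting from 0) works on row set t - k, or is idle when no such row set exists. The following Python sketch, whose `schedule` function is illustrative only, generates the table under that assumption, including the final steps in which the reader and transformation threads have finished and the writer thread drains the remaining row sets.

```python
def schedule(num_row_sets: int, stages: list[str]) -> list[list[str]]:
    """Return one row per time step; column k shows what stage k processes.

    Assumes each stage takes exactly one time step per row set.
    """
    rows = []
    for step in range(1, num_row_sets + len(stages)):
        row = []
        for k in range(len(stages)):
            row_set = step - k
            row.append(f"Row Set {row_set}" if 1 <= row_set <= num_row_sets else "-")
        rows.append(row)
    return rows


stages = ["Reader Thread", "Transformation Thread", "Writer Thread"]
for row in [stages] + schedule(4, stages):
    print("".join(cell.ljust(24) for cell in row).rstrip())
```

With four row sets and three stages, the first four rows match the table above, and two more rows show the pipeline emptying.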
If the mapping pipeline contains transformations that perform complicated calculations, processing the transformation pipeline stage can take a long time. To optimize performance, the Data Integration Service adds partition points before some transformations to create an additional transformation pipeline stage.
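Why an additional transformation stage helps can be shown with a standard pipelining model. The following Python sketch assumes that each stage spends a fixed, hypothetical amount of time on every row set and that the buffers between stages never fill. Stage k can start row set j only after stage k-1 has finished row set j and stage k has finished row set j-1, so the finish time is a simple recurrence. The function name and the timings are illustrative and are not measurements of the Data Integration Service.

```python
def pipeline_finish_time(stage_times: list[float], num_row_sets: int) -> float:
    """Time at which the last stage finishes the last row set.

    finish[k][j] = max(finish[k-1][j], finish[k][j-1]) + stage_times[k]
    Assumes unbounded buffers and a fixed cost per row set in each stage.
    """
    prev_stage = [0.0] * (num_row_sets + 1)  # finish times of the previous stage
    for t in stage_times:
        current = [0.0] * (num_row_sets + 1)
        for j in range(1, num_row_sets + 1):
            current[j] = max(prev_stage[j], current[j - 1]) + t
        prev_stage = current
    return prev_stage[num_row_sets]


# Hypothetical timings: a slow transformation stage (4 units) versus the same
# work split across two transformation stages (2 units each).
print(pipeline_finish_time([1, 4, 1], 10))
print(pipeline_finish_time([1, 2, 2, 1], 10))
```

Under these assumptions the finish time equals the sum of the stage times plus (n - 1) times the slowest stage time, so for 10 row sets the three-stage pipeline finishes at 42 time units and the four-stage pipeline at 24. The slowest stage sets the pace, which is why splitting an expensive transformation stage can shorten the overall run.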