One Thread for Each Pipeline Stage

When maximum parallelism is set to 1, partitioning is disabled. The Data Integration Service separates a mapping into pipeline stages and processes each stage with a single thread: a reader thread, a transformation thread, or a writer thread.
Each mapping contains one or more pipelines. A pipeline consists of a Read transformation and all the transformations that receive data from that Read transformation. The Data Integration Service separates a mapping pipeline into pipeline stages and then runs the extract, transform, and load work of the stages in parallel.
Partition points mark the boundaries in a pipeline and divide the pipeline into stages. For every mapping pipeline, the Data Integration Service adds a partition point after the Read transformation and before the Write transformation to create multiple pipeline stages.
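To make the idea of partition points concrete, the following minimal sketch splits an ordered list of transformation names into stages at given boundaries. The functions `split_into_stages` and `default_partition_points` are hypothetical illustrations, not part of any Informatica API. The sketch assumes a linear pipeline whose first element is the Read transformation and whose last element is the Write transformation, and it represents a partition point as an index i that marks a boundary between element i - 1 and element i.

```python
from typing import List, Sequence


def split_into_stages(transformations: Sequence[str],
                      partition_points: Sequence[int]) -> List[List[str]]:
    """Split an ordered (hypothetical) pipeline into stages.

    Each partition point is an index i meaning "a boundary sits between
    transformations[i - 1] and transformations[i]".
    """
    stages: List[List[str]] = []
    start = 0
    for boundary in sorted(set(partition_points)):
        if not 0 < boundary < len(transformations):
            raise ValueError(f"partition point {boundary} is out of range")
        stages.append(list(transformations[start:boundary]))
        start = boundary
    stages.append(list(transformations[start:]))
    return stages


def default_partition_points(transformations: Sequence[str]) -> List[int]:
    """Boundaries after the Read transformation and before the Write transformation.

    Assumes the first element is Read and the last element is Write.
    """
    return [1, len(transformations) - 1]


pipeline = ["Read", "Filter", "Expression", "Write"]
stages = split_into_stages(pipeline, default_partition_points(pipeline))
print(stages)  # [['Read'], ['Filter', 'Expression'], ['Write']]
```

The two default boundaries always yield a reader stage, a transformation stage, and a writer stage. Adding another index to `partition_points` yields one more transformation stage.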
Each pipeline stage runs in one of the following threads:
  • Reader thread that controls how the Data Integration Service extracts data from the source.
  • Transformation thread that controls how the Data Integration Service processes data in the pipeline.
  • Writer thread that controls how the Data Integration Service loads data to the target.
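The three threads can be modeled as a producer-consumer chain. The following minimal sketch uses Python's `threading` and `queue` modules and passes row sets between stages through bounded queues. It only illustrates the pipelining idea. The function `run_pipeline`, the list-of-integers row sets, and the buffer size are hypothetical, and the sketch does not describe how the Data Integration Service is implemented internally.

```python
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel that tells the next stage no more row sets are coming


def run_pipeline(source: Iterable[List[int]],
                 transform: Callable[[List[int]], List[int]],
                 target: List[List[int]],
                 buffer_size: int = 2) -> None:
    """Run hypothetical reader, transformation, and writer stages on separate threads."""
    to_transform: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)
    to_write: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)

    def reader() -> None:
        # Extract row sets from the source and hand them to the transformation stage.
        for row_set in source:
            to_transform.put(row_set)
        to_transform.put(_END)

    def transformer() -> None:
        # Process each row set as soon as the reader hands it over.
        while (item := to_transform.get()) is not _END:
            to_write.put(transform(item))
        to_write.put(_END)

    def writer() -> None:
        # Load transformed row sets into the target in arrival order.
        while (item := to_write.get()) is not _END:
            target.append(item)

    threads = [threading.Thread(target=f) for f in (reader, transformer, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    source = [[1, 2], [3, 4], [5, 6]]
    target: List[List[int]] = []
    # A Filter (keep values > 1) followed by an Expression (double each value).
    run_pipeline(source, lambda rows: [2 * r for r in rows if r > 1], target)
    print(target)  # [[4], [6, 8], [10, 12]]
```

Because each queue holds only a few row sets, the reader can run ahead of the transformation thread only by a bounded amount. This mirrors the idea that the stages work on different row sets at the same time.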
The following figure shows a mapping separated into a reader pipeline stage, a transformation pipeline stage, and a writer pipeline stage:
Partition points sit after the source and before the target. The reader pipeline stage contains a source, the transformation pipeline stage contains a Filter and an Expression transformation, and the writer pipeline stage contains the target.
Because the pipeline contains three stages, the Data Integration Service can process three sets of rows concurrently, which improves mapping performance. For example, while the reader thread processes the third row set, the transformation thread processes the second row set, and the writer thread processes the first row set.
The following table shows how multiple threads can concurrently process three sets of rows:
Reader Thread | Transformation Thread | Writer Thread
Row Set 1     | -                     | -
Row Set 2     | Row Set 1             | -
Row Set 3     | Row Set 2             | Row Set 1
Row Set 4     | Row Set 3             | Row Set 2
Row Set n     | Row Set (n-1)         | Row Set (n-2)
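The pattern in the table can be generated directly: at time step t, stage k (counting the reader as stage 0) works on row set t - k. The following sketch is a hypothetical helper, `pipeline_schedule`, that assumes every stage takes exactly one time step per row set. It prints the schedule, including the steps at the end where the pipeline drains.

```python
from typing import List, Optional


def pipeline_schedule(num_row_sets: int, num_stages: int = 3) -> List[List[Optional[int]]]:
    """Return, for each time step, the row set that each stage processes.

    Stage k (0 = reader) works on row set t - k at step t, or is idle (None)
    when that row set does not exist yet or has already passed through.
    Assumes every stage takes exactly one step per row set.
    """
    steps = num_row_sets + num_stages - 1
    schedule = []
    for t in range(1, steps + 1):
        row = []
        for k in range(num_stages):
            row_set = t - k
            row.append(row_set if 1 <= row_set <= num_row_sets else None)
        schedule.append(row)
    return schedule


for step in pipeline_schedule(4):
    print(" | ".join("-" if rs is None else f"Row Set {rs}" for rs in step))
```

For four row sets, the first four printed lines match the rows of the table, and the last two lines show the transformation and writer threads finishing the remaining row sets. Under the one-step-per-stage assumption, the three stages finish four row sets in 6 steps, compared with 12 steps if a single thread ran all three stages for each row set in turn.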
If the mapping pipeline contains transformations that perform complicated calculations, processing the transformation pipeline stage can take a long time. To optimize performance, the Data Integration Service adds partition points before some transformations to create an additional transformation pipeline stage.
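A small timing model shows why an extra transformation stage can help. In a pipeline, the slowest stage limits how often a finished row set leaves the writer, so splitting a slow stage in two shortens the total run. The following sketch is a simplified model, not a description of how the Data Integration Service chooses partition points. The function `completion_time` is hypothetical, the stage durations are invented units rather than measurements, and the model assumes a fixed cost per row set, unbounded buffers between stages, and no threading overhead.

```python
from typing import List, Sequence


def completion_time(stage_times: Sequence[float], num_row_sets: int) -> float:
    """Time at which the last row set leaves the last stage.

    Uses the recurrence done[i][k] = max(done[i-1][k], done[i][k-1]) + stage_times[k]:
    a stage can start a row set only after it has finished the previous row set
    and the upstream stage has finished this one (unbounded buffers assumed).
    """
    prev: List[float] = [0.0] * len(stage_times)  # finish times for the previous row set
    for _ in range(num_row_sets):
        cur: List[float] = []
        for k, duration in enumerate(stage_times):
            upstream_done = cur[k - 1] if k > 0 else 0.0
            cur.append(max(prev[k], upstream_done) + duration)
        prev = cur
    return prev[-1]


# Hypothetical durations: reader 1, transformations 4 in total, writer 1.
print(completion_time([6], 10))           # 60.0  one thread runs every step
print(completion_time([1, 4, 1], 10))     # 42.0  reader, transformation, writer stages
print(completion_time([1, 2, 2, 1], 10))  # 24.0  slow stage split into two stages
```

With these assumed numbers, the total time follows sum(stage_times) + (num_row_sets - 1) * max(stage_times), so halving the longest stage roughly halves the run time once many row sets flow through the pipeline.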