Table of Contents

  1. Preface
  2. Mappings
  3. Mapplets
  4. Mapping Parameters
  5. Mapping Outputs
  6. Generate a Mapping from an SQL Query
  7. Dynamic Mappings
  8. How to Develop and Run a Dynamic Mapping
  9. Dynamic Mapping Use Cases
  10. Mapping Administration
  11. Export to PowerCenter
  12. Import From PowerCenter
  13. Performance Tuning
  14. Pushdown Optimization
  15. Partitioned Mappings
  16. Developer Tool Naming Conventions

Developer Mapping Guide

One Thread for Each Pipeline Stage

When maximum parallelism is set to 1, partitioning is disabled. The Data Integration Service separates a mapping into pipeline stages and uses one thread to process each stage.
Each mapping contains one or more pipelines. A pipeline consists of a Read transformation and all the transformations that receive data from that Read transformation. The Data Integration Service separates a mapping pipeline into pipeline stages and then performs the extract, transformation, and load for each pipeline stage in parallel.
Partition points mark the boundaries in a pipeline and divide the pipeline into stages. For every mapping pipeline, the Data Integration Service adds a partition point after the Read transformation and before the Write transformation to create multiple pipeline stages.
Each pipeline stage runs in one of the following threads:
  • Reader thread that controls how the Data Integration Service extracts data from the source.
  • Transformation thread that controls how the Data Integration Service processes data in the pipeline.
  • Writer thread that controls how the Data Integration Service loads data to the target.
The following figure shows a mapping separated into a reader pipeline stage, a transformation pipeline stage, and a writer pipeline stage. The source and target are partition points: the reader pipeline stage contains the source, the transformation pipeline stage contains a Filter transformation and an Expression transformation, and the writer pipeline stage contains the target.
Because the pipeline contains three stages, the Data Integration Service can process three sets of rows concurrently and optimize mapping performance. For example, while the reader thread processes the third row set, the transformation thread processes the second row set, and the writer thread processes the first row set.
The following table shows how multiple threads can concurrently process three sets of rows:
Reader Thread      Transformation Thread      Writer Thread
Row Set 1          -                          -
Row Set 2          Row Set 1                  -
Row Set 3          Row Set 2                  Row Set 1
Row Set 4          Row Set 3                  Row Set 2
Row Set n          Row Set (n-1)              Row Set (n-2)
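The staggered processing shown in the table is essentially a producer-consumer pipeline: each stage runs in its own thread and hands row sets downstream through a buffer. The following Python sketch only illustrates that general pattern, not the Data Integration Service implementation; the row sets, queue sizes, and the doubling "transformation" are hypothetical.

    import queue
    import threading

    SENTINEL = object()  # signals that a stage has no more row sets to send

    def reader(out_q, row_sets):
        # Reader thread: extracts row sets from the source.
        for row_set in row_sets:
            out_q.put(row_set)
        out_q.put(SENTINEL)

    def transformer(in_q, out_q):
        # Transformation thread: processes each row set as it arrives.
        while True:
            row_set = in_q.get()
            if row_set is SENTINEL:
                out_q.put(SENTINEL)
                break
            out_q.put([value * 2 for value in row_set])  # placeholder transformation

    def writer(in_q, target):
        # Writer thread: loads transformed row sets to the target.
        while True:
            row_set = in_q.get()
            if row_set is SENTINEL:
                break
            target.append(row_set)

    read_to_transform = queue.Queue(maxsize=2)   # buffer between reader and transformation stages
    transform_to_write = queue.Queue(maxsize=2)  # buffer between transformation and writer stages

    source_row_sets = [[1, 2], [3, 4], [5, 6]]   # hypothetical row sets
    target = []

    threads = [
        threading.Thread(target=reader, args=(read_to_transform, source_row_sets)),
        threading.Thread(target=transformer, args=(read_to_transform, transform_to_write)),
        threading.Thread(target=writer, args=(transform_to_write, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(target)  # [[2, 4], [6, 8], [10, 12]]

While the writer thread loads one row set, the transformation thread can work on the next and the reader thread can extract the one after that, which is the overlap the table describes.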
If the mapping pipeline contains transformations that perform complicated calculations, processing the transformation pipeline stage can take a long time. To optimize performance, the Data Integration Service adds partition points before some of those transformations to create additional transformation pipeline stages.
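In terms of the sketch above, an additional partition point amounts to inserting one more queue and thread inside the transformation stage, for example running the Filter logic and the more expensive Expression logic in separate threads so that the slow calculations no longer hold up the rest of the pipeline.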
