Advanced Workflow Guide

Partitioning Sorted Joiner Transformations

When you include a Joiner transformation that uses sorted input, you must verify that the Joiner transformation receives sorted data. If the sources contain large amounts of data, you might want to configure partitioning to improve performance. However, partitions that redistribute rows can rearrange the order of sorted data, so it is important to configure partitions to maintain sorted data.
For example, when you use a hash auto-keys partition point, the Integration Service uses a hash function to determine the best way to distribute the data among the partitions. However, the Integration Service does not maintain the sort order, so you must follow specific partitioning guidelines to use this type of partition point.
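The effect can be illustrated with a small Python sketch. It is not Integration Service code: the modulo operation stands in for the hash function, which is not specified here, and the function names and sample keys are hypothetical. It assumes that rows from two sorted upstream partitions arrive interleaved at the partition point.

```python
from itertools import chain, zip_longest

def is_sorted(rows):
    """Return True if the keys are in non-decreasing order."""
    return all(a <= b for a, b in zip(rows, rows[1:]))

def interleave(*streams):
    """Simulate rows arriving concurrently from several upstream partitions."""
    missing = object()
    zipped = zip_longest(*streams, fillvalue=missing)
    return [row for row in chain.from_iterable(zipped) if row is not missing]

def hash_redistribute(rows, partition_count):
    """Route each key to a partition; modulo is a stand-in for the real hash."""
    partitions = [[] for _ in range(partition_count)]
    for key in rows:
        partitions[key % partition_count].append(key)
    return partitions

# Each upstream partition is sorted on its own.
upstream_a = [1, 2, 7, 8]
upstream_b = [3, 4, 5, 6]

arrival_order = interleave(upstream_a, upstream_b)
redistributed = hash_redistribute(arrival_order, 2)

for number, partition in enumerate(redistributed):
    print(number, partition, "sorted" if is_sorted(partition) else "not sorted")
```

In this sketch both upstream partitions are sorted, yet neither downstream partition is, because redistribution mixes rows in arrival order.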
When you join data, you can partition data for the master and the detail pipelines by configuring an equal number of partitions for the master and the detail sources. The Integration Service processes multiple partitions concurrently.
You might need to configure the partitions to maintain the sort order based on the type of partition you use at the Joiner transformation. If the Joiner transformation uses 1:n partitioning, and the master and detail pipelines are both joined on sorted ports, the session terminates unexpectedly.
Consider the following partitioning guidelines:
  • Using sorted flat files or sorted relational data.
    When you have one large flat file in the master and detail pipelines, configure partitions to pass all sorted data in the first partition, and pass empty file data in the other partitions.
  • Using the Sorter transformation.
    If you use a hash auto-keys partition at the Joiner transformation, configure each Sorter transformation to use hash auto-keys partition points as well.
Add only pass-through partition points between the sort origin and the Joiner transformation.
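The Sorter guideline can be sketched in Python under some loudly stated assumptions: modulo stands in for the hash function, master keys are unique, and all names and sample rows are hypothetical. Both pipelines are hash-partitioned on the join key with the same partition count, each partition is sorted after redistribution, and the rows then reach the join unchanged, as they would through pass-through partition points.

```python
def hash_partition(rows, partition_count):
    """Distribute rows by join key (first field); modulo stands in for the hash."""
    partitions = [[] for _ in range(partition_count)]
    for row in rows:
        partitions[row[0] % partition_count].append(row)
    return partitions

def merge_join(master, detail):
    """Inner sort-merge join of two lists sorted on the first field.

    Assumes master keys are unique within the partition.
    """
    joined, i = [], 0
    for key, detail_value in detail:
        # Advance the master cursor until it reaches the detail key.
        while i < len(master) and master[i][0] < key:
            i += 1
        if i < len(master) and master[i][0] == key:
            joined.append((key, master[i][1], detail_value))
    return joined

master = [(3, "C"), (1, "A"), (2, "B"), (4, "D")]
detail = [(4, "d1"), (1, "a1"), (3, "c1"), (1, "a2"), (2, "b1")]
PARTITIONS = 2

# Sort after redistribution, so each partition reaches the join in order.
master_parts = [sorted(p) for p in hash_partition(master, PARTITIONS)]
detail_parts = [sorted(p) for p in hash_partition(detail, PARTITIONS)]

# Nothing is redistributed between the sort and the join: partition k of the
# master is joined only with partition k of the detail.
joined = [
    row
    for m_part, d_part in zip(master_parts, detail_parts)
    for row in merge_join(m_part, d_part)
]
print(sorted(joined))
```

Because both sides use the same key and partition count, matching rows always land in the same partition, so the per-partition joins together produce the full join result.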
