Multiple partitions can improve session performance by dividing the data among multiple threads and processing those threads in parallel. Each thread handles a smaller volume of data, which increases overall data throughput.
For example, a simple mapping that contains a source, a transformation, and a target runs with a single partition and processes 100 million rows of data. Because of an input/output bottleneck, session performance is poor. By increasing the number of partitions in the pipeline, you can improve performance because each partition processes a smaller amount of data concurrently with the others.
When a pipeline runs with n partitions instead of one, the session run time can decrease by a factor of approximately n. For example, a session with one partition processes one thread containing 100 million rows of data in 30 minutes. If you change the number of partitions to four, the session processes four threads, each with 25 million rows of data, in seven to eight minutes.
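The arithmetic behind the 100-million-row example can be sketched as follows. This is only an illustration of the ideal case, assuming perfectly linear scaling and evenly distributed rows; the function name `ideal_partition_estimate` is hypothetical and is not part of any product API.

```python
def ideal_partition_estimate(total_rows, baseline_minutes, partitions):
    """Return (rows_per_partition, estimated_minutes) under ideal linear scaling.

    Assumes rows are spread evenly across partitions and that the hardware
    has enough resources to run every partition fully in parallel.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    rows_per_partition = total_rows / partitions
    estimated_minutes = baseline_minutes / partitions
    return rows_per_partition, estimated_minutes


# One partition handles 100 million rows in 30 minutes; estimate four partitions.
rows, minutes = ideal_partition_estimate(100_000_000, 30, 4)
print(rows, minutes)  # 25000000.0 7.5
```

The estimate of 7.5 minutes matches the "seven to eight minutes" range above. Real sessions typically fall short of this ideal when partitions compete for the same CPU, memory, or I/O resources.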
When you add partitions, performance increases as long as the hardware has sufficient resources to process the data.
When you increase the number of partitions in a session, you may still experience decreased performance if each partition reads data from the same source location. To gain optimal performance from a partitioned session, distribute the data across multiple database partitions or disks. If you are using key-range partitioning, the source database must also be partitioned. Otherwise, adding partitions to the pipeline will not improve session performance.
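As a rough illustration of the idea behind key-range partitioning, the sketch below splits a range of integer key values into contiguous, non-overlapping sub-ranges, one per partition. The function `key_ranges` and the half-open range convention are assumptions made for this example only. In practice the ranges should follow the boundaries of the partitions in the source database so that each pipeline partition reads from a different location.

```python
def key_ranges(start, end, partitions):
    """Split the half-open integer key range [start, end) into contiguous sub-ranges.

    Returns a list of (low, high) tuples, one per partition, where low is
    inclusive and high is exclusive. Sizes differ by at most one key.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if end <= start:
        raise ValueError("end must be greater than start")
    base, extra = divmod(end - start, partitions)
    ranges = []
    low = start
    for i in range(partitions):
        # The first `extra` partitions take one additional key each.
        high = low + base + (1 if i < extra else 0)
        ranges.append((low, high))
        low = high
    return ranges


# Keys 1 through 100 divided among four partitions.
print(key_ranges(1, 101, 4))  # [(1, 26), (26, 51), (51, 76), (76, 101)]
```

Evenly sized key ranges balance the load only if the rows are evenly distributed across the key values. Skewed data may call for ranges of unequal width.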