If you have the partitioning option, administrators can enable the Data Integration Service to maximize parallelism when it runs mappings. When administrators maximize parallelism, the Data Integration Service dynamically divides the underlying data into partitions and processes all of the partitions concurrently.
If mappings process large data sets or contain transformations that perform complicated calculations, the mappings can take a long time to process and can cause low data throughput. When you enable partitioning for these mappings, the Data Integration Service uses additional threads to process the mapping, which can improve performance.
To enable partitioning, administrators and developers perform the following tasks:
Administrators set maximum parallelism for the Data Integration Service to a value greater than 1 in the Administrator tool.
Maximum parallelism determines the maximum number of parallel threads that process a single pipeline stage. Administrators increase the Maximum Parallelism property value based on the number of CPUs available on the nodes where mappings run.
Optionally, developers can set a maximum parallelism value for a mapping in the Developer tool.
By default, the Maximum Parallelism property for each mapping is set to Auto. Each mapping uses the maximum parallelism value defined for the Data Integration Service.
Developers can change the maximum parallelism value in the mapping run-time properties to define a maximum value for a particular mapping. When maximum parallelism is set to different integer values for the Data Integration Service and the mapping, the Data Integration Service uses the lower of the two values.
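The way the two settings combine can be sketched in a few lines of Python. This is only an illustration of the rule described above, not product code: the function name effective_parallelism, the AUTO constant, and the validation of values below 1 are assumptions made for the example.

```python
from typing import Union

# Hypothetical stand-in for the "Auto" setting of the mapping property.
AUTO = "Auto"


def effective_parallelism(service_value: int,
                          mapping_value: Union[int, str] = AUTO) -> int:
    """Return the parallelism a mapping would run with under the stated rule.

    service_value: Maximum Parallelism set for the Data Integration Service.
    mapping_value: Maximum Parallelism set for the mapping, or AUTO.
    """
    if service_value < 1:
        raise ValueError("service_value must be at least 1")
    # Auto: the mapping inherits the value defined for the service.
    if mapping_value == AUTO:
        return service_value
    # Assumption for this sketch: any other mapping value is a positive integer.
    if not isinstance(mapping_value, int) or mapping_value < 1:
        raise ValueError("mapping_value must be AUTO or a positive integer")
    # Different integer values: the lower of the two applies.
    return min(service_value, mapping_value)
```

For example, with the service set to 8, a mapping left at Auto would resolve to 8, while a mapping set to 4 would resolve to 4. A mapping set to 16 against a service value of 2 would still resolve to 2, because the service value caps it.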
When partitioning is disabled for a mapping, the Data Integration Service separates the mapping into pipeline stages and uses one thread to process each stage.
When partitioning is enabled for a mapping, the Data Integration Service uses multiple threads to process each mapping pipeline stage.
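As a rough conceptual model of the difference, the following Python sketch runs one pipeline stage over several partitions, each handled by its own thread. It is not how the Data Integration Service is implemented. The names split_into_partitions and run_stage, and the round-robin split, are assumptions chosen for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, partition_count):
    """Deal rows round-robin into partition_count lists (illustrative choice)."""
    return [rows[i::partition_count] for i in range(partition_count)]


def run_stage(rows, transform, max_parallelism=1):
    """Apply transform to every row of one stage, using one thread per partition.

    With max_parallelism=1 the stage runs on a single thread, which mirrors
    the behavior when partitioning is disabled.
    """
    partitions = split_into_partitions(rows, max_parallelism)

    def process(partition):
        return [transform(row) for row in partition]

    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        results = list(pool.map(process, partitions))
    # Concatenate partition outputs; row order can differ from the input order.
    return [row for partition in results for row in partition]
```

Calling run_stage(rows, transform, 4) processes four partitions concurrently, and run_stage(rows, transform) falls back to a single thread. The sketch shows only the structure of partitioned processing. Python threads do not speed up CPU-bound work, so it says nothing about actual throughput.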
The Data Integration Service can create partitions for mappings that have physical data as input and output. The Data Integration Service can use multiple partitions to complete the following actions during a mapping run:
Read from flat file, IBM DB2 for LUW, or Oracle sources.
Run transformations.
Write to flat file, IBM DB2 for LUW, or Oracle targets.