Table of Contents

Search

  1. Preface
  2. Analyst Service
  3. Content Management Service
  4. Data Integration Service
  5. Data Integration Service Architecture
  6. Data Integration Service Management
  7. Data Integration Service Grid
  8. Data Integration Service Applications
  9. Metadata Manager Service
  10. Model Repository Service
  11. PowerCenter Integration Service
  12. PowerCenter Integration Service Architecture
  13. High Availability for the PowerCenter Integration Service
  14. PowerCenter Repository Service
  15. PowerCenter Repository Management
  16. PowerExchange Listener Service
  17. PowerExchange Logger Service
  18. SAP BW Service
  19. Search Service
  20. System Services
  21. Test Data Manager Service
  22. Web Services Hub
  23. Application Service Upgrade
  24. Application Service Databases
  25. Connecting to Databases from Windows
  26. Connecting to Databases from UNIX
  27. Updating the DynamicSections Parameter of a DB2 Database

Maximize Parallelism for Mappings and Profiles

Maximize Parallelism for Mappings and Profiles

If you have the partitioning option, you can enable the Data Integration Service to maximize parallelism when it runs mappings, runs column profiles, or performs data domain discovery. When you maximize parallelism, the Data Integration Service dynamically divides the underlying data into partitions and processes all of the partitions concurrently.
When you run a profile job, the Data Integration Service converts the profile job into one or more mappings, and then can run those mappings in multiple partitions.
If mappings process large data sets or contain transformations that perform complicated calculations, the mappings can take a long time to process and can cause low data throughput. When you enable partitioning for these mappings, the Data Integration Service uses additional threads to process the mapping. Increasing the number of processing threads increases the load on the node where the mapping runs. If the node contains sufficient CPU bandwidth, concurrently processing rows of data in a mapping can optimize mapping performance.
By default, the
Maximum Parallelism
property is set to 1 for the Data Integration Service. When the Data Integration Service runs a mapping, it separates the mapping into pipeline stages and uses one thread to process each stage. These threads are allocated to reading, transforming, and writing tasks, and they run in parallel.
When you increase the maximum parallelism value, you enable partitioning. The Data Integration Service uses multiple threads to process each pipeline stage.
The Data Integration Service can create partitions for mappings that have physical data as input and output. The Data Integration Service can use multiple partitions to complete the following actions during a mapping run:
  • Read from flat file, IBM DB2 for LUW, or Oracle sources.
  • Run transformations.
  • Write to flat file, IBM DB2 for LUW, or Oracle targets.