Table of Contents

Search

  1. Preface
  2. Analyst Service
  3. Content Management Service
  4. Data Integration Service
  5. Data Integration Service Architecture
  6. Data Integration Service Management
  7. Data Integration Service Grid
  8. Data Integration Service Applications
  9. Mass Ingestion Service
  10. Metadata Access Service
  11. Metadata Manager Service
  12. Model Repository Service
  13. PowerCenter Integration Service
  14. PowerCenter Integration Service Architecture
  15. High Availability for the PowerCenter Integration Service
  16. PowerCenter Repository Service
  17. PowerCenter Repository Management
  18. PowerExchange Listener Service
  19. PowerExchange Logger Service
  20. SAP BW Service
  21. Search Service
  22. System Services
  23. Test Data Manager Service
  24. Test Data Warehouse Service
  25. Web Services Hub
  26. Application Service Upgrade
  27. Application Service Databases
  28. Connecting to Databases from Windows
  29. Connecting to Databases from UNIX
  30. Updating the DynamicSections Parameter of a DB2 Database

Maximize Parallelism for Mappings and Profiles

Maximize Parallelism for Mappings and Profiles

If you have the partitioning option, you can enable the Data Integration Service to maximize parallelism when it runs mappings, runs column profiles, or performs data domain discovery. When you maximize parallelism, the Data Integration Service dynamically divides the underlying data into partitions and processes all of the partitions concurrently.
When you run a profile job, the Data Integration Service converts the profile job into one or more mappings, and then can run those mappings in multiple partitions.
If mappings process large data sets or contain transformations that perform complicated calculations, the mappings can take a long time to process and can cause low data throughput. When you enable partitioning for these mappings, the Data Integration Service uses additional threads to process the mapping. Increasing the number of processing threads increases the load on the node where the mapping runs. If the node contains sufficient CPU bandwidth, concurrently processing rows of data in a mapping can optimize mapping performance.
By default, the
Maximum Parallelism
property is set to 1 for the Data Integration Service. When the Data Integration Service runs a mapping, it separates the mapping into pipeline stages and uses one thread to process each stage. These threads are allocated to reading, transforming, and writing tasks, and they run in parallel.
When you increase the maximum parallelism value, you enable partitioning. The Data Integration Service uses multiple threads to process each pipeline stage.
The Data Integration Service can create partitions for mappings that have physical data as input and output. The Data Integration Service can use multiple partitions to complete the following actions during a mapping run:
  • Read from flat file, IBM DB2 for LUW, or Oracle sources.
  • Run transformations.
  • Write to flat file, IBM DB2 for LUW, or Oracle targets.