Application Service Guide


Maximize Parallelism for Mappings and Profiles


If you have the partitioning option, you can enable the Data Integration Service to maximize parallelism when it runs mappings, runs column profiles, or performs data domain discovery. When you maximize parallelism, the Data Integration Service dynamically divides the underlying data into partitions and processes all of the partitions concurrently.
When you run a profile job, the Data Integration Service converts the profile job into one or more mappings and can then run those mappings in multiple partitions.
If mappings process large data sets or contain transformations that perform complicated calculations, the mappings can take a long time to process and can cause low data throughput. When you enable partitioning for these mappings, the Data Integration Service uses additional threads to process the mapping. Increasing the number of processing threads increases the load on the node where the mapping runs. If the node contains sufficient CPU bandwidth, concurrently processing rows of data in a mapping can optimize mapping performance.
By default, the Maximum Parallelism property is set to 1 for the Data Integration Service. When the Data Integration Service runs a mapping, it separates the mapping into pipeline stages and uses one thread to process each stage. These threads are allocated to reading, transforming, and writing tasks, and they run in parallel.
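The one-thread-per-stage model described above can be pictured with a small, purely illustrative Python sketch. It is not how the Data Integration Service is implemented; the function names (`reader`, `transformer`, `writer`, `run_pipeline`) and the doubling transformation are hypothetical stand-ins chosen for the example.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the data stream between stages


def reader(rows, out_q):
    """Reader stage: pushes each source row to the next stage."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transformer(in_q, out_q):
    """Transformation stage: doubles each row (placeholder logic)."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put(row * 2)


def writer(in_q, target):
    """Writer stage: appends each transformed row to the target."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            break
        target.append(row)


def run_pipeline(rows):
    """Run three stages in parallel, one thread per stage."""
    q1, q2 = queue.Queue(), queue.Queue()
    target = []
    threads = [
        threading.Thread(target=reader, args=(rows, q1)),
        threading.Thread(target=transformer, args=(q1, q2)),
        threading.Thread(target=writer, args=(q2, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target
```

Because each stage has exactly one thread and the queues are first-in, first-out, rows reach the target in source order. This corresponds to a maximum parallelism of 1 in the sketch.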
When you increase the maximum parallelism value, you enable partitioning. The Data Integration Service uses multiple threads to process each pipeline stage.
The Data Integration Service can create partitions for mappings that have physical data as input and output. The Data Integration Service can use multiple partitions to complete the following actions during a mapping run:
  • Read from flat file, IBM DB2 for LUW, or Oracle sources.
  • Run transformations.
  • Write to flat file, IBM DB2 for LUW, or Oracle targets.
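As a rough illustration of partitioned processing, the following Python sketch divides rows into partitions and processes all partitions concurrently. It is a conceptual model only, under stated assumptions: the round-robin split, the `max_parallelism` argument, and the function names are hypothetical, and the actual service decides how many partitions to create and how to distribute data on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, max_parallelism):
    """Distribute rows round-robin across partitions (assumed strategy)."""
    return [rows[i::max_parallelism] for i in range(max_parallelism)]


def process_partition(partition):
    """Process one partition: doubles each row (placeholder logic)."""
    return [row * 2 for row in partition]


def run_partitioned(rows, max_parallelism):
    """Process all partitions concurrently, one worker thread per partition."""
    partitions = split_into_partitions(rows, max_parallelism)
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        results = list(pool.map(process_partition, partitions))
    # Concatenate partition outputs; the original row order is not preserved.
    return [row for part in results for row in part]
```

Note that in this sketch the output order follows the partitions rather than the source, so any step that depends on row order would need extra handling.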
