Table of Contents

Search

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings in the Hadoop Environment
  4. Mapping Sources in the Hadoop Environment
  5. Mapping Targets in the Hadoop Environment
  6. Mapping Transformations in the Hadoop Environment
  7. Processing Hierarchical Data on the Spark Engine
  8. Configuring Transformations to Process Hierarchical Data
  9. Processing Unstructured and Semi-structured Data with an Intelligent Structure Model
  10. Stateful Computing on the Spark Engine
  11. Monitoring Mappings in the Hadoop Environment
  12. Mappings in the Native Environment
  13. Profiles
  14. Native Environment Optimization
  15. Cluster Workflows
  16. Connections
  17. Data Type Reference
  18. Function Reference
  19. Parameter Reference

Big Data Management User Guide

Big Data Management User Guide

Partition Optimization

Partition Optimization

You can optimize the partitioning of Model repository mappings to increase performance. You can add more partitions, select the best performing partition types, use more CPUs, and optimize the source or target database for partitioning.
To optimize partitioning, perform the following tasks:
Increase the number of partitions.
When you configure Model repository mappings, you increase the number of partitions when you increase the maximum parallelism value for the Data Integration Service or the mapping.
Increase the number of partitions to enable the Data Integration Service to create multiple connections to sources and process partitions of source data concurrently. Increasing the number of partitions increases the number of threads, which also increases the load on the Data Integration Service nodes. If the Data Integration Service node or nodes contain ample CPU bandwidth, processing rows of data concurrently can increase performance.
If you use a single-node Data Integration Service and the Data Integration Service uses a large number of partitions in a session or mapping that processes large amounts of data, you can overload the system.
Use multiple CPUs.
If you have a symmetric multi-processing (SMP) platform, you can use multiple CPUs to concurrently process partitions of data.
Optimize the source database for partitioning.
You can optimize the source database for partitioning. For example, you can tune the database, enable parallel queries, separate data into different tablespaces, and group sorted data.
Optimize the target database for partitioning.
You can optimize the target database for partitioning. For example, you can enable parallel inserts into the database, separate data into different tablespaces, and increase the maximum number of sessions allowed to the database.

0 COMMENTS

We’d like to hear from you!