Tuning and Sizing Guidelines for Data Engineering Integration (10.4.x)

Data Integration Service

You can configure the maximum heap size and the batch execution pool sizes for the Hadoop and native environments.
The following table lists the recommended values for the Data Integration Service parameters:
Parameter | Sandbox Deployment | Basic Deployment | Standard Deployment | Advanced Deployment
Max heap size | 1 GB (default) | 6 GB | 8 GB | 16 GB
Execution pool size for the Hadoop environment | 100 | 1000 | 2000 | 5000
Execution pool size for the native environment | 10 | 10 | 15 | 30
The number of concurrent pushdown jobs submitted to the Data Integration Service determines the required heap size and execution pool size.
If you implement CI/CD, use a maximum heap size of 8 GB to scale CI/CD operations for up to 1,000 objects.
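As an illustration, the sizing recommendations above can be encoded as a simple lookup. This is a sketch only: the deployment names and values come from the table, while the function and dictionary names are hypothetical.

```python
# Recommended Data Integration Service settings per deployment size,
# taken from the sizing table above (illustrative lookup only).
DIS_SIZING = {
    # deployment: (max heap, Hadoop pool size, native pool size)
    "sandbox":  ("1 GB", 100, 10),
    "basic":    ("6 GB", 1000, 10),
    "standard": ("8 GB", 2000, 15),
    "advanced": ("16 GB", 5000, 30),
}

def recommended_settings(deployment: str) -> dict:
    """Return the recommended DIS parameters for a deployment size."""
    heap, hadoop_pool, native_pool = DIS_SIZING[deployment.lower()]
    return {
        "max_heap_size": heap,
        "hadoop_execution_pool_size": hadoop_pool,
        "native_execution_pool_size": native_pool,
    }
```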

Best Practices for Handling Highly Concurrent Workloads

If you tend to run highly concurrent workloads (over ~5000 concurrent jobs), consider the following best practices:
  • Create a separate Model repository for persisting monitoring statistics.
  • If the workloads include Sqoop sources, set the custom property ExecutionContextOptions.SqoopPoolSize on the Data Integration Service.
    Custom Property | Description
    ExecutionContextOptions.SqoopPoolSize | Number of concurrent Sqoop jobs.
    If you set ExecutionContextOptions.SqoopPoolSize to -1, Sqoop concurrency is determined by the value of the Maximum Hadoop Batch Pool Size property. To restrict the number of Sqoop jobs that run in parallel, set the property to a value between 0 and 100.
    Default is 100. The recommended value is -1.
    For more information about this property, see the "Sqoop Concurrency" section and KB article 570014.
  • Because an increase in the number of concurrent jobs increases monitoring activity, you can also set the following custom property on the Data Integration Service process to increase the monitoring statistics buffer size:
    Custom Property | Description | Default | Recommendation
    MonitoringOptions.StatsBufferSize | Size of the monitoring statistics buffer, measured in number of jobs | 10000 | 25000
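The SqoopPoolSize behavior described in the list above can be modeled as a small decision function. This is a sketch of the documented semantics, not Informatica's implementation; the function name is hypothetical.

```python
def effective_sqoop_concurrency(sqoop_pool_size: int,
                                hadoop_batch_pool_size: int) -> int:
    """Model the documented ExecutionContextOptions.SqoopPoolSize behavior.

    -1     -> concurrency follows the Maximum Hadoop Batch Pool Size property
    0..100 -> explicit cap on the number of concurrent Sqoop jobs
    """
    if sqoop_pool_size == -1:
        return hadoop_batch_pool_size
    if 0 <= sqoop_pool_size <= 100:
        return sqoop_pool_size
    raise ValueError("SqoopPoolSize must be -1 or between 0 and 100")
```

With the recommended value of -1 and a Standard deployment's Hadoop pool size of 2000, Sqoop jobs share the full batch pool.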
Use the infacmd gateway service to run highly concurrent pushdown mappings.
The gateway service is a single-client Java process that intercepts requests from the infacmd client. The gateway service submits the requests as mappings or workflows to the Data Integration Service using threads to limit system resource consumption.
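The thread-limiting behavior described above can be sketched with a fixed-size worker pool. This is an illustration of the general pattern, assuming nothing about the gateway service's actual internals; the function name and pool size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_bounded_threads(jobs, max_threads=4):
    """Submit callables through a fixed-size thread pool.

    Like a gateway-style submitter, the pool bounds how many requests
    are in flight at once, regardless of how many jobs are queued.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(job) for job in jobs]
        # Collect results in submission order.
        return [f.result() for f in futures]
```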
Configure the infacmd gateway service on a remote machine other than the Data Integration Service host machine. Configure the following properties:
Parameter | Sandbox Deployment | Basic Deployment | Standard Deployment | Advanced Deployment
CPU cores | 1 | 1 | 2 | 4
Memory | 512 MB | 1 GB | 2 GB | 4 GB
To enable and configure the gateway service, edit the properties file in the following location:
<Informatica_installation_directory>/isp/bin/plugins/ms/clientgateway/msgatewayconfig.properties
In the properties file, configure the following property:
Property | Value
enable_client_gateway | true
For details on configuring the gateway service, refer to the following article:

Additional Guidelines

Consider the following additional guidelines for the Data Integration Service:
  • Application deployment requires communication between the Data Integration Service and the associated Model Repository Service. To fetch objects and to write to the database schema of the Model Repository Service, tune the database cursors as follows:
    Number of database cursors >= Number of objects in the application
  • To run jobs in the native environment and to preview data, the Data Integration Service requires at least one physical core for each job execution.
  • If the Data Integration Service is enabled to use multiple partitions for native jobs, the node resource requirements increase with the degree of parallelism. If the number of jobs in the native environment is typically high, allocate additional resources for other jobs.
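The guidelines above imply two simple sizing checks. The following sketch encodes them literally; the multiplication by partitions is an assumption based on the statement that requirements increase with parallelism, and both function names are hypothetical.

```python
def required_db_cursors(objects_in_application: int) -> int:
    # Guideline: number of database cursors >= number of objects
    # in the deployed application.
    return objects_in_application

def min_cores_for_native_jobs(concurrent_jobs: int,
                              partitions_per_job: int = 1) -> int:
    # Guideline: at least one physical core per job execution.
    # Assumption: with multiple partitions, the requirement scales
    # with the number of partitions per job.
    return concurrent_jobs * partitions_per_job
```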

Profiling Parameters

To optimize performance, perform profiling using the Blaze engine. Tuning profiling performance involves configuring the Data Integration Service parameters, the profile database warehouse properties, and the advanced profiling properties.
