Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2

Data Integration Service

You can configure the maximum heap size and the batch execution pool sizes for the Hadoop and native environments.
The following table lists the recommended values for the Data Integration Service parameters:
Parameter | Sandbox Deployment | Basic Deployment | Standard Deployment | Advanced Deployment
Max heap size | 1 GB (default) | 6 GB | 8 GB | 16 GB
Execution pool size for the Hadoop environment | 100 | 1000 | 2000 | 5000
Execution pool size for the native environment | 10 | 10 | 15 | 30
The number of concurrent pushdown jobs submitted to the Data Integration Service determines the heap size and the execution pool sizes that you need.
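If you prefer the command line to the Administrator tool, you can adjust the pool sizes with the infacmd dis UpdateServiceOptions command. The following is a minimal sketch for a Standard deployment; the execution option names, the domain name, and the service name are assumptions used for illustration, so verify the exact option names for your Informatica version before you run the command. The maximum heap size itself is configured in the advanced properties of the Data Integration Service process in the Administrator tool.

  # Hedged sketch: the option names passed to -o are assumed and can differ by release.
  # MyDomain, MyDIS, and the credentials are placeholders.
  infacmd.sh dis UpdateServiceOptions -dn MyDomain -sn MyDIS -un Administrator -pd '<password>' \
      -o ExecutionOptions.MaxHadoopBatchExecutionPoolSize=2000 \
         ExecutionOptions.MaxNativeBatchExecutionPoolSize=15
  # You may need to recycle the Data Integration Service for the new values to take effect.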

Best Practices for Handling Highly Concurrent Workloads

If you run highly concurrent workloads (more than approximately 5,000 concurrent jobs), consider the following best practices:
  • Create a separate Model repository for persisting monitoring statistics.
  • If the workloads include Sqoop sources, set the custom property ExecutionContextOptions.SqoopPoolSize on the Data Integration Service (see the example after this list).

    Custom Property | Description | Default | Recommendation
    ExecutionContextOptions.SqoopPoolSize | Number of concurrent Sqoop jobs | 100 | -1

    If you set ExecutionContextOptions.SqoopPoolSize to -1, Sqoop concurrency is determined by the value of the Maximum Hadoop Batch Pool Size property. If you want to restrict the number of Sqoop jobs that run in parallel, set ExecutionContextOptions.SqoopPoolSize to a value between 0 and 100; the value you specify controls the concurrency of Sqoop jobs. For more information about this property, see the "Sqoop Concurrency" section and KB article 570014.
  • Because an increase in the number of concurrent jobs increases monitoring activity, you can also set the following custom property on the Data Integration Service process to increase the monitoring statistics buffer size:

    Custom Property | Description | Default | Recommendation
    MonitoringOptions.StatsBufferSize | Size of the monitoring statistics buffer, measured in number of jobs | 10000 | 25000
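Custom properties are name-value pairs. The following minimal sketch shows the two settings recommended above, assuming that you add ExecutionContextOptions.SqoopPoolSize in the Custom Properties section of the Data Integration Service and MonitoringOptions.StatsBufferSize in the Custom Properties section of the Data Integration Service process in the Administrator tool:

  # Custom property on the Data Integration Service: let the Maximum Hadoop Batch
  # Pool Size property control Sqoop concurrency.
  ExecutionContextOptions.SqoopPoolSize=-1

  # Custom property on the Data Integration Service process: enlarge the monitoring
  # statistics buffer to 25000 jobs.
  MonitoringOptions.StatsBufferSize=25000
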
Use the infacmd gateway service to run highly concurrent pushdown mappings.
The gateway service is a single-client Java process that intercepts requests from the infacmd client and submits them as mappings or workflows to the Data Integration Service, using threads to limit system resource consumption.
Configure the infacmd gateway service on a remote machine other than the Data Integration Service host machine, and allocate the following resources to it:
Parameter | Sandbox Deployment | Basic Deployment | Standard Deployment | Advanced Deployment
CPU cores | 1 | 1 | 2 | 4
Memory | 512 MB | 1 GB | 2 GB | 4 GB
To enable and configure the gateway service, edit the properties file in the following location:
<Informatica_installation_directory>/isp/bin/plugins/ms/clientgateway/msgatewayconfig.properties
In the properties file, configure the following property:
Property | Value
enable_client_gateway | true
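After the change, the msgatewayconfig.properties file contains a line like the following. Leave any other properties in the file unchanged:

  # Enables the infacmd client gateway.
  enable_client_gateway=true
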
For details on configuring the gateway service, refer to the following article:

Additional Guidelines

Consider the following additional guidelines for the Data Integration Service:
  • Application deployment requires communication between the Data Integration Service and the associated Model Repository Service. To fetch objects and to write to the database schema of the Model Repository Service, tune the database cursors as follows:
    Number of database cursors >= Number of objects in the application
  • To run jobs in the native environment and to preview data, the Data Integration Service requires at least one physical core for each job execution.
  • If the Data Integration Service is enabled to use multiple partitions for native jobs, the resource requirements of the Data Integration Service node increase based on the degree of parallelism. If the number of jobs in the native environment is typically high, you must allocate additional resources for other jobs.

Profiling Parameters

To optimize performance, perform profiling using the Blaze engine. Tuning profiling performance involves configuring the Data Integration Service parameters, the profile database warehouse properties, and the advanced profiling properties.
