Performance Tuning Guide

10.3

Recommendations for Batch Job Optimization

A batch job is a program in the MDM Hub that you can run to complete a discrete unit of work. You can launch batch jobs individually or as a group from the Hub Console or with the SIF APIs. You can configure settings to optimize the performance of batch jobs.

The following table lists the batch job parameters and their recommended settings for baseline performance:

Parameter	Recommended Setting	Description
Cleanse Thread Count. Used in the following batch jobs: Match job, Generate Match Tokens process on Load job, Stage job.	Start with the number of CPU cores available. You can increase the number of threads based on CPU utilization. Default is 1.	Available in “Process Server Threads for Cleanse Processing”. Total number of threads that the master or slave Process Server uses when executing the Generate Match Tokens process after a Load job, Match jobs, and Stage jobs.
Threads for Batch Processing. Used in the following batch jobs: Automerge job, Load job, Batch Delete, Batch Unmerge, Batch Revalidate.	Specify a value equal to four times the number of CPU cores on the system on which the Process Server is deployed. Default is 20.	Available in “Process Server Threads for Batch Processing”. Maximum number of threads to use for a batch process. For example, if the host machine has 16 CPU cores, set Threads for Batch Processing in the Process Server registration to 64. Applicable only if the Process Server is marked for batch processing. Of the total number of threads available on the Process Server, this value dedicates n threads to batch jobs.
Controller Thread Time Out. Used in the following batch jobs: Automerge job, Load job, Batch Delete, Batch Unmerge, Batch Recalculate.	300000 (5 minutes). Default is 300000.	com.informatica.mdm.loadbalance.ControllerThread.timeout This property is found in the cmxcleanse.properties file. When the load is distributed across slave Process Servers, all slave Process Servers that are still processing blocks must complete the job within the timeout period after the last block is sent to a slave Process Server. Blocks that are not completed in time are marked with ‘No Action’ in the batch result. The batch is not marked as failed, because the remaining blocks are loaded successfully.
Load analyze threshold rate. Used in the following batch jobs: Automerge job, Load job, Batch Delete, Batch Unmerge, Batch Recalculate.	Default is 10.	cmx.server.batch.load.analyze_threshold_rate This property is found in the cmxserver.properties file. For Oracle only. Available from MDM 10.0 HotFix 1. Specifies how often the MDM Hub gathers statistics for the tables that a batch Load job affects. Set to 0 to disable statistics collection. Set to 1 to collect statistics only at the end of a Load job, for the base object and cross-reference tables. For example, if the threshold is 10, statistics are gathered at every 10^n records, that is, whenever the inserted record count reaches 100, 1000, 10000, and so on.
Recycler Thread Max Idling. Used in the following batch jobs: Automerge job, Load job, Batch Delete, Batch Unmerge, Batch Recalculate.	300000 (5 minutes). Default is 300000 (5 minutes).	com.informatica.mdm.batchserver.RecyclerThread.max_idling This property is found in the cmxcleanse.properties file. If a thread on a slave Process Server is processing a block of a batch job and stays idle for the duration specified in this property, the thread is marked as 'dead'. If a slave Process Server times out as noted earlier, the corresponding block is marked with ‘No Action’ in the batch result. The batch is not marked as failed, because the remaining blocks are loaded successfully.
Automerge : Automerge Threads Per Job	Default is 1.	cmx.server.automerge.threads_per_job This property is found in the cmxserver.properties file. Maximum number of threads distributed across the Process Servers to process the Automerge job. For example, if this value is 20, the Automerge job can be distributed across two Process Servers with 10 threads each. The distribution depends on factors such as the CPU weightage of each Process Server and the other jobs running on it. This value must be less than the 'Threads for Batch' value specified for the Process Server. The optimum value for a database server with a 16-core processor and a solid-state drive (SSD) set up in a redundant array of independent disks (RAID) is 20. You can increase the number of threads based on CPU utilization on the Process Servers.
Automerge : Automerge Block Size	Default is 250.	cmx.server.automerge.block_size This property is found in the cmxserver.properties file. Maximum number of records sent to each Process Server for merging in one block. For example, with two Process Servers and 1000 records to merge, a value of 250 means that each Process Server receives one block of 250 records and then another block of 250 records. Increasing this value can improve performance, depending on how powerful the application servers and database servers are.
Load : Batch Threads Per Job	Default is 1.	cmx.server.batch.threads_per_job This property is found in the cmxserver.properties file. Maximum number of threads distributed across the Process Servers to process the Load job. For example, if this value is 20, the load process can be distributed across two Process Servers with 10 threads each. The distribution depends on factors such as the CPU weightage of each Process Server and the other jobs running on it. This value must be less than the 'Threads for Batch' value specified for the Process Server. The optimum value for a database server with a 16-core processor and a solid-state drive (SSD) set up in a redundant array of independent disks (RAID) is 20. You can increase the number of threads based on CPU utilization on the Process Servers.
Load : Batch Block Size	Default is 250.	cmx.server.batch.load.block_size This property is found in the cmxserver.properties file. Maximum number of records sent to each Process Server for loading in one block. For example, with two Process Servers and 1000 records to load, a value of 250 means that each Process Server receives one block of 250 records and then another block of 250 records. Increasing this value can improve performance, depending on how powerful the application servers and database servers are.
Load : Threads per job for generate tokens, if the 'Generate Match Tokens on Load' attribute is enabled on the base object	Same as "Threads for cleanse processing".	See the 'Threads for Cleanse Processing' attribute described earlier. This thread attribute is different from the core threads per job attribute of the Load job described earlier. If 'Generate Match Tokens on Load' is not selected, this attribute has no impact on the performance of the Load job.
Batch Recalculate (SIF API Request) : Recalculate Threads Per Job	Same property, re-used from LOAD Job. See LOAD Job section for more details.	cmx.server.batch.threads_per_job This property is found in the cmxserver.properties file. Same property, re-used from LOAD Job. See LOAD Job section for more details.
Batch Recalculate (SIF API Request) : Recalculate Block Size	Default is 250.	cmx.server.batch.recalculate.block_size This property is found in the cmxserver.properties file. Maximum number of records sent to each Process Server in one block to recalculate the BVT. For example, with two Process Servers and 1000 records to recalculate, a value of 250 means that each Process Server receives one block of 250 records and then another block of 250 records. Increasing this value can improve performance, depending on how powerful the application servers and database servers are.
Batch Recalculate (SIF API Request): Threads Per Job	Same property, re-used from LOAD Job. Refer to LOAD Job section for more details.	cmx.server.batch.threads_per_job This property is found in the cmxserver.properties file. Same property, re-used from LOAD Job. Refer to LOAD Job section for more details.
Batch Unmerge (SIF API Request) : Unmerge Block Size	Default is 250.	cmx.server.batch.batchunmerge.block_size This property is found in the cmxserver.properties file. Maximum number of records sent to each Process Server for unmerging in one block. For example, with two Process Servers and 1000 records to unmerge, a value of 250 means that each Process Server receives one block of 250 records and then another block of 250 records. Increasing this value can improve performance, depending on how powerful the application servers and database servers are.
Batch Delete (SIF API Request) : Threads per job	Same property, re-used from LOAD Job. See LOAD Job section for more details.	cmx.server.batch.threads_per_job This property is found in the cmxserver.properties file. Same property, re-used from LOAD Job. See LOAD Job section for more details.
Batch Delete (SIF API Request) : Delete Batch Block Size	Default is 250.	cmx.server.batch.delete.block_size This property is found in the cmxserver.properties file. Maximum number of records sent to each Process Server for deletion in one block. For example, with two Process Servers and 1000 records to delete, a value of 250 means that each Process Server receives one block of 250 records and then another block of 250 records. Increasing this value can improve performance, depending on how powerful the application servers and database servers are.
Tokenize : Tokenization File Loader Option	Default is true.	cmx.server.tokenize.file_load This property is found in the cmxcleanse.properties file. Applicable for Oracle and DB2. If true, the DB2 file loader or Oracle SQL Loader is used to load the records during the tokenization job. If file writing causes a performance issue, set this property to false so that data is written directly to the database instead of through the file loader. Generally, the file loader is faster than direct database writes. Choose the option that suits your environment.
Stage : Threads per job	See 'Cleanse Thread Count' attribute described earlier.	See 'Cleanse Thread Count' attribute described earlier.
Stage : Cleanse Minimum Distribution	Default is 1000.	cmx.server.cleanse.min_size_for_distribution This property is found in the cmxcleanse.properties file. The MDM Hub distributes the cleanse job across different cleanse servers only if the number of records is higher than this minimum size. When the load is distributed, each slave Process Server uses the Cleanse Thread Count as its number of worker threads.
Stage : Stage JDBC Loader	Default is false. File writing is usually faster than direct database writing.	cmx.server.java_jdbc_loader Applicable for Oracle and DB2. Default is false. This property is found in the cmxcleanse.properties file. If true, DB2 and Oracle use direct database connections during the stage job instead of the DB2 file loader or Oracle SQL Loader. If file writing causes a performance issue, set this property to true so that data is written directly to the database instead of through the file loader. Generally, the file loader is faster than direct database writes. Choose the option that suits your environment.
Match : Threads per job	See 'Cleanse Thread Count' attribute described earlier.	See 'Cleanse Thread Count' attribute described earlier.
Match : Match Distribution Flag	Set this flag to 1 if the MDM Hub must distribute the match job load across different cleanse servers.	cmx.server.match.distributed_match This property is found in the cmxcleanse.properties file. The MDM Hub distributes the match job across different cleanse servers only if this value is set to 1. When the load is distributed, each slave Process Server uses the Cleanse Thread Count as its number of worker threads.
Match : Match File Loader Option	Default is true. File writing is usually faster than direct database writing.	cmx.server.match.file_load Applicable for Oracle and DB2. Default is true. This property is found in the cmxcleanse.properties file. If true, the DB2 file loader or Oracle SQL Loader is used to load the records during the match job. If file writing causes a performance issue, set this property to false so that data is written directly to the database instead of through the file loader. Generally, the file loader is faster than direct database writes. Choose the option that suits your environment.
Match : Match Loader Batch Size	Default is 250.	cmx.server.match.loader_batch_size This property is found in the cmxcleanse.properties file. Applicable if JDBC load is used in match processing instead of file loader option. Maximum number of records to be sent for match in each worker thread. Increasing this value can provide performance improvement based on how powerful the application servers and database servers are.
Match : Match Elapsed Time	Default is 20 (minutes).	Set in the Hub Console as the base object property Max Elapsed Match Minutes. The execution timeout, in minutes, for a match rule. If this time is reached, the match process exits. Increase this value only if the match rule and the data are very complex. Generally, rules should complete within 20 minutes.
Match : Match Batch Size	Default is 20000000.	Set in the Hub Console, in the base object Match/Merge Setup, as Number of rows per match job batch cycle. Maximum number of records that the MDM Hub processes for matching. This number affects the duration of the match process. The lower the match batch size, the more times you have to run the match process. When you run large Match jobs with large match batch sizes, a failure of the application server or the database means that you must re-run the entire batch.
Match : Maximum records per ranger node	Default is 5000.	max_records_per_ranger_node This property is found in the cmxcleanse.properties file. Number of records per match ranger node (limits memory use). Ranger is an internal component used within the match process where sorting and merging operations are performed based on this maximum records attribute. You can optimize this value to get better performance based on the memory available in your application server.
Initially Index Smart Search Data: Block Size	10000. Default is 250.	cmx.server.batch.smartsearch.initial.block_size This property is found in the cmxserver.properties file. Maximum number of records that the "Initially Index Smart Search Data" batch job can process in each block. This property does not apply to regular indexing outside this specific batch job. When you index a large data set, you can set the value to 10000. This property is available only from MDM 10.0 HotFix 2.
Initially Index Smart Search Data: Smart search threads	Default is 1. Same property, re-used from LOAD Job.	cmx.server.batch.threads_per_job This property is found in the cmxserver.properties file. Maximum number of threads distributed across different Process Servers to process the "Initially Index Smart Search Data" batch job. You can increase this value to improve performance during this batch job. This property does not apply to regular indexing outside this specific batch job.
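
The thread-sizing guidance in the table (start Cleanse Thread Count at the number of CPU cores, and set Threads for Batch Processing to four times the number of CPU cores) can be sketched as a small calculation. This is an illustration only: the function name `recommended_threads` and its dictionary keys are hypothetical and are not part of the MDM Hub. The values are starting points that you would still adjust based on observed CPU utilization.

```python
def recommended_threads(cpu_cores: int) -> dict:
    """Return starting-point thread counts based on the table's guidance.

    Hypothetical helper for illustration; not an MDM Hub API.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be a positive integer")
    return {
        # Cleanse Thread Count: start with the number of cores available.
        "cleanse_threads": cpu_cores,
        # Threads for Batch Processing: four times the number of CPU cores.
        "batch_threads": 4 * cpu_cores,
    }


if __name__ == "__main__":
    # The table's example host has 16 CPU cores.
    print(recommended_threads(16))
```

For the 16-core host used as the example in the table, the sketch yields 16 cleanse threads and 64 batch threads.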
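
The block size rows in the table all use the same example: 1000 records, two Process Servers, and a block size of 250. The sketch below models that example under a simplifying assumption of strict round-robin assignment. The actual MDM Hub distribution logic is not described in this guide, and the function and server names here are hypothetical.

```python
def distribute_blocks(total_records: int, block_size: int, servers: list) -> dict:
    """Split records into blocks and assign the blocks in round-robin order.

    Returns a dict that maps each server name to the list of block sizes it
    receives. Round-robin assignment is an assumption made for illustration.
    """
    if total_records < 0 or block_size < 1 or not servers:
        raise ValueError("need total_records >= 0, block_size >= 1, and a server")
    assignments = {server: [] for server in servers}
    remaining = total_records
    index = 0
    while remaining > 0:
        size = min(block_size, remaining)  # the last block can be smaller
        assignments[servers[index % len(servers)]].append(size)
        remaining -= size
        index += 1
    return assignments


if __name__ == "__main__":
    # Hypothetical server names; 1000 records in blocks of 250.
    print(distribute_blocks(1000, 250, ["process_server_1", "process_server_2"]))
```

With these inputs each server receives two blocks of 250 records, which matches the table's description. A larger block size means fewer, larger blocks per server.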
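
The "Load analyze threshold rate" row states that with a threshold of 10, statistics are gathered when the inserted record count reaches 100, 1000, 10000, and so on. The following sketch lists those checkpoints for a given load size. It is a rough model, not the MDM Hub implementation: the function name is hypothetical, and starting at the second power of the threshold is an assumption taken from the example values in the table.

```python
def statistics_checkpoints(threshold_rate: int, total_records: int,
                           first_exponent: int = 2) -> list:
    """List the record counts at which statistics would be gathered mid-load.

    Hypothetical model of cmx.server.batch.load.analyze_threshold_rate.
    Values 0 (disabled) and 1 (end of job only) produce no mid-load
    checkpoints. first_exponent=2 mirrors the table's example, which
    starts at 100 for a threshold of 10; this starting point is an assumption.
    """
    if threshold_rate < 2:
        return []
    checkpoints = []
    value = threshold_rate ** first_exponent
    while value <= total_records:
        checkpoints.append(value)
        value *= threshold_rate
    return checkpoints


if __name__ == "__main__":
    print(statistics_checkpoints(10, 50000))
```

Because the checkpoints grow geometrically, the number of statistics runs stays small even for very large loads.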
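
The table notes that the threads-per-job properties must be less than the 'Threads for Batch' value of the Process Server. A check like the one below could flag a configuration that breaks this rule before a job runs. This is a sketch under stated assumptions: the parser handles only simple `key=value` lines, the sample text is a made-up excerpt rather than a real cmxserver.properties file, and both function names are hypothetical.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_threads_per_job(props: dict, threads_for_batch: int) -> list:
    """Return warnings for threads-per-job values that are not less than
    the Threads for Batch Processing value of the Process Server."""
    warnings = []
    for key in ("cmx.server.batch.threads_per_job",
                "cmx.server.automerge.threads_per_job"):
        value = int(props.get(key, "1"))  # the table lists a default of 1
        if value >= threads_for_batch:
            warnings.append(
                f"{key}={value} is not less than "
                f"Threads for Batch Processing ({threads_for_batch})"
            )
    return warnings


# Made-up excerpt for illustration only.
SAMPLE = """\
# Hypothetical excerpt of cmxserver.properties
cmx.server.batch.threads_per_job=20
cmx.server.automerge.threads_per_job=64
"""

if __name__ == "__main__":
    for warning in check_threads_per_job(parse_properties(SAMPLE), 64):
        print(warning)
```

In the sample, only the automerge value is flagged, because 64 is not less than the 64 threads assumed for batch processing.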
