Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2

Case Study: Data Integration Service Concurrency with Multiple HS2 Load Balancers

The following case study shows the benefit of load balancing across multiple HiveServer2 (HS2) instances when a large number of mappings run concurrently on a four-node Data Integration Service grid.
The mappings used medium-complexity TPC-DS benchmark queries with Hive sources and parameterized HDFS targets. During the test, peak CPU utilization was approximately 20 cores (80%) for less than five minutes, and average utilization was approximately 4 cores.

Environment

                      Cloudera Cluster                          Data Integration Service (4-node grid)
Chipset               Intel® Xeon® Processor X5675 @ 3.06 GHz   Intel® Xeon® Gold 6132 CPU @ 2.60 GHz
Cores                 4 x 6 cores                               4 x 14 cores
Memory                32 GB                                     125 GB
Operating System      Red Hat Enterprise Linux 6.1              Red Hat Enterprise Linux 7.5 (Maipo)
Hadoop Distribution   Cloudera 5.15                             -
Hadoop Cluster Size   25 nodes                                  -

HiveServer2 Load Balancer Configuration

Test staff configured the HiveServer2 load balancers using the following steps:
  • Install the HAProxy package, or another load balancer recommended by your IT team.
  • Configure the HAProxy service to listen on port 10000 and add the HiveServer2 instances as backend servers.
  • Configure the HAProxy service to start at boot.
  • In Cloudera Manager, enter the load balancer server address in the HiveServer2 Load Balancer configuration property.
  • Restart the Hive service.
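The HAProxy part of these steps can be sketched as a small, hypothetical Python helper (not part of Informatica or Cloudera tooling) that renders an HAProxy "listen" section forwarding TCP connections on port 10000 to a list of HiveServer2 instances. The section name hiveserver2, the host names, the timeout values, and the "balance leastconn" policy are assumptions for illustration. Least-connections is chosen here because a few Data Integration Service nodes open many long-lived sessions; source-IP hashing is a common alternative. Adapt the output to your IT team's load balancer standards.

```python
from typing import List


def render_haproxy_hs2_section(hs2_hosts: List[str], listen_port: int = 10000,
                               hs2_port: int = 10000) -> str:
    """Render an HAProxy 'listen' section that balances HiveServer2 TCP traffic.

    Assumes HAProxy runs on a dedicated host, so it can bind the same port
    (10000) that the HiveServer2 instances listen on.
    """
    if not hs2_hosts:
        raise ValueError("at least one HiveServer2 host is required")
    lines = [
        "listen hiveserver2",
        f"    bind 0.0.0.0:{listen_port}",
        "    mode tcp",            # proxy HiveServer2 Thrift/JDBC traffic at the TCP level
        "    option tcplog",
        "    balance leastconn",   # send new sessions to the least-busy HS2 (assumed policy)
        "    timeout connect 5s",
        "    timeout client 1h",   # long-running Hive sessions; assumed value
        "    timeout server 1h",
    ]
    for i, host in enumerate(hs2_hosts, start=1):
        # 'check' enables health checks so HAProxy skips an HS2 instance that is down
        lines.append(f"    server hiveserver2_{i} {host}:{hs2_port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host names for two HiveServer2 instances.
    print(render_haproxy_hs2_section(["hs2-node1.example.com", "hs2-node2.example.com"]))
```

The rendered section would typically be appended to haproxy.cfg on the load balancer host before enabling the service at boot.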

Performance Chart

The following performance chart compares dispatch times for 10,000 concurrent jobs on a Hadoop cluster. Cluster dispatch time is the time the Data Integration Service takes to submit all mappings to the cluster.
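As a rough illustration of the metric, the sketch below computes cluster dispatch time from per-mapping submission timestamps, treating it as the interval between the start of the run and the moment the last mapping was submitted to the cluster. The MappingSubmission record, the function name, and the source of the timestamps (for example, parsed from Data Integration Service logs) are hypothetical; the case study does not state how dispatch time was measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class MappingSubmission:
    """Hypothetical record: when one mapping job was handed to the cluster."""
    job_id: str
    submitted_at: datetime


def cluster_dispatch_time(run_started_at: datetime,
                          submissions: Iterable[MappingSubmission]) -> timedelta:
    """Time from the start of the run until the last mapping was submitted."""
    last = max((s.submitted_at for s in submissions), default=None)
    if last is None:
        raise ValueError("no submissions recorded")
    if last < run_started_at:
        raise ValueError("submission recorded before the run started")
    return last - run_started_at


# Illustrative timestamps only, not data from this test.
start = datetime(2019, 1, 1, 12, 0, 0)
subs = [
    MappingSubmission("m1", start + timedelta(seconds=5)),
    MappingSubmission("m2", start + timedelta(seconds=90)),
]
print(cluster_dispatch_time(start, subs))  # 0:01:30
```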

Conclusions

The test found that cluster dispatch time improved by approximately 40% with two HiveServer2 instances.
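Reading the improvement as a relative reduction in dispatch time, the arithmetic can be sketched as follows. The function name is hypothetical, and the timings in the example are illustrative placeholders, not measurements from this test.

```python
def percent_reduction(baseline_seconds: float, improved_seconds: float) -> float:
    """Relative reduction in dispatch time, as a percentage of the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_seconds - improved_seconds) * 100.0 / baseline_seconds


# Illustrative values only: a baseline run taking 1,000 s and a run with two
# HiveServer2 instances taking 600 s correspond to a 40% reduction.
print(f"{percent_reduction(1000.0, 600.0):.1f}%")  # 40.0%
```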
