Tuning the Hive Engine for Big Data Management®

YARN and MapReduce Settings for MapReduce Version 2 (MRv2)

You can tune parameters at the Hadoop cluster level to improve performance. MapReduce version 2 uses YARN, which relies on multiple parameters to determine the number of parallel containers.
Based on internal tests, Informatica recommends the following formula to determine the number of containers:
Mp = Number of Parallel Map Task Containers = min(4 x number of disks, number of logical cores)
Rp = Number of Parallel Reduce Task Containers = min(65% of number of logical cores, 3 x number of disks)
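The formula can be sketched in Python. The helper name parallel_containers is hypothetical, not part of any Informatica or Hadoop tooling:

```python
def parallel_containers(num_disks: int, num_logical_cores: int) -> tuple[int, int]:
    """Return (Mp, Rp): the recommended numbers of parallel map and
    reduce task containers for a node, per the formula above."""
    # Mp = min(4 x number of disks, number of logical cores)
    mp = min(4 * num_disks, num_logical_cores)
    # Rp = min(65% of number of logical cores, 3 x number of disks),
    # rounded to the nearest whole container
    rp = round(min(0.65 * num_logical_cores, 3 * num_disks))
    return mp, rp
```

For example, a node with 7 disks and 24 logical cores yields Mp = 24 and Rp = 16.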
After you determine the values for Mp and Rp, you can modify certain parameters to ensure that the Hadoop node can allocate the parallel containers. To modify the parameters in the site.xml files, open the Administrator tool, go to the Connections tab, and select the cluster configuration.
Configure the following properties in the yarn-site.xml:
yarn.nodemanager.resource.memory-mb
The amount of physical memory in MB that can be allocated for containers. Informatica recommends reserving some memory for other processes running in a node.
yarn.nodemanager.resource.cpu-vcores
The number of CPU cores that can be allocated for containers. Informatica recommends setting the value to the number of logical cores available in the node.
yarn.nodemanager.vmem-check-enabled
The virtual memory check is set to false by default. Retain the default value.
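For a node with 64 GB of memory reserved for containers and 24 logical cores, the yarn-site.xml entries might look like the following. The values are illustrative; adjust them to your node's hardware:

```xml
<!-- Example values only: size to your node's memory and core count -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>24</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```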
Configure the following properties in mapred-site.xml:
mapreduce.map.memory.mb
The amount of memory in MB allocated to each map task container.
mapreduce.reduce.memory.mb
The amount of memory in MB allocated to each reduce task container.
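Continuing the same illustrative node (65536 MB of container memory, Mp = 24, Rp = 16), the mapred-site.xml entries might look like the following:

```xml
<!-- Example values: container memory divided by Mp and Rp respectively -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2730</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
```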
The values derived from the formula serve as a starting point; refine them based on simple performance tests.
The following example shows how to configure the MRv2 YARN and MapReduce settings:
Number of logical cores = 24
Number of disks = 7
Amount of physical memory available for containers = 64 GB

Mp = min(4 x 7, 24) = min(28, 24) = 24
Rp = min(0.65 x 24, 3 x 7) = min(15.6, 21) = 15.6 ≈ 16

yarn.nodemanager.resource.memory-mb = 65536
mapreduce.map.memory.mb = yarn.nodemanager.resource.memory-mb / Mp = 65536 / 24 = 2730
mapreduce.reduce.memory.mb = yarn.nodemanager.resource.memory-mb / Rp = 65536 / 16 = 4096
For more information about tuning the hardware and the Hadoop cluster, refer to the following Informatica How-To Library article:
