Tuning and Sizing Guidelines for Data Engineering Integration (10.4.x)

Hadoop Cluster Hardware Recommendations

The following table lists the minimum and optimal hardware requirements for the Hadoop cluster:

Hardware                                          | Sandbox Deployment | Basic or Standard Deployment | Advanced Deployment
--------------------------------------------------|--------------------|------------------------------|--------------------
CPU speed                                         | 2 - 2.5 GHz        | 2 - 2.5 GHz                  | 2.5 - 3.5 GHz
Logical or virtual CPU cores                      | 16                 | 24 - 32                      | 48
Total system memory                               | 16 GB              | 64 GB                        | 128 GB
Local disk space for yarn.nodemanager.local-dirs¹ | 256 GB             | 500 GB                       | 2.4 TB
DFS block size                                    | 128 MB             | 256 MB                       | 256 MB
HDFS replication factor                           | 3                  | 3                            | 3
Disk capacity                                     | 32 GB              | 256 GB - 1 TB                | 1.2 TB
Total number of disks for HDFS                    | 2                  | 8                            | 12
Total HDFS capacity per node                      | 64 GB              | 2 - 8 TB                     | At least 14 TB
Number of nodes                                   | 2+                 | 4 - 10+                      | 12+
Total HDFS capacity on the cluster                | 128 GB             | 8 - 80 TB                    | 144 TB
Actual HDFS capacity (with replication)           | 43 GB              | 2.66 TB                      | 57.6 TB
/tmp mount point                                  | 20 GB              | 20 GB                        | 30 GB
Installation disk space requirement               | 12 GB              | 12 GB                        | 12 GB
Network bandwidth (Ethernet card)                 | 1 Gbps             | 2 Gbps (bonded channel)      | 10 Gbps
¹ A property in the yarn-site.xml file that contains a list of directories that store localized files. You can find the localized file directory in ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. You can find the work directories of individual containers, container_${contid}, as subdirectories of the localized file directory.
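For reference, the property is set in yarn-site.xml as a comma-separated list of local directories. This is a minimal sketch; the /data1 and /data2 paths below are hypothetical mount points, not recommendations:

```xml
<!-- yarn-site.xml: example only; /data1 and /data2 are placeholder mount points -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
```

Listing one directory per physical disk spreads container localization I/O across spindles, which is why the sizing figures in the table above scale with the number of disks.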

MapR Cluster Recommendation

When you run mappings on the Blaze, Spark, or Hive engine, local cache files are generated under the directories specified in the yarn.nodemanager.local-dirs property in yarn-site.xml. On a MapR cluster, however, those directories might not have sufficient disk capacity.
To make sure that the directory has sufficient disk capacity, perform the following steps:
  1. Create a volume on HDFS.
  2. Mount the volume through NFS.
  3. Configure the NFS mount location in yarn.nodemanager.local-dirs.
For more information, refer to the MapR documentation.
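The steps above can be sketched as shell commands. This is a minimal sketch, assuming a hypothetical volume name localvol, cluster name my.cluster.com, NFS host mapr-nfs-node, and mount point /mnt/localvol; substitute the names and paths for your environment:

```
# Step 1: create a MapR volume (volume name and path are placeholders)
maprcli volume create -name localvol -path /localvol

# Step 2: mount the volume through NFS (host and mount point are placeholders)
mkdir -p /mnt/localvol
mount -o hard,nolock mapr-nfs-node:/mapr/my.cluster.com/localvol /mnt/localvol

# Step 3: point yarn.nodemanager.local-dirs at the NFS mount in yarn-site.xml:
#   <property>
#     <name>yarn.nodemanager.local-dirs</name>
#     <value>/mnt/localvol/nm-local-dir</value>
#   </property>
```

After updating yarn-site.xml, restart the NodeManager service on the affected nodes so that the new local directory takes effect.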
