Advanced Clusters

Resource requirements example

You have an advanced cluster with one worker node. The worker node has 16 GB of memory and 4 CPUs.
If you run an advanced job using the default requirements, the job fails. The Kubernetes system and the Spark shuffle service reserve 3 GB and 2 CPUs, so the cluster has 13 GB and 2 CPUs remaining to run jobs. The job cannot run because it requires 10 GB of memory and 2.25 CPUs to start the Spark driver and Spark executor: the memory fits within the remaining 13 GB, but 2.25 CPUs exceeds the 2 CPUs that remain.
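The capacity check described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the product: the `Resources` class and the `available` and `fits` helpers are hypothetical names, and the figures are the ones stated in this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Memory (GB) and CPU amounts; a hypothetical helper type for this sketch."""
    memory_gb: float
    cpus: float


def available(node: Resources, reserved: Resources) -> Resources:
    """Return what is left on the node after the reserved share is subtracted."""
    return Resources(node.memory_gb - reserved.memory_gb, node.cpus - reserved.cpus)


def fits(required: Resources, free: Resources) -> bool:
    """A job can start only if both its memory and its CPU requirements fit."""
    return required.memory_gb <= free.memory_gb and required.cpus <= free.cpus


node = Resources(memory_gb=16, cpus=4)         # the single worker node
reserved = Resources(memory_gb=3, cpus=2)      # Kubernetes system + Spark shuffle service
default_job = Resources(memory_gb=10, cpus=2.25)  # default driver + executor requirements

free = available(node, reserved)
print(free)                      # remaining capacity on the node
print(fits(default_job, free))   # False: 2.25 CPUs exceeds the 2 CPUs that remain
```

The memory requirement alone would fit. The CPU shortfall of 0.25 is what prevents the job from starting.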
If you cannot provision a larger instance type, you can reduce the CPU requirement by setting the following advanced session property in the mapping task:


When the number of Spark executor cores is 1, the Spark executor requires only 0.75 CPUs instead of 1.5 CPUs.
If you process a small amount of data, the Spark driver and executor require only a few hundred MB, so you might consider reducing the memory requirements for the driver and executor as well. You can reduce the requirements in the following way:



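After you reconfigure the resource requirements, the cluster must have at least 5 GB of memory and 3.5 CPUs in total, including the 3 GB and 2 CPUs that the Kubernetes system and the Spark shuffle service reserve. One worker node with 16 GB and 4 CPUs fulfills the requirements to run the job successfully.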
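As a rough cross-check of these totals, the following Python sketch reworks the arithmetic from the figures in this example. It rests on two assumptions that the text does not state outright: the Spark driver's CPU share is taken to be the default total minus the default executor share (2.25 - 1.5), and the 5 GB and 3.5 CPU totals are taken to include the reserved 3 GB and 2 CPUs. All variable names are illustrative.

```python
# Figures stated in the example.
NODE_MEMORY_GB, NODE_CPUS = 16.0, 4.0
RESERVED_MEMORY_GB, RESERVED_CPUS = 3.0, 2.0
DEFAULT_JOB_CPUS = 2.25          # default driver + executor CPU requirement
DEFAULT_EXECUTOR_CPUS = 1.5      # executor CPU requirement by default
REDUCED_EXECUTOR_CPUS = 0.75     # executor CPU requirement with 1 executor core
RECONFIGURED_TOTAL_MEMORY_GB = 5.0
RECONFIGURED_TOTAL_CPUS = 3.5

# Assumption: the driver's share is whatever the default total leaves
# after the default executor share is subtracted.
driver_cpus = DEFAULT_JOB_CPUS - DEFAULT_EXECUTOR_CPUS

# CPUs the reconfigured job needs, and the cluster total once the reserve is added.
job_cpus = driver_cpus + REDUCED_EXECUTOR_CPUS
total_cpus = RESERVED_CPUS + job_cpus

# Assumption: the 5 GB total includes the reserve, so the rest is for the job.
job_memory_gb = RECONFIGURED_TOTAL_MEMORY_GB - RESERVED_MEMORY_GB

node_is_sufficient = (
    RECONFIGURED_TOTAL_MEMORY_GB <= NODE_MEMORY_GB
    and RECONFIGURED_TOTAL_CPUS <= NODE_CPUS
)

print(f"driver CPUs: {driver_cpus}, job CPUs: {job_cpus}, total CPUs: {total_cpus}")
print(f"memory left for driver and executor: {job_memory_gb} GB")
print(f"node is sufficient: {node_is_sufficient}")
```

Under these assumptions the computed CPU total matches the stated 3.5 CPUs, which suggests that the stated minimums are cluster-wide totals rather than per-job requirements.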

