Advanced Clusters

High availability

An advanced cluster can be made highly available to eliminate the single point of failure that arises when the master node goes down. If you enable high availability and one master node goes down, the other master nodes remain available and jobs on the cluster can continue running.

When a cluster is highly available, watch out for job failures in the following scenarios:

If all master nodes go down, jobs will fail.

If too many master nodes go down, the Kubernetes API server becomes unavailable. The threshold for the number of failures is (n+1)/2, where n is the number of master nodes. For example, if the cluster has 3 master nodes and 2 master nodes go down, the Kubernetes API server becomes unavailable and jobs fail on the cluster.
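
The threshold arithmetic above can be sketched in a few lines of Python. This is an illustrative calculation only, not part of the product: the function names are hypothetical, and rounding (n+1)/2 down for an even number of master nodes is an assumption based on majority-quorum behavior, since the text only gives an odd-numbered example.

```python
def failure_threshold(master_nodes: int) -> int:
    """Number of failed master nodes at which the API server becomes unavailable.

    Implements (n+1)/2 from the text. Integer division is an assumption
    that only matters when the number of master nodes is even.
    """
    if master_nodes < 1:
        raise ValueError("a cluster needs at least one master node")
    return (master_nodes + 1) // 2


def api_server_available(master_nodes: int, failed: int) -> bool:
    """Return True while fewer master nodes have failed than the threshold."""
    if not 0 <= failed <= master_nodes:
        raise ValueError("failed must be between 0 and master_nodes")
    return failed < failure_threshold(master_nodes)


if __name__ == "__main__":
    # Print the threshold for a few common master node counts.
    for n in (1, 3, 5):
        print(f"{n} master nodes: unavailable at {failure_threshold(n)} failures")
```

With 3 master nodes, `failure_threshold(3)` evaluates to 2, which matches the example above: one failed master node is tolerated, and a second failure makes the Kubernetes API server unavailable.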
