can become highly available, which eliminates the single point of failure that a lone master node represents. If you enable high availability and one master node goes down, the other master nodes remain available and jobs on the cluster can continue running.
When a cluster is highly available, watch out for job failures in the following scenarios:
If all master nodes go down, jobs will fail.
If too many master nodes go down, the Kubernetes API server becomes unavailable. The threshold for the number of failures is (n+1)/2, where n is the number of master nodes. For example, if the cluster has 3 master nodes and 2 master nodes go down, the Kubernetes API server becomes unavailable and jobs fail on the cluster.
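The threshold arithmetic can be checked with a short Python sketch. The function names failure_threshold and api_server_available are hypothetical helpers for illustration, not part of any product API. The sketch also assumes that (n+1)/2 is rounded down to a whole number when n is even, which the text above does not state.

```python
def failure_threshold(master_nodes: int) -> int:
    """Return the number of failed master nodes at which the
    Kubernetes API server becomes unavailable, per (n+1)/2."""
    if master_nodes < 1:
        raise ValueError("a cluster needs at least one master node")
    # Assumption: integer division, so an even n rounds down.
    return (master_nodes + 1) // 2


def api_server_available(master_nodes: int, failed: int) -> bool:
    """Return True while fewer master nodes have failed than the threshold."""
    if not 0 <= failed <= master_nodes:
        raise ValueError("failed must be between 0 and master_nodes")
    return failed < failure_threshold(master_nodes)


if __name__ == "__main__":
    # The example from the text: 3 master nodes.
    for failed in range(4):
        state = "available" if api_server_available(3, failed) else "unavailable"
        print(f"3 master nodes, {failed} down: API server {state}")
```

Under these assumptions, a cluster with 3 master nodes tolerates 1 failure and loses the API server at 2, which matches the example above. A cluster with 5 master nodes would reach the threshold at 3 failures.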