Advanced Clusters

High availability

An advanced cluster can be made highly available so that the master node is not a single point of failure. If you enable high availability and one master node goes down, the other master nodes remain available and jobs on the cluster can continue running.
When a cluster is highly available, watch out for job failures in the following scenarios:
  • If all master nodes go down, jobs will fail.
  • If too many master nodes go down, the Kubernetes API server becomes unavailable. The threshold for the number of failures is (n+1)/2, where n is the number of master nodes. For example, if the cluster has 3 master nodes and 2 master nodes go down, the Kubernetes API server becomes unavailable and jobs fail on the cluster.
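
The failure threshold above can be sketched as a short calculation. This is a minimal illustration, not part of the product: the function names failure_threshold and api_server_available are hypothetical, and rounding (n+1)/2 down for an even number of master nodes is an assumption, since the text only gives an odd-numbered example.

```python
def failure_threshold(master_nodes: int) -> int:
    """Return the number of failed master nodes at which the
    Kubernetes API server becomes unavailable, using (n+1)/2."""
    if master_nodes < 1:
        raise ValueError("a cluster needs at least one master node")
    # Assumption: for even n, (n+1)/2 is rounded down to a whole node count.
    return (master_nodes + 1) // 2


def api_server_available(master_nodes: int, failed_nodes: int) -> bool:
    """Return True while fewer master nodes have failed than the threshold."""
    if not 0 <= failed_nodes <= master_nodes:
        raise ValueError("failed_nodes must be between 0 and master_nodes")
    return failed_nodes < failure_threshold(master_nodes)


if __name__ == "__main__":
    # The example from the text: 3 master nodes, 2 of them down.
    print(failure_threshold(3))        # 2
    print(api_server_available(3, 1))  # True
    print(api_server_available(3, 2))  # False
```

With 3 master nodes the cluster tolerates one failed master node. A second failure reaches the threshold, which matches the example in the list above.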