Table of Contents

Search

  1. Preface
  2. Advanced clusters
  3. Setting up AWS
  4. Setting up Google Cloud
  5. Setting up Microsoft Azure
  6. Setting up a self-service cluster
  7. Setting up a local cluster
  8. Advanced configurations
  9. Troubleshooting
  10. Appendix A: Command reference

Advanced Clusters

Advanced Clusters

Initialization script failures

Initialization script failures

When an initialization script fails on a cluster node, it can have a significant impact on the
advanced cluster
. An init script failure can prevent the cluster from scaling up or cause the Secure Agent to terminate the cluster.
Note the impact that an init script failure can have in the following situations:
Failure during cluster creation
If the init script fails on any node during cluster creation, the Secure Agent terminates the cluster.
Resolve the issues with the init script before running a job to start the cluster again.
Failure during a scale up event
If the init script fails on a node that is added to the cluster during a scale up event, the node fails to start and the cluster fails to scale up. If the cluster attempts to scale up again and the node continues to fail to start, it adds to the number of accumulated node failures until the Secure Agent terminates the cluster.
Failure while recovering a master node
If you enable high availability in an AWS environment and the init script fails on a recovered master node, the node fails to start and contributes to the number of accumulated node failures over the cluster lifecycle.
Accumulated failures over the cluster lifecycle
During the cluster lifecycle, the Secure Agent tracks the number of accumulated node failures that occur due to an init script within a certain time frame. If the number of failures is too high, the agent terminates the cluster.
Find the log files for the nodes where the init script failed and use the log files to resolve the failures before running a job to start the cluster again.

0 COMMENTS

We’d like to hear from you!