High Availability Configuration

To configure high availability for a streaming mapping, configure a state store directory for the source and enable guaranteed processing of the messages that the source streams. Also configure the Spark execution parameters so that the mapping can be resubmitted if an attempt fails.
High availability requires the following configurations:
State store configuration
Configure a state store directory. Spark uses the state store directory to store the checkpoint information at regular intervals during the execution of the mapping. If a failure occurs, Spark restarts processing by reading from this state store directory.
Execution parameters
To let the mapping recover from a failed attempt, configure the maximum number of times that the mapping can be submitted to Spark for processing. Configure the spark.yarn.maxAppAttempts and yarn.resourcemanager.am.max-attempts execution parameters when you configure the mapping properties. The values that you specify for both parameters must be equal and must be less than the values set in the CDH or Hortonworks configuration.
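
The following Python sketch illustrates how the two configurations work together: a bounded number of submission attempts, each of which resumes from the last checkpoint in the state store directory. It is a conceptual simulation only, not Spark or product code. All function names, the checkpoint.json file, and the per-message checkpoint are assumptions made for illustration. Spark writes checkpoints at intervals and in its own format.

```python
import json
import os
import tempfile


def validate_attempts(spark_max, yarn_am_max, cluster_max):
    """Check the rule stated above: both values equal and below the cluster value."""
    return spark_max == yarn_am_max and spark_max < cluster_max


def load_checkpoint(state_dir):
    """Return the index of the next message to process, or 0 if no checkpoint exists."""
    path = os.path.join(state_dir, "checkpoint.json")  # hypothetical file name
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(state_dir, next_index):
    """Write the checkpoint to a temporary file, then rename it into place."""
    path = os.path.join(state_dir, "checkpoint.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run_attempt(messages, state_dir, processed, fail_at=None):
    """Process messages from the last checkpoint; optionally fail at index fail_at."""
    for i in range(load_checkpoint(state_dir), len(messages)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated failure at message %d" % i)
        processed.append(messages[i])
        save_checkpoint(state_dir, i + 1)


def run_with_retries(messages, state_dir, max_attempts, fail_at=None):
    """Resubmit the job up to max_attempts times, resuming from the checkpoint."""
    processed = []
    for attempt in range(1, max_attempts + 1):
        try:
            # Only the first attempt fails in this simulation.
            run_attempt(messages, state_dir, processed,
                        fail_at if attempt == 1 else None)
            return processed, attempt
        except RuntimeError:
            continue
    raise RuntimeError("all %d attempts failed" % max_attempts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        result, attempts = run_with_retries(["a", "b", "c", "d"], state_dir, 2, fail_at=2)
        print(result, attempts)
```

In this simulation the first attempt fails at the third message, and the second attempt reads the checkpoint and continues from that message instead of starting over. With a maximum of one attempt, the same failure would end the run, which is why the attempt limit and the state store directory are configured together.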