Big Data Streaming User Guide

High Availability Configuration

To configure high availability for a streaming mapping, configure a state store directory for the source and guaranteed processing of the messages that the source streams. Also configure the Spark execution parameters that control how many times the mapping can be submitted to Spark, so that the mapping can recover from a failure.
High availability requires the following configurations:
State store configuration
Configure a state store directory. Spark uses the state store directory to store the checkpoint information at regular intervals during the execution of the mapping. If a failure occurs, Spark restarts processing by reading from this state store directory.
Execution parameters
To keep the mapping running after a failure, configure the maximum number of times that the mapping can be submitted to Spark for processing. Configure the spark.yarn.maxAppAttempts and yarn.resourcemanager.am.max-attempts execution parameters when you configure the mapping properties. The values that you specify for both parameters must be equal to each other and less than the corresponding values in the CDH or Hortonworks cluster configuration.
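The rule for the two execution parameters can be checked before you run the mapping. The following Python sketch is an illustration only and is not part of the product: the function name check_attempt_settings, the mapping_params dictionary, and the cluster_max_attempts argument are hypothetical, and the cluster limit is assumed to be a value that you look up yourself in the cluster configuration.

```python
def check_attempt_settings(mapping_params, cluster_max_attempts):
    """Return a list of problems with the two attempt parameters.

    mapping_params: dict of execution parameter names to values
        (hypothetical stand-in for the mapping properties).
    cluster_max_attempts: the limit configured on the cluster
        (assumed to be looked up separately by the caller).
    """
    spark_key = "spark.yarn.maxAppAttempts"
    yarn_key = "yarn.resourcemanager.am.max-attempts"

    problems = []

    # Both parameters must be present before they can be compared.
    missing = [key for key in (spark_key, yarn_key) if key not in mapping_params]
    if missing:
        return [f"missing parameter: {key}" for key in missing]

    spark_attempts = int(mapping_params[spark_key])
    yarn_attempts = int(mapping_params[yarn_key])

    # Rule 1: the two values must be equal to each other.
    if spark_attempts != yarn_attempts:
        problems.append(
            f"{spark_key}={spark_attempts} differs from {yarn_key}={yarn_attempts}"
        )

    # Rule 2: each value must be less than the cluster-level limit.
    for key, value in ((spark_key, spark_attempts), (yarn_key, yarn_attempts)):
        if value >= cluster_max_attempts:
            problems.append(
                f"{key}={value} is not less than the cluster limit {cluster_max_attempts}"
            )

    return problems
```

An empty list means that the values satisfy both conditions stated above. For example, setting both parameters to 3 against a cluster limit of 4 returns no problems, while setting them to 3 and 2 reports a mismatch.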
