Table of Contents


  1. Preface
  2. Introduction to Data Engineering Streaming
  3. Data Engineering Streaming Administration
  4. Sources in a Streaming Mapping
  5. Targets in a Streaming Mapping
  6. Streaming Mappings
  7. Transformation in Streaming Mappings
  8. Window Transformation
  9. Appendix A: Connections
  10. Appendix B: Monitoring REST API Reference
  11. Appendix C: Sample Files

High Availability Configuration

High Availability Configuration

To configure high availability for the streaming mapping, configure a state store directory for the source and guaranteed processing of the messages streamed by the source. Also configure the Spark execution parameters to enable the mapping to run without failing.
To configure high availability, perform the following configurations:
State store configuration
Configure a state store directory. Spark uses the state store directory to store the checkpoint information at regular intervals during the execution of the mapping. If a failure occurs, Spark restarts processing by reading from this state store directory.
Execution parameters
To ensure that the mapping runs without failing, configure the maximum number of tries to submit the mapping to Spark for processing. Configure the
execution parameters when you configure the mapping properties. The values that you specify for both parameters must be equal and less than the values configured on the CDH or HortonWorks configuration.


We’d like to hear from you!