If a Data Integration Service process or node fails unexpectedly, the Data Integration Service can recover jobs running on the Spark engine. The Data Integration Service sends the job to another node, where the job resumes from the point at which the previous node failed. Recovery occurs upon node startup.
If the Data Integration Service runs on a single node, it attempts job recovery when the node is restored. If the Data Integration Service runs on a grid or on multiple nodes and a node fails, the Service Manager fails over to another node.
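The difference between the single-node and multi-node cases can be pictured with a small toy model. The sketch below is purely illustrative and is not Informatica code: the `Node` class, the `pick_recovery_node` function, and the node names are hypothetical, and the model assumes that failover simply selects the first other node that is running.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a node in the domain (not a product API)."""
    name: str
    running: bool = True


def pick_recovery_node(nodes: List[Node], failed: str) -> Optional[str]:
    """Return the name of the node where the job could resume.

    Returns None when no node is available, which models a
    single-node service waiting for the failed node to be restored.
    """
    # Grid or multiple nodes: fail over to another node that is running.
    for node in nodes:
        if node.name != failed and node.running:
            return node.name
    # Single node: recovery is possible only once the failed node is back.
    for node in nodes:
        if node.name == failed and node.running:
            return node.name
    return None


# Single node: nothing to fail over to until the node is restored.
single = [Node("node01", running=False)]
waiting = pick_recovery_node(single, failed="node01")
single[0].running = True
restored = pick_recovery_node(single, failed="node01")

# Two nodes: the job moves to the surviving node.
grid = [Node("node01", running=False), Node("node02")]
failover = pick_recovery_node(grid, failed="node01")
```

In this model, `waiting` is `None`, `restored` is `"node01"`, and `failover` is `"node02"`. How the product actually chooses a node is not described in this topic, so the first-available rule here is only an assumption.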
To use data engineering recovery, you must configure jobs to run on the Spark engine. Configure data engineering recovery in the Data Integration Service properties, and submit the job from the infacmd client.