Data Engineering Recovery

The Data Integration Service manages jobs that are deployed to run in a cluster environment. When you enable the Data Integration Service for data engineering recovery, the Data Integration Service can recover and continue processing jobs that run on the Spark engine.
To use data engineering recovery, you must configure jobs to run on the Spark engine. Configure the Data Integration Service and log settings, and run the job from infacmd.
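For example, you can start a deployed mapping from the command line with infacmd. The domain, service, user, application, and mapping names below are placeholders for your environment; refer to the infacmd ms RunMapping command reference for the complete option list:

  infacmd ms RunMapping -dn MyDomain -sn MyDataIntegrationService -un Administrator -pd <password> -a MyApplication -m MyMapping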
The Data Integration Service maintains a queue of jobs to run. The Data Integration Service assigns jobs from the queue to nodes, which prepare them and send them to a compute cluster for processing.
The cluster assigns a YARN ID to each job and to each of its child tasks to track the jobs as it runs them. The Data Integration Service gets the YARN IDs from the cluster and stores them in the Model repository database.
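For example, if you have shell access to the cluster, you can check the state of a job directly by its YARN application ID with the YARN CLI. The application ID below is a placeholder:

  yarn application -status application_1617234567890_0042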
If the Data Integration Service runs on a grid or on multiple nodes and a node fails, the Service Manager fails over to another node. The Data Integration Service queries the cluster for the status of tasks identified by their YARN IDs and compares the response with the status of the failed-over tasks. Depending on the status, the Data Integration Service takes the following actions:
  • If a task has no YARN ID, the Data Integration Service submits the task to the cluster.
  • If a task has a YARN ID but has not been sent to the cluster, the Data Integration Service submits the task for processing.
  • If all tasks have been sent to the cluster, the Data Integration Service continues to monitor communications from the cluster until the job completes.
If the Data Integration Service runs on a single node, it attempts job recovery when the node is restored.
When the Data Integration Service restarts and runs a job, the job creates a cluster configuration under the disTemp directory. This process causes the disTemp directory to grow over time. Manage disk space by monitoring and periodically clearing the contents of the disTemp directory.
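For example, the following is a minimal housekeeping sketch that reports the size of the directory and removes files older than seven days. It assumes the disTemp directory resolves to $INFA_HOME/tomcat/bin/disTemp; verify the temporary directory path configured for the Data Integration Service in your environment before clearing it:

  du -sh $INFA_HOME/tomcat/bin/disTemp
  find $INFA_HOME/tomcat/bin/disTemp -type f -mtime +7 -delete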
The Data Integration Service begins the recovery process by verifying that inactive nodes are not available, and then it assigns the recovered job to an available node. This verification might take several minutes before the job is reassigned.
