When a Data Integration Service process becomes unavailable, the Service Manager tries to restart the process on the same node or fails it over to a backup node.
The restart and failover behavior depends on how you configure the Data Integration Service to run:
Single node
When the Data Integration Service runs on a single node and the service process shuts down unexpectedly, the Service Manager tries to restart the service process. If the Service Manager cannot restart the process, the process stops or fails.
Primary and backup nodes
When the Data Integration Service runs on primary and backup nodes and the service process shuts down unexpectedly, the Service Manager tries to restart the service process. If the Service Manager cannot restart the process, the Service Manager fails the service process over to a backup node.
A Data Integration Service process fails over to a backup node in the following situations:
The Data Integration Service process fails and the primary node is not available.
The Data Integration Service process is running on a node that fails.
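As an illustration only, the single-node and primary/backup behavior described above can be modeled as a small decision function. This is a sketch in Python, not Informatica code: the names Outcome and recover are hypothetical, and picking the first available backup node is an assumption, because the text does not say how a backup node is selected.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Outcome(Enum):
    RESTARTED = "restarted on the same node"
    FAILED_OVER = "failed over to a backup node"
    STOPPED = "process stops or fails"


def recover(current_node: str,
            restart_succeeds: bool,
            available_backups: List[str]) -> Tuple[Outcome, Optional[str]]:
    """Model the outcome after a service process shuts down unexpectedly.

    Returns the outcome and the node that runs the process afterwards
    (None if the process is no longer running anywhere).
    """
    # The Service Manager always tries a restart first.
    if restart_succeeds:
        return Outcome.RESTARTED, current_node
    # With backup nodes configured, the process fails over.
    # Assumption: the first available backup is chosen.
    if available_backups:
        return Outcome.FAILED_OVER, available_backups[0]
    # Single node (no backups): the process stops or fails.
    return Outcome.STOPPED, None
```

Passing an empty backup list models the single-node configuration. A restart that cannot succeed because the primary node itself is down is modeled by restart_succeeds=False.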
Grid
When the Data Integration Service runs on a grid, the restart and failover behavior depends on whether the master or worker service process becomes unavailable.
If the master service process shuts down unexpectedly, the Service Manager tries to restart the process. If the Service Manager cannot restart the process, the Service Manager elects another node to run the master service process. The remaining worker service processes register themselves with the new master. The master service process then reconfigures the grid to run on one fewer node.
If a worker service process shuts down unexpectedly, the Service Manager tries to restart the process. If the Service Manager cannot restart the process, the master service process reconfigures the grid to run on one fewer node.
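The grid behavior can be sketched the same way. The following Python model is a simplification under stated assumptions, not the product's implementation: Grid and handle_failure are hypothetical names, and the election rule (the remaining node whose name sorts first) is invented for the example, because the text does not describe how the new master is elected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Grid:
    master: str        # node that runs the master service process
    nodes: List[str]   # all nodes in the grid, including the master's node


def handle_failure(grid: Grid, failed_node: str, restart_succeeds: bool) -> Grid:
    """Return the grid state after a service process shuts down unexpectedly."""
    # A successful restart leaves the grid unchanged.
    if restart_succeeds:
        return grid
    # Otherwise the grid is reconfigured to run on one fewer node.
    remaining = [n for n in grid.nodes if n != failed_node]
    if not remaining:
        raise RuntimeError("no nodes left to run the service")
    master = grid.master
    if failed_node == grid.master:
        # Assumption: the election rule is not specified in the text;
        # this sketch picks the remaining node that sorts first.
        master = min(remaining)
    # The remaining workers register with whichever node is now the master.
    return Grid(master=master, nodes=remaining)
```

For a three-node grid, losing the master leaves two nodes with a newly elected master, while losing a worker leaves two nodes with the original master.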
The Service Manager restarts the Data Integration Service process according to two domain property values: the amount of time that the Service Manager spends trying to restart the service, and the maximum number of restart attempts allowed within that restart period.
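To show how a restart period and a maximum number of attempts interact, here is a minimal Python sketch of a bounded retry loop. It is a generic illustration, not the Service Manager's logic: try_restart, start_process, max_attempts, restart_period_s, and delay_s are hypothetical names, and the fixed delay between attempts is an assumption.

```python
import time
from typing import Callable


def try_restart(start_process: Callable[[], bool],
                max_attempts: int,
                restart_period_s: float,
                delay_s: float = 1.0,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to restart a process until it starts, the attempt limit is
    reached, or the restart period runs out. Returns True on success."""
    deadline = clock() + restart_period_s
    attempts = 0
    # Both limits apply: stop when either one is exhausted.
    while attempts < max_attempts and clock() < deadline:
        attempts += 1
        if start_process():
            return True
        sleep(delay_s)  # assumption: fixed pause between attempts
    return False
```

A False result corresponds to the point where the behavior described above takes over: the process stops, fails over to a backup node, or the grid is reconfigured.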
The Data Integration Service clients are resilient to temporary connection failures during restart and failover of the service.