Table of Contents

  1. Preface
  2. The Application Integration Console
  3. Process Server Configuration
  4. Deployed Assets
  5. Process Schedules
  6. Processes
  7. Process Server Health
  8. Process Metrics
  9. Connections
  10. Guides
  11. Logs

Monitors
Select engine properties to monitor on the Monitors tab. For each property, you can provide a statistic and a threshold that, when reached, alerts you to a warning or error condition.
You can also set the frequency and interval of the monitoring periods, as described in the following table.
  • Threshold Period: The period for collecting and aggregating statistics. The default is five minutes.
  • Evaluation Frequency: The number of times during the threshold period that statistics are evaluated. For example, if the threshold period is five minutes and the evaluation frequency is five times, then statistics are collected and aggregated once a minute during every five-minute period. The default is five times.
  • Maximum Trouble Items: The number of error/warning items per engine to display on the Monitoring page of the Application Integration Console.
  • Monitor Alert Service: Described in the next section.
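The relationship between the threshold period and the evaluation frequency can be illustrated with a short sketch. This is illustrative arithmetic only, not product code; the function name is hypothetical.

```python
def collection_interval_seconds(threshold_period_minutes, evaluation_frequency):
    """Return how often statistics are collected and aggregated.

    With the defaults (a five-minute period evaluated five times),
    statistics are aggregated once a minute.
    """
    return threshold_period_minutes * 60 // evaluation_frequency

# Default values: a 5-minute threshold period, evaluated 5 times.
print(collection_interval_seconds(5, 5))  # 60 seconds, i.e. once a minute
```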

Monitor Alert Service

Add the name of the service that will run when errors and warnings occur for monitored properties. This service also runs when it is automatically triggered by a MultiSite site-unavailable status. (MultiSite requires a special license.)
When errors occur, Process Server instantiates the alert service, which can then invoke an action, such as notifying an administrator that a monitored property has an error condition. The service can also monitor engine status (running or stopped).
To add a service, type in the service name, and select Save.
The service name is the My Role partner link service, identified in the PDD file deployed with the BPEL process to be used as the alert service. You can find this name by looking on the page.
After you add the service, select View Details to view the BPEL process.

Selecting Server Properties to Monitor

To monitor a property, press the Add Row button. Then set the property by filling in the row, as follows:
  • Property to Monitor: Select an item from the picklist.
  • Level: Select a severity: Error, Warning, or Critical-Stop Server.
  • Statistic: Select a statistic. Sometimes there is only one choice for a property.
  • Op: Select a relational operator for the threshold.
  • Threshold: Add a non-negative integer to be used in the evaluation.
After adding properties, select Update.
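Conceptually, each row defines a rule of the form "statistic op threshold". A minimal sketch of how such a rule might be evaluated follows; the function and names are illustrative, not the product's API.

```python
import operator

# Relational operators selectable in the Op column (illustrative mapping).
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq}

def evaluate_rule(statistic_value, op, threshold, level):
    """Return the alert level if the threshold condition is met, else None."""
    if OPS[op](statistic_value, threshold):
        return level
    return None

# A row such as: Faulted processes (count) >= 1 -> Warning
print(evaluate_rule(3, ">=", 1, "Warning"))  # Warning
print(evaluate_rule(0, ">=", 1, "Warning"))  # None
```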

Monitoring Properties

You can monitor the following properties:
Cluster communications issue detected (count)
The number of times an issue with broadcasting cluster messages to nodes was detected.
Critical storage exceptions (count)
Storage exceptions include:
  • Problems issuing select statements
  • Network communications failure
  • Database communications failure
  • Database user permission problems
Database connection acquisition time (ms)
Tracks the amount of time the engine waits to get a connection from the datasource. An excessively long wait may indicate trouble with the size of the connection pool. The monitoring includes maximum and average values.
The Process Server storage layer does not perform connection pooling. It relies on the storage implementation (usually a
javax.sql.DataSource
) to pool connections. Consult your application server or database documentation to address any issues with poor performance related to the connection pool size or connection acquisition time.
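Tracking maximum and average acquisition time can be sketched with a simple accumulator. This is a generic timing sketch, not the Process Server implementation; the class name is hypothetical.

```python
class AcquisitionTimer:
    """Tracks maximum and average wait time, in milliseconds."""

    def __init__(self):
        self.count = 0
        self.total_ms = 0.0
        self.max_ms = 0.0

    def record(self, elapsed_ms):
        self.count += 1
        self.total_ms += elapsed_ms
        self.max_ms = max(self.max_ms, elapsed_ms)

    @property
    def average_ms(self):
        return self.total_ms / self.count if self.count else 0.0

timer = AcquisitionTimer()
for sample in (2.0, 5.0, 11.0):   # simulated waits for a pooled connection
    timer.record(sample)
print(timer.max_ms, timer.average_ms)  # 11.0 6.0
```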
Deadlock retry attempts (count)
A deadlock retry can occur when the engine attempts to:
  • Lock a process
  • Write process state or journal entries
  • Remove alarms for dispatch
  • Match inbound receives
  • Acquire an internal counter value such as process Id or deployment Id
Discarded unmatched correlated receives (count)
When a message with correlation properties fails to route to a running process instance and is not able to create a new process instance, the engine will keep trying to dispatch the message for the configured amount of time. There is a limit to the number of such unmatched messages that the engine will retry. This property tracks the number of messages that were discarded due to the buffer of unmatched messages being full.
The unmatched correlated receive timeout, which is shown on the Server Properties page, controls the amount of time that the engine will keep the unmatched message queuing until it is routed to a process instance. The pool of unmatched receives may fill up if this timeout is too high. However, such a problem may be an issue with process design as opposed to the timeout or buffer size.
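The buffering behavior described above can be sketched as a bounded buffer with a discard counter. Capacity, names, and behavior here are illustrative assumptions, not the engine's actual data structures.

```python
from collections import deque

class UnmatchedReceiveBuffer:
    """Holds unmatched correlated messages up to a fixed capacity.

    Messages that arrive when the buffer is full are discarded; the
    discard count is what this monitored property reports.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()
        self.discarded = 0

    def offer(self, message):
        if len(self.pending) >= self.capacity:
            self.discarded += 1      # buffer full: message is dropped
            return False
        self.pending.append(message)
        return True

buf = UnmatchedReceiveBuffer(capacity=2)
for msg in ("m1", "m2", "m3"):
    buf.offer(msg)
print(buf.discarded)  # 1
```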
Engine removed from cluster (count)
The number of times an engine was removed from a cluster because it went offline or missed cluster broadcast messages.
Failed to lock process (count)
In the event of a failover, a process instance that began on one cluster node may be moved to a different node during recovery. It is unusual, but possible, that a process lock may fail, causing a process to suspend in the Suspended (programmatic) state. To address this problem, configure this property by setting a Warning level for a count greater than 1 (or a similar threshold).
By monitoring this property, as well as monitoring the Server Log for process recovery messages, you can determine if unwanted process suspensions are being caused by process lock failures.
Faulted/suspended (faulting) processes (count)
Number of processes that end in a faulted state or are suspended due to an uncaught fault.
You may have an expectation that processes running on this engine should not fault, and you want to be notified if they do.
Plan cache efficiency (percent)
Helps you determine if the Deployment cache setting is correct.
A deployment plan corresponds to each deployed version of a process, including associated disposition of running processes. Process versions that are active can be cached for better engine performance. The default number of plans that are cached is 100.
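Cache efficiency is the percentage of plan lookups served from the cache, and cache removals count the evictions forced by new plans. A minimal LRU-style sketch of both statistics follows; this is illustrative, not the engine's implementation.

```python
from collections import OrderedDict

class PlanCache:
    def __init__(self, capacity=100):    # default of 100 cached plans
        self.capacity = capacity
        self.plans = OrderedDict()
        self.hits = 0
        self.lookups = 0
        self.removals = 0

    def get(self, plan_id, load):
        self.lookups += 1
        if plan_id in self.plans:
            self.hits += 1
            self.plans.move_to_end(plan_id)
        else:
            if len(self.plans) >= self.capacity:
                self.plans.popitem(last=False)   # evict the oldest plan
                self.removals += 1               # "Plan cache removals"
            self.plans[plan_id] = load(plan_id)
        return self.plans[plan_id]

    @property
    def efficiency_percent(self):        # "Plan cache efficiency"
        return 100.0 * self.hits / self.lookups if self.lookups else 100.0

cache = PlanCache(capacity=2)
for pid in (1, 2, 1, 3, 1):
    cache.get(pid, load=lambda p: f"plan-{p}")
print(cache.efficiency_percent, cache.removals)  # 40.0 1
```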
Plan cache removals (count)
The number of times process plans were removed from the cache.
Plan cache turnover (percent)
How often loading of new plans forced older plans out of cache.
Process cache efficiency (percent)
The engine configuration contains a count of the maximum number of processes which can be kept in memory before they are forced into storage.
A process is cached in memory if it is currently executing some logic or if it is quiescent but is being cached in anticipation of receiving another message, alarm, or response to an invoke. This property reports the percentage of processes that are read from memory versus process instances read from storage. For example, 100% indicates that all process reads are coming from the memory cache.
On the Server Properties page, you can set values for Process Count and Process Idle Timeout. The Process Count setting controls the size of the cache. The Process Idle Timeout setting controls how long to keep an idle process in the cache. If the process cache efficiency percentage is low and your processes contain bi-directional invokes or can process multiple inbound messages, you may benefit from increasing the Process Count and Process Idle Timeout. This will help keep processes in memory. However, if your processes are long-running and receive messages only periodically, a low process cache efficiency percentage is not necessarily a problem.
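The reported percentage can be computed as the share of reads served from memory. A sketch of the arithmetic, with a hypothetical function name:

```python
def process_cache_efficiency(memory_reads, storage_reads):
    """Percentage of process reads served from the in-memory cache."""
    total = memory_reads + storage_reads
    return 100.0 * memory_reads / total if total else 100.0

# 80 reads from memory, 20 restored from storage -> 80% efficiency
print(process_cache_efficiency(80, 20))  # 80.0
```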
Process count exceeded (count)
The Server Properties include a Process Count option that specifies the maximum number of processes in memory. When the process count exceeds the value set for the Process Count, you can create an alert based on this property.
For example, if the Process Count is set to 50 and you want to be alerted when the count reaches 55, set this value to alert when the threshold is greater than or equal to 5.
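The example above works out as follows. This is a sketch of the arithmetic only; the function name is hypothetical.

```python
def processes_over_limit(current_count, process_count_limit):
    """Value of the 'Process count exceeded' property: how far the
    in-memory process count is above the configured Process Count."""
    return max(0, current_count - process_count_limit)

# Process Count set to 50; alert when the count reaches 55, i.e. when
# the overage is greater than or equal to 5.
overage = processes_over_limit(55, 50)
print(overage, overage >= 5)  # 5 True
```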
Time to obtain plan (ms)
The amount of time it takes to load a process plan into memory.
Time to obtain process (ms)
The time it takes to obtain a process is useful to determine if this operation is trending significantly higher under load situations. The monitoring includes maximum and average values. This property includes the time it takes to acquire a lock on a process as well as restore its state from storage if necessary.
This property works in conjunction with process cache efficiency.
Time to perform XSL transform (ms)
The time spent performing transforms within the doXslTransform() custom function.
Time to query Identity provider (ms)
Amount of time spent querying the identity provider in milliseconds. For example, if a request to the LDAP server to list groups for a user takes 20ms, that’s the value of this metric for that instance.
This property helps you track down slowness issues with an LDAP provider if you see a spike in query time.
Time to save process (ms)
Number of milliseconds required to save the process state and variables to the database.
A threshold can vary greatly depending on the process composition, number of variables, and size of variable data. This property only works for processes with a persistence setting of Full or Persist.
Time to validate messages (ms)
Reports the amount of time the engine spends validating input and output messages from receives, invokes and other activities.
This validation is enabled by the Configuration page setting labeled "Validate Input/Output messages against schema." If enabled, all messages are validated. Validation can also be enabled or disabled on individual partner links through a policy assertion.
This property does not track the time spent in explicit variable validation through the BPEL validate activity or optional validate attribute on the assign activity.
If too much time is spent validating messages, you can speed up processing by disabling all or selected message validation. Start by redeploying processes with an added Message Validation policy assertion for partner links, which provides fine-grained control over specific types of messages. You can also disable message validation for all processes by disabling the Server Property.
Work manager work start delay (ms)
The time between the scheduling of a work item request and the actual start of work; monitoring this delay can help in tuning the work manager pool.
If the time delay is trending upwards, there may not be enough threads available to handle the amount of work. The monitoring includes maximum and average values.
If you are using the default work manager, the size of the work pool can be configured on the Server Properties page. If you are using a work manager implementation provided by an application server, the size of the pool and the priorities of its threads should be configurable in your application server administration console.
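The delay being measured is simply the start time minus the schedule time. A sketch of the arithmetic, with hypothetical timestamps in milliseconds:

```python
def work_start_delay_ms(schedule_ms, start_ms):
    """Work start delay: how long a work item waited in the queue
    between being scheduled and actually starting."""
    return start_ms - schedule_ms

# Work item scheduled at t=1000 ms, picked up by a worker at t=1025 ms.
print(work_start_delay_ms(1000, 1025))  # 25
```

A delay that trends upward across samples suggests the work pool has too few threads for the offered load.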