page generated on 21.11.2024 - 04:45
TINE Watchdog Server

The TINE watchdog server can monitor any process on the local host and, according to its configuration parameters, ensure that the process is (re)started if necessary. It can also start or stop any configured process on command and set appropriate alarms when warranted. It also maintains local histories of CPU load, memory usage, and network usage. If permitted, it can also reboot the host on which it is running.

The watchdog is currently available on both Windows and Linux platforms. It is a console application which can be run in the background and/or started as a service on a Windows host. Running it as a service is not advisable, however, if any of the applications it needs to monitor should be visible on the Windows host (e.g. GUI applications).

Watchdog Configuration

The watchdog requires a configuration file in order to start. By default it looks for a file called watchdog.cfg. Alternatively, the name of a specific configuration file can be given on the command line.

Starting the watchdog at the console when no configuration file is available produces the following output:

Usage: watchdog <configuration_file>
If the <configuration_file> is not provided, [watchdog.cfg] is used.
If the default configuration is also not available, this instructions are printed and example.cfg generated.
The following properties can be configured:
Global Properties:
START_SERVER
<Yes|No> indicates if the server should be started or
not (default: Yes).
WATCH_CYCLE_PERIOD
Time in milliseconds between two consecutive checks of
all processes (default: 1000 ms).
PRINT_INFO_ON_EACH_CYCLE
<Yes|No> indicates if a process status should be printed
to console in each cycle (default: No).
SUSPENDED_TIMEOUT
Time in seconds how long the watchdog remains suspended
before it is automatically resumed (default: 600 s).
PID_REFRESH_RATE
Time in seconds how often the PID of the processes are
refreshed. This assures that if a process is killed and
PIDs are recycled there is not a different process hiding
under the PID associated with a watched process (default: 60 s).
ALLOW_MONITORING_OTHER_HOSTS
<Yes|No> indicates if watchdog is allowed to monitor
properties from other hosts (default: no).
CONSOLE_COMMAND
The command which is used to start a process in a
separated console on linux (default: none).
STOP_ALL_ON_SHUTDOWN
<Yes|No> indicates if all watched processes are sent
a stop signal if watchdog terminates gracefully (default: no).
SHORT_TERM_HISTORY
Short term history in seconds (default: 300).
LONG_TERM_HISTORY
Long term history in months (default: 1).
GRACE_PERIOD_BEFORE_KILL
Grace period in seconds between term and kill signals
sent to a bad process (default: 10 s).
ENVIRONMENT_VARIABLES
List of environment variables as name=value pairs
separated by semi colon.
NETWORK_ENABLED
<Yes|No> indicates if a the network utilization is
queried or not.
Server and FEC Properties (standard TINE restrictions apply):
SERVER
The name of the server (default: WD_<COMPUTER_NAME>).
FEC
The FEC name identifying the Front End Computer and to
which all registered equipment modules are bound
(default: <COMPUTER_NAME>.<PORT_OFFSET>).
SUBSYSTEM
The subsystem to which attached device servers belong
(default: WDOG).
CONTEXT
32-character string giving the context of all registered
equipment modules (default: SERVICE).
DESCRIPTION
64-character description of the FECs server duties
(default: n/a).
LOCATION
32-character string giving the physical location of
the FEC (default: n/a).
HARDWARE
32-character brief description of the IO hardware found
on the FEC (default: None).
RESPONSIBLE
32-character string listing the developer(s) responsible
for the FEC (default: Admin).
PORT_OFFSET
The Port Offset to be applied to the FEC (default: 20).
Individual Process Properties (arrow marks mandatory settings):
-> [<process ID>]
The process id is always given in rectangular brackets
and defines the start of a process configuration. This is
the exported device name for the process, when server is started.
-> PATH_TO_EXECUTABLE
Path to the process executable (use quotes if it contains spaces); mandatory if the process is not a service
-> NAME
Name of the process (the same as it appears in the OS process list).
If not provided the last segment of PATH_TO_EXECUTABLE is used. Mandatory
if service.
ALIAS
An alias for the exported device names (default: same as ID).
SERVICE
<Yes|No> indicates if this process is a service (Windows only).
SHOW_CONSOLE_WINDOW
<Yes|No> indicates if the console of this process should be
visible or not (Windows only, default: Yes).
DESCRIPTION
The description of the process
WORKING_DIRECTORY
Working directory where to start the process (use quotes
if it contains spaces) (default: current directory).
COMMAND_LINE_PARAMETERS
Process startup parameters, separated by spaces.
Put quotes around individual parameters if they contain
spaces (default: none).
MATCH_PARAMETERS
<Yes|No> indicates if parameters listed for this
process should match the parameters in the process' command
line. Use this setting when several processes with the same
name are started using different command line parameters (default: Yes).
MATCH_PATH
<Yes|No> indicates if path of this process should
match the path to executable listed for this process. Use
this setting when several processes with the same name are
started from different directories (default: Yes).
PROCESS_CAPTION
The caption of the process if the process is a
console application (default: cmd line).
ENVIRONMENT_VARIABLES
Environment variables to use in addition to the default
ones. Individual variables should be given in name=value
pairs, and should be separated by semicolons. If you want to
delete a variable that is set by the parent, set its
value to an empty string: VAR=; (default: none).
PRECEDING_PROCESSES
Processes to start before this process is started,
separated by semicolons. The process has to be the ID
or alias of one of the other processes managed by watchdog,
or a system process (default: none).
MONITOR_N_SECONDS_AFTER_START
Number of seconds to wait after the process was started
and before the watchdog starts to watch it. This allows the
process to take time for its startup procudere (default: 10 s).
PAUSE_N_SECONDS_AFTER_PREVIOUS_PROCESS
Number of seconds to wait after the previous process was
and before this process is started (default: 1 s).
NUMBER_OF_RETRIES
Number of times the process is started before it is
disabled (disabled processes are no longer managed by
watchdog until started manually) (default: 3).
RETRY_INTERVAL
Number of seconds to wait before trying to start the
process after it was discovered it is not running (default: 20 s).
AUTO_START
<Yes|No|Once> indicates if a non-running process should be
started automatically by watchdog. If the value is once
the process can only be started with a command to the
watchdog (e.g. script execution). The process will not be
monitored in that case (default: Yes).
MAX_CPU_LOAD
Maximum allowed CPU load in percentage (default: 100 % ).
NUMBER_OF_ALLOWED_CPU_LOAD_VIOLATIONS
How many times the max CPU load can be violated, before
the process is restarted (default: 1).
MAX_MEMORY
Maximum memory consumption (commit size) in MB (default: 1 TB).
NUMBER_OF_ALLOWED_MEMORY_VIOLATIONS
How many times the max memory limit can be violated
before the process is restarted (default: 1).
T_PROPERTY
TINE property to monitor. If this process is a TINE
server, define the property which will be monitored.
If no response is received from the property the process
is restarted (default: none).
T_DEVICE
TINE device that owns the monitored property (default: none).
T_POLLING_RATE
The rate in milliseconds at which the TINE property
is polled (default: 1000 ms).
T_TIMEOUT
Timeout in seconds at which the process is considered
not responding if no update has been received from the
property (default: 10 s).
T_RESTART_TIMEOUT
Timeout in seconds at which the process will be restarted
if TINE property is not responding. Enter 0 if the
process should not be restarted (default: 0 s).
*************************************************************************
* *
* Example configuration file has been created for you: example.cfg. *
* *
*************************************************************************

The global and server properties can usually be omitted, in which case the default settings are applied.
As the watchdog, when running as a server, is a standard TINE server, it needs a FEC name, a context, a server name, and a port offset. By default, these are

  • CONTEXT: SERVICE
  • SERVER_NAME: WD_<COMPUTER_NAME>
  • PORT_OFFSET: 20
  • FEC_NAME: <COMPUTER_NAME>.<PORT_OFFSET>

On rare occasions there might be a conflict with the server port, in which case the PORT_OFFSET should be adjusted.
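
How these defaults combine can be sketched in a few lines of Python. This is only an illustration of the naming rules listed above; the helper name default_identity is hypothetical and is not part of TINE or the watchdog.

```python
def default_identity(computer_name, port_offset=20):
    """Return the default TINE identity settings for a watchdog host.

    Hypothetical helper: it only mirrors the defaults from the usage
    text and does not query TINE or a running watchdog.
    """
    return {
        "CONTEXT": "SERVICE",
        "SERVER": f"WD_{computer_name}",
        "PORT_OFFSET": port_offset,
        "FEC": f"{computer_name}.{port_offset}",
    }


# Changing the port offset (e.g. to resolve a port conflict) also
# changes the default FEC name derived from it.
print(default_identity("acclxciens1"))
print(default_identity("acclxciens1", port_offset=21))
```

Note that, under these defaults, adjusting PORT_OFFSET changes the derived FEC name as well unless FEC is set explicitly.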

Each monitored process should have its own section. Importantly, the section header name should be the same as the FEC_NAME of the process (if it is a TINE server). This ensures that the FEC Remote Panel (see Remote Debugging Tools) can find and (re)start or otherwise control the selected server via the watchdog.

As an example, consider the section (shown below) relevant to the control of the TINE ENS:

[ENS]
NAME: ens
PATH_TO_EXECUTABLE: /export/tine/server/ens/bin/ens
WORKING_DIRECTORY: /export/tine/server/ens/bin
ENVIRONMENT_VARIABLES:
MATCH_PATH: Yes
MATCH_PARAMETERS: Yes
MAX_CPU_LOAD: 90
NUMBER_OF_ALLOWED_CPU_LOAD_VIOLATIONS: 5
MAX_MEMORY: 1000
NUMBER_OF_ALLOWED_MEMORY_VIOLATIONS: 3
AUTO_START: Yes
MONITOR_N_SECONDS_AFTER_START: 5
NUMBER_OF_RETRIES: 5
RETRY_INTERVAL: 10
T_PROPERTY: NSERVERS
T_DEVICE: /SITE/ENS1/#0
T_POLLING_RATE: 1000
T_RESTART_TIMEOUT: 20
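
For scripts that need to inspect such a file, the layout above is simple enough to read with a few lines of Python. The following is a minimal sketch, assuming the file consists only of [ID] section headers and KEY: value lines as in the ENS example. The function name parse_watchdog_config is hypothetical, and the watchdog's own parser may handle quoting or other details differently.

```python
def parse_watchdog_config(text):
    """Parse '[ID]' section headers and 'KEY: value' lines into dicts.

    Assumption: the file looks like the ENS example. Settings that
    appear before the first section header (global and server
    properties) are collected under the key None.
    """
    sections = {None: {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections[current] = {}
            continue
        # Split at the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            sections[current][key.strip()] = value.strip()
    return sections


example = """\
[ENS]
NAME: ens
PATH_TO_EXECUTABLE: /export/tine/server/ens/bin/ens
ENVIRONMENT_VARIABLES:
T_DEVICE: /SITE/ENS1/#0
"""

config = parse_watchdog_config(example)
print(config["ENS"]["PATH_TO_EXECUTABLE"])
```

Splitting at the first colon keeps values such as Windows paths with a drive letter intact.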

The three most important (and required) parameters used to start (or attach to) a specific process are

  • NAME: gives the name of the process which should be monitored (on Windows, this is likely to have the extension .EXE).
    In the above it is simply the name ens of the binary executable.
  • PATH_TO_EXECUTABLE: gives the full path to the executable (or script) which the watchdog calls to start the process. In the above this is /export/tine/server/ens/bin/ens and includes the name of the executable itself. It could just as well point to a script which starts the executable 'ens' (which is the process to be monitored). That is, if the process 'ens' is not running and the watchdog should start it, then PATH_TO_EXECUTABLE is called.
  • WORKING_DIRECTORY: gives the working directory the process should have when it is started.
    In the above this is /export/tine/server/ens/bin.

In addition (not required):

  • AUTO_START: (default: Yes) instructs the watchdog to start the process if it is not running. If this parameter is set to No, the watchdog can still control the process but needs an explicit command to start or stop it.
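
A configuration can be checked for mandatory settings before it is handed to the watchdog. The sketch below follows the rules given in the usage text (PATH_TO_EXECUTABLE is mandatory unless the process is a service, NAME is mandatory for a service) rather than the stricter wording of the list above. The function name missing_settings is hypothetical, and treating an absent SERVICE setting as No is an assumption, since the usage text states no default for it.

```python
def missing_settings(section):
    """List mandatory settings missing from one process section (a dict).

    Assumption: SERVICE counts as 'No' when it is not given.
    """
    is_service = section.get("SERVICE", "No").strip().lower() == "yes"
    missing = []
    if is_service:
        # For a service, the process name cannot be derived from a path.
        if not section.get("NAME"):
            missing.append("NAME")
    elif not section.get("PATH_TO_EXECUTABLE"):
        missing.append("PATH_TO_EXECUTABLE")
    return missing


ens = {"NAME": "ens", "PATH_TO_EXECUTABLE": "/export/tine/server/ens/bin/ens"}
print(missing_settings(ens))
print(missing_settings({"SERVICE": "Yes"}))
```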

Note that if a monitored process is already running when the watchdog is started, the watchdog will attach itself to the running process only if PATH_TO_EXECUTABLE and COMMAND_LINE_PARAMETERS match, to the extent required by the MATCH_PATH and MATCH_PARAMETERS settings.
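
The attach rule can be read as the following decision, sketched in Python. This is one interpretation of the rule, not the watchdog's actual code: the function name would_attach is hypothetical, the process name is assumed to have been matched already, and the comparison is a plain equality test on the path and on the parameter list.

```python
import shlex


def would_attach(cfg, running_path, running_args):
    """Decide whether a running process matches a configured one.

    cfg is a dict of settings for one process section; running_path and
    running_args describe a process found in the OS process list.
    """
    def enabled(key):
        # Both MATCH_PATH and MATCH_PARAMETERS default to Yes.
        return cfg.get(key, "Yes").strip().lower() == "yes"

    if enabled("MATCH_PATH"):
        if running_path != cfg.get("PATH_TO_EXECUTABLE", ""):
            return False
    if enabled("MATCH_PARAMETERS"):
        # Parameters are space separated, with quotes around any
        # parameter that contains spaces.
        expected = shlex.split(cfg.get("COMMAND_LINE_PARAMETERS", ""))
        if list(running_args) != expected:
            return False
    return True


cfg = {"PATH_TO_EXECUTABLE": "/export/tine/server/ens/bin/ens"}
print(would_attach(cfg, "/export/tine/server/ens/bin/ens", []))
print(would_attach(cfg, "/tmp/ens", []))
```

Setting MATCH_PATH or MATCH_PARAMETERS to No simply skips the corresponding comparison, which is why a process started from a different directory or with different arguments can still be picked up.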

Watchdog and Process Control

Once the watchdog is running, one can control the monitored processes via the watchdog client:

Many of the settings (those with a non-grey input field) can be changed on the fly through the above panel.

In addition, a right click over the CPU or Memory fields in the process list offers a Show History context menu option which, if selected, pops up a history viewer where the trends of the relevant processes can be viewed. For instance, selecting Show History for the Memory of a monitored Central Alarm Server might yield:

Likewise, the trends of the network activity can be shown by making use of this context menu from the Network Activity field in the topmost grid.

As mentioned above, the FEC Remote Panel can signal control or restart instructions to a server's watchdog if the configuration parameters are systematically correct. This is illustrated in the figure below:


Command Line Control

In addition to the above tool, one can also control monitored processes via the command line tool wctrl, which is more suitable for use in scripts. Simply typing wctrl -help at the command line yields:

wctrl -help
Use this program in the following way:
wctrl.exe [options] [process] [property]
The following options are available:
/c=CONTEXT The name of the context in which to look for the
watchdog server(default: SERVICE)
/s=SERVER The name of the watchdog server (default: WD_<host> = WD_acclxciens1)
/p=PROPERTY The name of the property to read/write (the same
property of all devices / processes will be read)
If PROPERTY is a process property it will be read for all
processes.If it is a server property it will be read
for the server only. That same is true for the commands.
lists List all devices of the watchdog server.
list List all devices of the watchdog server with some basic readings.
[process] The name of the process, which the property will
be read for, or command executed on
[property] the name the property to read or command to execute
If [process] is given and [property] is not, all properties
(no commands) will be read for that process.
If /p=PROPERTY is specified the [process] and [property]
parameters are irrelevant and will be ignored.
Examples:
wctrl /c=TEST /p=cpu Reads the CPU property of all processes
on the server /TEST/WD_acclxciens1
wctrl /s=WD_Foobar /p=memory Reads the Memory property of all
processes on the server /SERVICE/WD_Foobar
wctrl sinegen cpu Reads the CPU property of the sinegen process
on the server /SERVICE/WD_acclxciens1
wctrl sinegen Reads all properties of the sinegen process on the
server /SERVICE/WD_acclxciens1
wctrl /c=TEST /s=WDOG sinegen start Executes the start command of
the sinegen process on /TEST/WDOG
wctrl /c=TEST /s=WDOG /p=cpu sinegen memory Reads the CPU property
of all processes on /TEST/WDOG. Sinegen and memory
parameters are ignored
wctrl /c=TEST /s=WDOG list List all devices of the server /TEST/WDOG
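
When wctrl is called from a script, the argument list can be assembled programmatically. The following Python sketch builds arguments in the form shown in the help text above; the function names wctrl_args and run_wctrl are hypothetical, and it is assumed that wctrl is on the search path. Exit codes are not documented here, so the sketch only returns the printed text.

```python
import subprocess


def wctrl_args(process=None, prop=None, context=None, server=None):
    """Build an argument list for wctrl from the options in its help text."""
    args = ["wctrl"]
    if context:
        args.append(f"/c={context}")
    if server:
        args.append(f"/s={server}")
    if process:
        args.append(process)
        if prop:
            # A property to read or a command (such as start) to execute.
            args.append(prop)
    return args


def run_wctrl(**kwargs):
    """Run wctrl (assumed to be on the PATH) and return its printed output."""
    result = subprocess.run(wctrl_args(**kwargs), capture_output=True, text=True)
    return result.stdout


# Corresponds to the help example: wctrl /c=TEST /s=WDOG sinegen start
print(wctrl_args("sinegen", "start", context="TEST", server="WDOG"))
```

Passing the arguments as a list avoids shell quoting problems when process names contain unusual characters.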

For example, a quick check on the status of the monitored processes can be achieved by typing wctrl -list:

SERVICE/WD_acclxciens1
#: PROCESS | MON. | RUN. | CPU | MEMORY | #SP | #ST | LAST_RETRY
01: UDPPING | YES | YES | 0 % | 2.49 MB | 0 | 0 | Mar 3 16:25:28
02: ENS | YES | YES | 0 % | 124.49 MB | 0 | 0 | Jul 30 15:16:59
03: GLOBALSFEC | YES | YES | 0 % | 20.25 MB | 0 | 0 | Mar 3 16:25:28
04: TIMSERV | YES | YES | 0 % | 16.57 MB | 0 | 0 | Mar 3 16:25:28
05: GENS | YES | YES | 0 % | 166.85 MB | 0 | 2 | Sep 15 09:28:05
06: EVENTS | YES | YES | 0 % | 17.15 MB | 0 | 0 | Mar 3 16:25:28
07: CLOGFEC | YES | YES | 0 % | 18.27 MB | 0 | 0 | Mar 3 16:25:28
08: LOCCLN | YES | YES | 0 % | 5.20 MB | 0 | 0 | Mar 3 16:25:28
09: SITECASFEC | YES | YES | 0 % | 476.66 MB | 0 | 3 | Aug 31 14:57:15
10: GLITCHFEC | YES | YES | 0 % | 251.52 MB | 0 | 0 | Mar 3 16:25:28
11: VBRRPTFEC | YES | YES | 0 % | 19.70 MB | 0 | 1 | Mar 3 16:25:31
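
A listing like the one above is easy to post-process in a script. The Python sketch below splits each row at the | separators and uses the header row for the column names. The function name parse_wctrl_list is hypothetical, and the layout is assumed to be exactly as printed above; the values are kept as text and no meaning is assigned to the individual columns.

```python
def parse_wctrl_list(output):
    """Turn the table printed by the wctrl list command into dicts."""
    rows = []
    header = None
    for line in output.splitlines():
        if "|" not in line:
            continue  # skips the leading 'CONTEXT/SERVER' line
        # The first colon separates the row number from the cells;
        # later colons (in the timestamp) stay inside the last cell.
        index, _, rest = line.partition(":")
        cells = [cell.strip() for cell in rest.split("|")]
        if index.strip() == "#":
            header = cells
        elif header is not None:
            rows.append(dict(zip(header, cells)))
    return rows


sample = """\
SERVICE/WD_acclxciens1
#: PROCESS | MON. | RUN. | CPU | MEMORY | #SP | #ST | LAST_RETRY
02: ENS | YES | YES | 0 % | 124.49 MB | 0 | 0 | Jul 30 15:16:59
05: GENS | YES | YES | 0 % | 166.85 MB | 0 | 2 | Sep 15 09:28:05
"""

for row in parse_wctrl_list(sample):
    if row["RUN."] != "YES":
        print("not running:", row["PROCESS"])
```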

Watchdog Setup Wizard

One can also configure a watchdog server from scratch via the Watchdog Wizard, which will guide the user through the relevant configuration parameters.

As you can see, one can either start from scratch (empty file), start from an existing watchdog configuration file, or edit the contents of a running watchdog's configuration.

If we choose empty file and select Next:

If we continue and accept the default suggestions, we arrive at the stage where we must select the processes we wish the watchdog to monitor:

Selecting the Add button will scan the local FEC manifest for possible entries and present them in a dialog, where we can select those servers we wish to monitor and configure any associated monitoring parameters.

