The mon command
About
The mon command is a very powerful command that comes with merlin.
It is the command used to set up a distributed or load-balanced environment.
This command can also be used to control the other op5 monitor servers.
| The mon command is very powerful. Handle with care! It has the power to both create and destroy your whole op5 installation. |
The commands
To use the mon command just type
# mon
The command should be used with one category and one sub-category. Only start, stop and restart categories can be used without any sub-category.
Start
# mon start
This will start the op5 monitor process on the node that you run the command from.
Stop
# mon stop
This will stop the op5 monitor process on the node you run the command from.
Restart
# mon restart
This will restart the op5 monitor process on the node you run the command from.
Ascii
Ninja
# mon ascii ninja
This will display the ninja logo in ascii art.
Merlin
# mon ascii merlin
This will display the merlin logo in ascii art.
Check
Spool
# mon check spool [--maxage=<seconds>] [--warning=X] [--critical=X] <path> [--delete]
Checks a certain spool directory for files (and files only) that are older than 'maxage'. It's intended to prevent buildup of checkresult files and unprocessed performance-data files in the various spool directories used by op5 Monitor.
--delete | Causes files older than 'maxage' to be removed. |
--maxage | Is given in seconds and defaults to 300 (5 minutes). |
<path> | May be 'perfdata' or 'checks', in which case directory names will be taken from op5 defaults |
--warning and --critical | Have no effect if '--delete' is given and will otherwise specify threshold values. |
| Only one directory at a time may be checked. |
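The options above can be combined in a small wrapper. The following is a minimal sketch, not part of op5 Monitor: the function names check_spool and check_all_spools are hypothetical, and restricting <path> to the two op5 default names is a choice made for this example only. It uses nothing but the flags documented above.

```shell
# Hypothetical helper: run the spool check on one op5 default spool
# and optionally delete files older than max_age.
check_spool() {
    spool=$1            # 'perfdata' or 'checks'
    max_age=${2:-300}   # seconds; 300 is the documented default
    delete=${3:-no}     # 'yes' adds --delete

    case $spool in
        perfdata|checks) ;;
        *) echo "check_spool: unknown spool '$spool'" >&2; return 3 ;;
    esac

    if [ "$delete" = yes ]; then
        # --warning/--critical have no effect together with --delete
        mon check spool --maxage="$max_age" "$spool" --delete
    else
        mon check spool --maxage="$max_age" "$spool"
    fi
}

# Only one directory may be checked per invocation, so loop over both.
# Returns the exit status of the last check that failed, or 0.
check_all_spools() {
    rc=0
    for s in perfdata checks; do
        check_spool "$s" "$@" || rc=$?
    done
    return "$rc"
}
```

For example, check_all_spools 600 yes would remove files older than ten minutes from both default spool directories.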
Cores
# mon check cores --warning=X --critical=X [--dir=]
Checks for memory dumps resulting from segmentation violation from core parts of op5 Monitor. Detected core-files are moved to /tmp/mon-cores in order to keep working directories clean.
--warning | Default is 0 |
--critical | Default is 1 (any corefile results in a critical alert) |
--dir | Lets you specify more paths to search for corefiles. This option can be given multiple times. |
--delete | Deletes corefiles not coming from 'merlind' or 'monitor'. |
Distribution
# mon check distribution [--no-perfdata]
Checks that work distribution works correctly.
| Note that it's not expected to work properly during the first couple of minutes after a new machine has been brought online or taken offline. |
Exectime
# mon check exectime [host|service] --warning=<min,max,avg> --critical=<min,max,avg>
Checks execution time of active checks.
[host|service] | Select host or service execution time. |
--warning | Set the warning threshold for min,max and average execution time, in seconds |
--critical | Set the critical threshold for min,max and average execution time, in seconds |
Latency
# mon check latency [host|service] --warning=<min,max,avg> --critical=<min,max,avg>
Checks latency time of active checks.
[host|service] | Select host or service latency time. |
--warning | Set the warning threshold for min, max and average latency, in seconds |
--critical | Set the critical threshold for min, max and average latency, in seconds |
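Both exectime and latency take their thresholds as a comma-separated min,max,avg triple. A minimal sketch of building that triple follows. The helper names thresholds and check_latency are hypothetical, and the numbers in the usage line are purely illustrative, not recommended values.

```shell
# Join three values into the min,max,avg form used by --warning/--critical.
thresholds() {
    printf '%s,%s,%s\n' "$1" "$2" "$3"
}

# Hypothetical wrapper: check latency for 'host' or 'service' checks.
check_latency() {
    kind=$1   # 'host' or 'service'
    warn=$2   # min,max,avg in seconds
    crit=$3   # min,max,avg in seconds
    mon check latency "$kind" --warning="$warn" --critical="$crit"
}
```

A call such as check_latency service "$(thresholds 1 10 5)" "$(thresholds 2 20 10)" expands to the documented form with --warning=1,10,5 and --critical=2,20,10.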
Orphans
# mon check orphans
Checks for checks that haven't been run in too long a time.
db
cahash
Calculates a hash of all entries in the contact_access table. This is really only useful for debugging purposes. The check does not block execution of other scripts or checks.
Fixindexes
Fixes indexes on merlin tables containing historical data.
| Don't run this tool unless you're asked to by op5 support staff or told to do so by a message during an rpm or yum upgrade. |
ecmd
Search
# mon ecmd search <regex>
Prints 'templates' for all available commands matching <regex>.
The search is case insensitive.
Submit
# mon ecmd submit [options] command <parameters>
Submits a command to the monitoring engine using the supplied values.
Available options:
--pipe-path=</path/to/nagios.cmd>
Example:
An example command to add a new service comment for the service PING on the host foo would look something like this:
# mon ecmd submit add_svc_comment service='foo;PING' persistent=1 author='John Doe' comment='the comment'
Note how services are written. You can also use positional arguments, in which case the arguments have to be in the correct order for the command's syntactic template. The above example would then look thus:
# mon ecmd submit add_svc_comment 'foo;PING' 1 'John Doe' 'the comment'
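The named-parameter form is easier to script than the positional one, since argument order does not matter. The sketch below wraps the example above. The function names svc_ref and add_service_comment are hypothetical; the command name and parameter names are taken from the example.

```shell
# Build the 'host;service' reference used for services.
svc_ref() {
    printf '%s;%s\n' "$1" "$2"
}

# Hypothetical wrapper around the add_svc_comment example,
# using named parameters and a persistent comment.
add_service_comment() {
    host=$1; service=$2; author=$3; comment=$4
    mon ecmd submit add_svc_comment \
        service="$(svc_ref "$host" "$service")" \
        persistent=1 \
        author="$author" \
        comment="$comment"
}
```

Calling add_service_comment foo PING 'John Doe' 'the comment' submits the same command as the first example. Quoting each value keeps spaces and the semicolon intact.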
log
Fetch
# mon log fetch [--incremental=<timestamp>]
Fetches logfiles from remote nodes and stashes them in a local path, making them available for the 'sortmerge' command.
Import
# mon log import [--fetch]
This command runs the external log import helper.
If --fetch is specified, logs are first fetched from remote systems and sorted using the merge sort algorithm provided by the sortmerge command.
Purge
# mon log purge
Remove log files that are no longer in use.
| Currently only deletes stale RRD files. |
Push
# mon log push
(documentation missing)
Show
# mon log show
Runs the showlog helper program. Arguments passed to this command will get sent to the showlog helper.
For further help about the show category use:
# mon log show --help
Sortmerge
# mon log sortmerge [--since=<timestamp>]
Runs a mergesort algorithm on logfiles from multiple systems to create a single unified logfile suitable for importing into the reports database.
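Fetching and merging can be chained in a script. This is a sketch under one loud assumption: that <timestamp> means seconds since the Unix epoch, which this document does not state. The helper names days_ago and merge_recent_logs are hypothetical.

```shell
# Unix timestamp for N days ago ('date +%s' is widely supported,
# e.g. by GNU and BSD date).
days_ago() {
    now=$(date +%s)
    echo $(( now - $1 * 86400 ))
}

# Hypothetical two-step run: fetch remote logs, then merge everything
# newer than N days. ASSUMPTION: --since takes epoch seconds.
merge_recent_logs() {
    since=$(days_ago "$1")
    mon log fetch &&
    mon log sortmerge --since="$since"
}
```

For example, merge_recent_logs 7 would fetch the remote logs and merge the last week's entries, stopping early if the fetch fails.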
Node
Add
# mon node add <name> type=[peer|poller|master] [var1=value] [varN=value]
Adds a node with the designated type and variables.
Ctrl
# mon node ctrl <name1> <name2> [--self] [--all|--type=<peer|poller|master>] -- <command>
Execute <command> on the remote node(s) named. --all means run it on all configured nodes, as does making the first argument '--'.
--type=<types> means to run the command on all configured nodes of the given type(s).
The first not understood argument marks the start of the command, but always using double dashes is recommended. Use single-quotes to execute commands with shell variables, output redirection or scriptlets, like so:
# mon node ctrl -- '(for x in 1 2 3; do echo $x; done) > /tmp/foo'
# mon node ctrl -- cat /tmp/foo
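When the same command is run regularly on one class of nodes, a thin wrapper keeps the double dashes and the quoting in one place. A minimal sketch follows; the function name run_on_type is hypothetical and only the documented --type option is used.

```shell
# Hypothetical helper: run a command on every configured node of one type.
# The '--' separator marks where the remote command starts, as recommended.
run_on_type() {
    node_type=$1   # peer, poller or master
    shift
    case $node_type in
        peer|poller|master) ;;
        *) echo "run_on_type: bad node type '$node_type'" >&2; return 2 ;;
    esac
    mon node ctrl --type="$node_type" -- "$@"
}
```

As with the examples above, pass scriptlets in single quotes so that variables and redirections are evaluated on the remote node, for instance run_on_type poller 'echo $HOME > /tmp/foo'.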
List
# mon node list [--type=poller,peer,master]
Lists all nodes of the (optionally) specified type.
Remove
# mon node remove <name1> [name2] [nameN]
Removes one or more nodes from the merlin configuration.
Show
# mon node show [--type=poller,peer,master]
Displays all variables for all nodes, or for a single node, in a form suitable for use as eval $(mon node show nodename) in shell scripts and scriptlets.
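The eval idiom can be confined to a subshell so that a node's variables do not leak into the calling script. The sketch below is hedged: the function name node_var is hypothetical, and the variable names a node exposes (TYPE is used in the usage line purely as an assumption) depend on what mon node show prints on your system.

```shell
# Print one variable of one node.
# $1 = node name, $2 = variable name as printed by 'mon node show'.
# The evals run in a subshell, so nothing is set in the caller.
node_var() {
    (
        eval "$(mon node show "$1")"
        eval "printf '%s\n' \"\${$2}\""
    )
}
```

A call such as node_var styx TYPE prints the value of that variable for the node styx, if the variable exists. Only use this with output you trust, since eval executes whatever it is given.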
Status
# mon node status
Show status of all nodes configured in the running Merlin daemon.
Red text points to problem areas, such as high latency or the node being inactive, not handling any checks, or not sending regular enough program_status updates.
Tree
# mon node tree
This command draws a tree of the masters and pollers. Example:
+-----+    +--------+
| ipc |----| athena |
+-----+    +--------+
   |
   |
   | HOSTGROUP: op5-gbg,localnet   +-------+
   = ------------------------------| dione |
    \                              +-------+
     \  +------+
      --| styx |
        +------+
A setup of 2 peered masters and 2 peered pollers. The pollers are monitoring two hostgroups (op5-gbg and localnet).
oconf
Changed
# mon oconf changed
Print last modification time of all object configuration files.
Fetch
# mon oconf fetch
Fetch the configuration from a master. This is executed on a poller system and is useful when the poller can talk to the master but not vice versa.
Files
# mon oconf files
Print the configuration files in alphabetical order.
Hash
# mon oconf hash
Print the SHA-1 hash of the running configuration.
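Because the hash changes whenever the running configuration changes, it can serve as a cheap change detector. A minimal sketch follows; the function name config_changed and the default state file path are arbitrary choices for this example, not part of op5 Monitor.

```shell
# Hypothetical helper: succeed (exit 0) if the running configuration hash
# differs from the one recorded on the previous call, and record the new one.
# A first run with no state file counts as "changed".
# Returns 2 if 'mon oconf hash' itself fails.
config_changed() {
    state_file=${1:-/tmp/mon-oconf.hash}
    current=$(mon oconf hash) || return 2
    previous=$(cat "$state_file" 2>/dev/null || true)
    printf '%s\n' "$current" > "$state_file"
    [ "$current" != "$previous" ]
}
```

In a cron job this could be used as: if config_changed; then echo "configuration changed"; fi.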
HGlist
# mon oconf hglist
Print a sorted list of all configured hostgroups.
Nodesplit
# mon oconf nodesplit
Same as 'split', but uses merlin's config to split the configuration into files suitable for poller consumption.
Pull
# mon oconf pull
(documentation missing)
Push
# mon oconf push
Splits the configuration based on merlin's peer and poller configuration and sends the object configuration to all peers and pollers, restarting those that receive a configuration update. SSH keys need to be set up for this to be usable without admin supervision.
This command uses 'nodesplit' as its backend.
Split
# mon oconf split <outfile:hostgroup1,hostgroup2,hostgroupN>
Write config for hostgroup1, hostgroup2 and hostgroupN into outfile.
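The single outfile:hostgroups argument is easy to get wrong by hand when the hostgroup list is long. The sketch below assembles it from separate words. The function name split_arg is hypothetical, and the output path in the usage line is only an example.

```shell
# Build the <outfile:hostgroup1,hostgroup2,...> argument for 'mon oconf split'.
# $1 = outfile, remaining arguments = hostgroup names.
split_arg() {
    outfile=$1
    shift
    # Joining "$*" with IFS set to a comma, inside a subshell so the
    # caller's IFS is left untouched.
    groups=$(IFS=,; printf '%s' "$*")
    printf '%s:%s\n' "$outfile" "$groups"
}
```

For example, mon oconf split "$(split_arg /tmp/poller.cfg op5-gbg localnet)" passes /tmp/poller.cfg:op5-gbg,localnet as the argument.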
SSHKey
Fetch
# mon sshkey fetch
Fetches all the SSH keys from peers and pollers.
Push
# mon sshkey push
Pushes the local SSH keys to all peers and pollers.
Sysconf
Ramdisk
# mon sysconf ramdisk
To enable ramdisk:
# mon sysconf ramdisk enable
A ramdisk can be enabled for storing the spools for performance data and checkresults.
Storing these spools on a ramdisk can lower disk I/O significantly.
Rrdmultiple
# mon sysconf rrdmultiple
This converts the RRD files for the graphs to the multiple format instead of the single format used in op5 Monitor 6.2.x and earlier. This is not needed for new installations from version 6.3.
Check
Distribution
# mon check distribution
Checks to make sure work distribution works correctly. Note that it's not expected to work properly during the first couple of minutes after a new machine has been brought online or taken offline. Running this command with --no-perfdata excludes performance data from the output.
Status
# mon check status
Check that all nodes are connected and run checks (analogous to mon node check).
QH
Get
# mon qh get --socket=</path/to/query-socket> <query>
Run an arbitrary query with the nagios query handler and pretty-print the output. Queries need not include the trailing null byte or leading hash- or at-sign.
Query
# mon qh query --socket=</path/to/query-socket> <query>
Run an arbitrary query with the nagios query handler and print its raw output. Queries need not include the trailing null byte or leading hash- or at-sign.