The op5 Monitor back end can easily be configured to be used as a distributed monitoring solution. The distributed model looks like this. There are a few things you need to take care of before you can start setting up a distributed monitoring solution. You need to make sure that:
- you have at least two op5 Monitor servers of the same architecture up and running
- port 15551, the op5 Monitor back end communication port, is open between the servers
- port 22, ssh (used for the configuration sync), is open between the servers
Make sure the host group that the poller will be responsible for is added to the master configuration, and that at least one host is added to that host group.

The mon command is used to make life a bit easier when it comes to setting up a distributed solution. To get more detailed information about the mon command, just execute it like this:
2 Add the new poller to the configuration with the following command:
mon node add poller01 type=poller hostgroup=gbg
3 Create and add ssh keys to and from the poller by executing, as the root user:
mon sshkey push --all
mon sshkey fetch --all
5 Set up the configuration sync:
dir=/opt/monitor/etc/oconf
conf=/opt/monitor/etc/nagios.cfg
# remove every cfg_file= line from nagios.cfg
mon node ctrl -- sed -i '/^cfg_file=/d' $conf
# add a cfg_dir= line, pointing at the object configuration directory, after log_file=
mon node ctrl -- sed -i "/^log_file=/acfg_dir=$dir" $conf
# create the directory and give it the right ownership
mon node ctrl -- mkdir -m 775 $dir
mon node ctrl -- chown monitor:apache $dir
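If you are curious what the two sed expressions actually do to nagios.cfg, here is a self-contained demonstration on a scratch file (the file contents are illustrative, not a full nagios.cfg):

```shell
# demonstrate the two sed edits on a throwaway nagios.cfg-style file
conf=$(mktemp)
printf 'log_file=/opt/monitor/var/nagios.log\ncfg_file=/opt/monitor/etc/checkcommands.cfg\n' > "$conf"
sed -i '/^cfg_file=/d' "$conf"                                 # drop every cfg_file= line
sed -i '/^log_file=/acfg_dir=/opt/monitor/etc/oconf' "$conf"   # append cfg_dir= after log_file=
cat "$conf"
rm -f "$conf"
```

The result is a nagios.cfg that loads its object configuration from the cfg_dir directory instead of from individual cfg_file entries.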
6 To make sure you have an empty configuration on poller01, run:
mon node ctrl -- mon oconf hash
This will give you a hash looking like this (the “da39” hash):
da39a3ee5e6b4b0d3255bfef95601890afd80709
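That particular hash is easy to recognise: it is simply the SHA-1 checksum of empty input, which is why an empty object configuration produces it. You can verify this on any Linux machine:

```shell
# the SHA-1 of empty input is the well-known "da39..." hash
printf '' | sha1sum
# -> da39a3ee5e6b4b0d3255bfef95601890afd80709  -
```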
In this instruction we will add a new poller to our existing distributed solution. Here we have the following hosts:
- poller02 (This is the new one.)
2 Add the new poller to the configuration with the following command:
mon node add poller02 type=poller hostgroup=gbg
5 Set up the configuration sync:
dir=/opt/monitor/etc/oconf
conf=/opt/monitor/etc/nagios.cfg
# remove every cfg_file= line from nagios.cfg on poller02
mon node ctrl poller02 -- sed -i '/^cfg_file=/d' $conf
# add a cfg_dir= line, pointing at the object configuration directory, after log_file=
mon node ctrl poller02 -- sed -i "/^log_file=/acfg_dir=$dir" $conf
# create the directory and give it the right ownership
mon node ctrl poller02 -- mkdir -m 775 $dir
mon node ctrl poller02 -- chown monitor:apache $dir
6 To make sure you have an empty configuration on poller02, run:
mon node ctrl poller02 -- mon oconf hash
This will give you a hash looking like this (the “da39” hash):
da39a3ee5e6b4b0d3255bfef95601890afd80709
You might want to add another host group to a poller. To do that you need to edit the merlin.conf file; as of today this is not doable with any mon command.
1 Open up and edit /opt/monitor/op5/merlin/merlin.conf.
2 Add a new host group in the hostgroup line like this:
hostgroup = gbg,sth,citrix_servers
Remember not to put any spaces between the host group names and the commas.
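If you prefer not to edit the file by hand, the same change can be scripted. The sed one-liner below is an illustration of the idea only; it assumes there is exactly one hostgroup line in the file, so with several poller sections you should edit merlin.conf manually:

```shell
# append citrix_servers to an existing "hostgroup = ..." line, with no spaces
f=$(mktemp)
echo 'hostgroup = gbg,sth' > "$f"     # stand-in for the line in merlin.conf
sed -i 's/^hostgroup = .*/&,citrix_servers/' "$f"
cat "$f"                              # hostgroup = gbg,sth,citrix_servers
rm -f "$f"
```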
The poller will be removed from the master configuration, and all distributed configuration on the poller will also be removed.
2 Deactivate and remove all distributed setup on the poller host.
mon node ctrl poller01 -- mon node remove master01
If a poller goes down, the default configuration is for the master to take over all the checks from the poller. For this to work, all hosts monitored from the poller must also be monitorable from the master. If the master server should not take over the checks from the poller, this can be set in the merlin configuration file.

To synchronize files from the master server to the poller, add a sync paragraph in the file /opt/monitor/op5/merlin/merlin.conf, for example to sync /opt/monitor/etc/htpasswd.users on the master to /opt/monitor/etc/htpasswd.users on the poller.

If one peer is behind some kind of firewall or is on a NAT address, it might not be possible for the master server to connect to the peer. To tell the master not to connect to the poller, and instead let the poller open the session, we need to add an option to the file /opt/monitor/op5/merlin/merlin.conf. In the example below we have a master “master01” that can not connect to “poller01”, but “poller01” is allowed to connect to “master01”. It is also possible to set this option on the peer instead; then the master will always initiate the session.

After a poller has been unavailable to a master (e.g. because of a network outage), the report data will be synced from the poller to the master.

For more information and a more complex example, please take a look at the howto in the git repository of the open source Merlin project:

The op5 Monitor back end can easily be used as a load balanced monitoring solution. The load balanced model looks like this. There are a few things you need to take care of before you can start setting up a load balanced monitoring solution. You need to make sure that:
- you have at least two op5 Monitor servers of the same architecture up and running
- port 15551, the op5 Monitor back end communication port, is open between the servers
- port 22, ssh (used for the configuration sync), is open between the servers

The mon command is used to make life a bit easier when it comes to setting up a load balanced solution. To get more detailed information about the mon command, just execute it like this:
3 Create and add ssh keys to and from the second peer by executing, as the root user:
mon sshkey push --all
mon sshkey fetch --all
- peer03 (This is the new one.)
4 Add the peers to one another:
mon node ctrl peer02 -- mon node add peer03 type=peer
mon node ctrl peer03 -- mon node add peer02 type=peer
mon node ctrl peer03 -- mon node add peer01 type=peer
6 Restart monitor on peer01 and send the configuration to all peers again.
mon restart ; sleep 3 ; mon oconf push
2 Remove all peer configuration from peer02:
mon node ctrl peer02 -- mon node remove peer01
mon node ctrl peer02 -- mon node remove peer03
4 Remove peer02 from the rest of the peers, in this case peer03:
mon node ctrl --type=peer -- mon node remove peer02
5 Restart the rest of the peers, in this case only peer03:
mon node ctrl --type=peer -- mon restart
7 To synchronize files between the servers, add a sync paragraph in the file /opt/monitor/op5/merlin/merlin.conf, for example to sync /opt/monitor/etc/htpasswd.users between the peers.

For more information and a more complex example, please take a look at the howto in the git repository of the open source Merlin project.

Merlin, or Module for Effortless Redundancy and Loadbalancing In Nagios, allows the op5 Monitor processes to exchange information directly, as an alternative to the standard Nagios way of using NSCA. Merlin functions as a backend for Ninja by adding support for storing the status information in a database, fault tolerance and load balancing. This means that Merlin is now responsible for providing status data and acting as a backend for the Ninja GUI.

merlin-mod is responsible for hooking into the NEBCALLBACK_* calls and sending them to a socket.
If the socket is not available, the events are written to a backlog and sent when the socket is available again.

The Merlin daemon listens to the socket that merlin-mod writes to, and sends all events received either to a database of your choice (using libdbi) or to another Merlin daemon.
If the daemon is unsuccessful in this, it writes to a backlog and sends the data later.

The Merlin database includes Nagios object status and status changes. It also contains comments, scheduled downtime and so on.

The mon command is a very powerful command that comes with Merlin.
It is this command that is used to setup a distributed or a load balanced environment.
This command can also be used to control the other op5 monitor servers.
The command should be used with one category and one sub-category. Only the start, stop and restart categories can be used without any sub-category.

# mon check spool [--maxage=<seconds>] [--warning=X] [--critical=X] <path> [--delete]

Checks a certain spool directory for files (and files only) that are older than 'maxage'. It is intended to prevent buildup of checkresult files and unprocessed performance-data files in the various spool directories used by op5 Monitor.
<path> may be 'perfdata' or 'checks', in which case the directory names will be taken from the op5 defaults. --warning and --critical have no effect if '--delete' is given and will otherwise specify threshold values.
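The underlying idea of the spool check can be illustrated with plain find. This is not how mon implements it, just the same concept, demonstrated on a throwaway directory standing in for a real spool directory:

```shell
# demonstrate finding stale spool files: anything older than 5 minutes
spool=$(mktemp -d)                        # stand-in for a real spool directory
touch -d '10 minutes ago' "$spool/stale.checkresult"
touch "$spool/fresh.checkresult"
find "$spool" -maxdepth 1 -type f -mmin +5   # prints only the stale file
rm -rf "$spool"
```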
Checks for memory dumps resulting from segmentation violations in core parts of op5 Monitor. Detected core files are moved to /tmp/mon-cores in order to keep the working directories clean.
Lets you specify more paths to search for corefiles. This option can be given multiple times.
Note that it is not expected to work properly for the first couple of minutes after a new machine has been brought online or taken offline.

# mon check exectime [host|service] --warning=<min,max,avg> --critical=<min,max,avg>
--warning sets the warning threshold for the min, max and average execution time, in seconds. --critical sets the critical threshold for the min, max and average execution time, in seconds.

# mon check latency [host|service] --warning=<min,max,avg> --critical=<min,max,avg>
--warning sets the warning threshold for the min, max and average latency, in seconds. --critical sets the critical threshold for the min, max and average latency, in seconds.

Calculates a hash of all entries in the contact_access table. This is really only useful for debugging purposes. The check does not block execution of other scripts or checks.
Don't run this tool unless you're asked to by op5 support staff or told to do so by a message during an rpm or yum upgrade.

An example command to add a new service comment for the service PING on the host foo would look something like this:

# mon ecmd submit add_svc_comment service='foo;PING' persistent=1 author='John Doe' comment='the comment'

Note how services are written. You can also use positional arguments, in which case the arguments have to be in the correct order for the command's syntactic template. The above example would then look thus:

Fetches logfiles from remote nodes and stashes them in a local path, making them available for the 'sortmerge' command. If --fetch is specified, logs are first fetched from remote systems and sorted using the merge sort algorithm provided by the sortmerge command.
Runs the showlog helper program. Arguments passed to this command will get sent to the showlog helper.

Runs a mergesort algorithm on logfiles from multiple systems to create a single unified logfile suitable for importing into the reports database.

# mon node ctrl <name1> <name2> [--self] [--all|--type=<peer|poller|master>] -- <command>

Execute <command> on the remote node(s) named. --all means run it on all configured nodes, as does making the first argument '--'. The first argument that is not understood marks the start of the command, but always using double dashes is recommended. Use single quotes to execute commands with shell variables, output redirection or scriptlets, like so:

Display all variables for all nodes, or for one node, in a fashion suitable for being used as eval $(mon node show nodename) from shell scripts and scriptlets.

Red text points to problem areas, such as high latency or the node being inactive, not handling any checks, or not sending regular enough program_status updates.

Same as 'split', but uses Merlin's config to split the configuration into files suitable for poller consumption.

Splits the configuration based on Merlin's peer and poller configuration and sends the object configuration to all peers and pollers, restarting those that receive a configuration update. ssh keys need to be set up for this to be usable without admin supervision.
All commands in this category can potentially overwrite configuration, enable or disable monitoring and generate notifications. Do NOT use these commands in a production environment.

Tests various aspects of event forwarding with any number of hosts, services, peers and pollers, generating config and creating databases for the various instances and checking event distribution among them.

Submits passive check results to the nagios.cmd pipe and verifies that the data gets written to the database correctly and in a timely manner.
This command will disable active checks on your system and have other side effects as well.

VRRP can be used in this setup to have one DNS name and one IP address primarily linked to one of the master servers; if the primary master for some reason is unavailable, VRRP will automatically detect this and send you to the secondary master.
If you already use VRRP in your network, make sure that you use the correct virtual_router_id.
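As an illustration only: with keepalived, a VRRP instance on the primary master could look roughly like this. The interface name, virtual_router_id, priority and virtual IP address are all made-up examples and must match your own network:

```
vrrp_instance MONITOR_VIP {
    state MASTER            # the primary master starts as MASTER
    interface eth0          # example interface
    virtual_router_id 51    # must be unique on the network segment
    priority 150            # the secondary master uses a lower priority
    virtual_ipaddress {
        192.168.1.100       # the shared IP that the DNS name points to
    }
}
```

The secondary master would run the same instance with state BACKUP and a lower priority, so it only holds the address while the primary is down.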