26. May 2021

MONIToring network services

Simple small-scale service monitoring - Can Monit out-Nagios Nagios?

I have used Nagios for years now, corporately and for the handful of systems with which I am now involved.  It works well, within some annoyance limits, and once the hurdle of the initial setup is climbed, it runs merrily without much additional administration.  It is unfortunate that what started off as a fully free (libre and gratis) product is now presented as a libre "core", with the implication that the real deal is the fully paid-for version.  So many free and open source projects seem to go to the wall once this type of business model is introduced, for reasons we don't need to go into here.  There are alternatives to Nagios, of course, like Icinga, which was an early fork, but the initial setup hurdle with Icinga, it seems to me, was even higher than with Nagios.  For small-scale, simple use, it seems overkill.

It occurred to me recently, though, that my use of Nagios is extremely simplistic. I just need to know if a handful of services on a handful of systems are running, to alert me if they are not, and to alert me if they come back online.  In 99 cases out of 100, the problem is the network, usually my home ADSL connection, begrudgingly supplied by BT Openreach, which offers as little in our rural areas as it can get away with (yes, I'm still bitter, thanks for asking....) My Nagios instance takes care of this, using few resources and doing an adequate job.

As I mentioned, there are annoyances. I don't like the default installation's web screen touching youtube.com when you fire it up, for example. There are other annoyances in the web interface too. But the way Nagios alerts for my particular circumstances is a little sub-optimal.  I have it set up to send me a Jabber message and to email me on failures.  So when we have a big download or upload going on, Nagios alerts like mad.  Yes, it has flapping prevention built in, but it's unreasonable to expect it to understand that there is a local connectivity issue.  None of these annoyances are killer issues, and, as I say, Nagios works well.  But...

I got to wondering if this could be done differently. In particular, whether Munin and/or Monit were still relevant.  It's many years since I last looked at these monitoring solutions.  The advantage of Munin is that it can be distributed, but that's not necessarily a big deal for me, as it's the external services I want to monitor. I must admit as well that I had a lot of trouble trying to get non-core plugins to work.  Some seemed to work but returned no results. I may go back to it when I feel like a little self-flagellation, but I quickly concluded that Munin was not a more straightforward alternative to Nagios, unless it was only a subset of local services you wanted to monitor.

Monit goes way back.  The developers now make money by selling a Monit aggregation system called M/Monit, which is reasonable, and rather more nuanced than that free "core" business model.  I understand M/Monit is not too expensive, but my requirements don't justify any spending, to be honest.  Could I get Monit to run roughly as I currently run Nagios, checking a series of services, on a series of hosts and alerting me via email and xmpp?

The answer is "Yes." (Well it would be, wouldn't it, or I'd not be writing this.)

It means using Monit slightly differently to most of the guides.  These days, for example, I am not convinced about the need for Monit's ability to restart services which have failed.  It seems to me far more important to ensure services don't fail in the first place, but one can see that there may be fragile services needing this option.
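
For reference, the conventional Monit usage I am choosing not to follow is a process check with restart actions, roughly like this - only a sketch, and the pidfile and systemctl paths are assumptions that will vary by distribution:

check process sshd with pidfile /var/run/sshd.pid
    start program = "/usr/bin/systemctl start sshd"
    stop program  = "/usr/bin/systemctl stop sshd"
    if failed port 22 protocol ssh then restart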

I already use the little sendxmpp utility to send jabber messages from the server. It just needs a .sendxmpprc file with suitable entries in the sending user's home directory and then has a syntax much like command-line mail. I'll not cover full details here, but a minimal sketch follows.
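
With the caveat that sendxmpp versions differ slightly, the classic .sendxmpprc is a single line holding the sending account and its password, and a quick test mirrors the invocation used in the scripts below.  The account and recipient here are placeholders only:

# /root/.sendxmpprc - sending account and password (placeholders)
monitor@jabber.example.org  s3cretpassword

# quick test from the command line
echo "test message" | sendxmpp -f /root/.sendxmpprc -t -n recipient@jabber.example.org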

Monit's alerts are focussed on email. But it is also possible to execute a script when something happens, so instead of the configuration syntax being "alert", it becomes "exec <script>".  Good.  I also realised that it would be easier, given that I want to do email and jabber, not to rely on the built-in email, but send mail and jabber messages from within the script. I'll explain this later.

For comparison, here are screen shots of Nagios and Monit, looking after pretty much the same services.

The important similarity is not the detail, but the fact that all services are green.

What I also wanted, ideally, was an occasional reminder while a service is down, but not a frequent one.  I may also only want a given number of reminders.  I don't think that's possible with Nagios without acknowledging the service problem in the web front end.  I ended up with two scripts, which I put in /usr/local/bin. One is the alert script, and the other is for service recovery.  Monit helpfully exposes a few pieces of information to help with this, but extracting that information seems to be a little clumsy, at least for my non-existent scriptwriting "skills". I have not been able to use the exposed variables directly, so I go through a little process of writing them to a temporary file and reading them back into an array. In the examples given here, items in <> are my details; you need to change these, without the <>, of course.  Please contact me with a more elegant way of doing these two scripts, because, even for me, I think this is clumsy. Here is the alert script, suitably sanitised:

#!/bin/bash

MAILSENDTO="<your_email_alert_recipient>"
MAILFROM="<user you want mail to appear to come from>"
JABBERTO="<your_jabber_alert_recipient>"

#############################################
FILE=/tmp/${RANDOM}.mon

echo "$MONIT_HOST" >> $FILE
echo "$MONIT_EVENT" >> $FILE
echo "$MONIT_SERVICE" >> $FILE
echo "$MONIT_DESCRIPTION" >> $FILE

declare -a array=()
i=0

# reading file in row mode, insert each line into array
while IFS= read -r line; do
    array[i]="$line"
    let "i++"
    # reading from file path
done < $FILE
## array[0]=Host
## array[1]=Event
## array[2]=Service
## array[3]=Description

rm $FILE

JABBER="Monit Alert - ${array[1]} for ${array[2]}. \n Monit ${array[3]}"
SUBJECT="Monit Alert - ${array[2]} ${array[1]}"

### sendxmpp picks up the sending account details from the .sendxmpprc file
echo -e "$JABBER" | sendxmpp -f /root/.sendxmpprc -t -n "$JABBERTO"

### Now email too
echo -e "$JABBER" | mail -r "$MAILFROM" -s "$SUBJECT" "$MAILSENDTO"
##################################################

The recovery script is very similar. I have not tried to make the scripts parameter-driven, as that might complicate the Monit config files.

#!/bin/bash
MAILSENDTO="<your_email_alert_recipient>"
MAILFROM="<user you want mail to appear to come from>"
JABBERTO="<your_jabber_alert_recipient>"

###############################
FILE=/tmp/${RANDOM}.mon

echo "$MONIT_HOST" >> $FILE
echo "$MONIT_EVENT" >> $FILE
echo "$MONIT_SERVICE" >> $FILE
echo "$MONIT_DESCRIPTION" >> $FILE

declare -a array=()
i=0

# reading file in row mode, insert each line into array
while IFS= read -r line; do
    array[i]="$line"
    let "i++"
    # reading from file path
done < $FILE
## array[0]=Host
## array[1]=Event
## array[2]=Service
## array[3]=Description

#echo "Contents of file " >>  /root/test.txt
#cat $FILE >> /root/test.txt

rm $FILE

JABBER="System Recovery - ${array[1]} for ${array[2]}. \n Monit ${array[3]}"
SUBJECT="Monit Recovery - ${array[2]} ${array[1]}"
### sendxmpp picks up the sending account details from the .sendxmpprc file
echo -e "$JABBER" | sendxmpp -f /root/.sendxmpprc -t -n $JABBERTO

## Now email
echo -e "$JABBER" | mail -r "$MAILFROM" -s "$SUBJECT" "$MAILSENDTO"

Now we get on to Monit's configuration. Settings can be made in the /etc/monitrc file, but it's easier to separate them out into individual files for each server or service I am monitoring.  I won't go through the monitrc file, because we need to fill in virtually nothing there, as email isn't going to be used directly. Just set up the web interface, optionally with a username and password. This is conventionally a service running on port 2812. Don't do what I did and get yourself in knots by trying to access the service on port 2182 instead of 2812.
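
The relevant part of monitrc is just the httpd stanza, along these lines - the listening address, the allowed network and the admin:monit credentials here are placeholders to change to suit:

set httpd port 2812
    use address 0.0.0.0                  # listen on all interfaces
    allow 192.168.0.0/255.255.255.0      # restrict access to the local network
    allow admin:monit                    # web interface username:password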

Under /etc/monit.d, we can set up the files.  Again, unlike Monit's usual use, I'm not interested in starting or stopping services, just in being alerted to service availability.  So one system I monitor, which is accessible via OpenVPN, has this config file, sanitised, named <servername>.conf. Sorry about the long lines.

check host ACA_HTTP with address 192.168.xxx.xx
  if failed port 80 protocol http with timeout 30 seconds then exec "/usr/local/bin/monit-alert.sh" with repeat every 5 cycles
  else if succeeded 1 times within 1 cycles then exec "/usr/local/bin/monit-recover.sh"

check host ACA_SSH with address 192.168.xxx.xx
   if failed port 22 protocol ssh with timeout 30 seconds then exec "/usr/local/bin/monit-alert.sh" with repeat every 5 cycles
   else if succeeded 1 times within 1 cycles then exec "/usr/local/bin/monit-recover.sh"

Another service conf file looks after the services I run from my home server. Again, it's sanitised, but by now you get the idea of what it's doing, just making sure the service is accessible, repeating the alerts, but not too frequently, and alerting on recovery.

check host homenet_IMAPS with address <FQDN>
    if failed port 993 protocol IMAPS with timeout 30 seconds then exec "/usr/local/bin/monit-alert.sh" with repeat every 10 cycles
    else if succeeded 1 times within 1 cycles then exec "/usr/local/bin/monit-recover.sh"

check host homenet_SUBMISSION with address <FQDN>
    if failed port 587 protocol SMTP with timeout 30 seconds then exec "/usr/local/bin/monit-alert.sh" with repeat every 10 cycles
    else if succeeded 1 times within 1 cycles then exec "/usr/local/bin/monit-recover.sh"

check host homenet_SMTP with address <FQDN>
    if failed port 25 protocol SMTP with timeout 30 seconds then exec "/usr/local/bin/monit-alert.sh" with repeat every 10 cycles
    else if succeeded 1 times within 1 cycles then exec "/usr/local/bin/monit-recover.sh"

check host homenet_SSH with address <FQDN>
    if failed port 1011 protocol ssh with timeout 20 seconds then exec "/usr/local/bin/monit-alert.sh" with repeat every 10 cycles
    else if succeeded 1 times within 1 cycles then exec "/usr/local/bin/monit-recover.sh"

check host homenet_Nextcloud with address <FQDN>
    if failed port 443 protocol https with timeout 20 seconds then exec "/usr/local/bin/monit-alert.sh" with repeat every 10 cycles
    else if succeeded 1 times within 1 cycles then exec "/usr/local/bin/monit-recover.sh"

One service is a sales web site. I want to be reminded more frequently about that:

check host RC-HTTPS with address <www.shop.site>
    if failed port 443 protocol https with timeout 20 seconds then exec "/usr/local/bin/monit-alert.sh" with repeat every 5 cycles
    else if succeeded 1 times within 1 cycles then exec "/usr/local/bin/monit-recover.sh"

check host RC-HTTP with address <www.shop.site>
    if failed port 80 protocol http with timeout 20 seconds then exec "/usr/local/bin/monit-alert.sh" with repeat every 5 cycles
    else if succeeded 1 times within 1 cycles then exec "/usr/local/bin/monit-recover.sh"

The alerts, when they come through, look like this, the example being jabber, but email is formatted in the same way:

[15:01:52] <jabber_address> Monit Alert - Connection failed for ACA_HTTP.
Monit failed protocol test [HTTP] at [192.168.xxx.xx]:80 [TCP/IP] -- Connection timed out

While recovery messages are like this:

[15:02:55] <jabber_address> System Recovery - Connection succeeded for ACA_SSH.
Monit connection succeeded to [192.168.xxx.xx]:22 [TCP/IP]

Is all this making Monit do a Nagios-like task worth it? I'm not sure. Possibly, as using the above examples makes it a lot quicker and easier to set up.  Then again, Nagios gives uptime histories and graphs and deep details about each particular connection, which is nice but hardly the type of thing you study. Is Monit a valid little way of alerting you to issues?  Most certainly yes.

Addendum:  Amazed at the success of hacking the cervlet.c files, I got carried away, and wondered if it was possible to display the response times, shown on the detail pages of each service, on the main page.  After a great deal of trial and error, as I know nothing about C, I managed to get it right.  The main monitoring page now looks like this, and gives all the details Nagios used to give, on an easily-scanned single page.

Later I also added a little snippet of JavaScript to include a timestamp on the refreshed page - you can see it just below "Manager" in the header. I found I was unsure whether the page had refreshed, so adding this timestamp was handy. My lack of C knowledge meant I had no chance of doing the job in C itself, but I realised that the C code was simply spitting out HTML, so the addition could be made there.
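
For anyone curious, the snippet is nothing more sophisticated than this sort of thing, dropped into the HTML that cervlet.c emits; the wording and placement here are only illustrative:

<script>
  /* show when the page was generated, so a refresh is obvious at a glance */
  document.write("Refreshed: " + new Date().toLocaleString());
</script>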

The new Monit service has been used in anger, when one of the remote services I monitor went down. This was a strange event, as the remote system flipped its root filesystem to read-only, so, while the web server was still running, there was, so to say, no-one at home. The way Monit is set up allowed me to troubleshoot the problem and speculate about the cause.

However, it did show that Nagios can still do one thing that Monit can't. With Monit, you can only disable monitoring while you work on the problem, whereas with Nagios you can acknowledge the problem, and so stop getting peril messages, and the service is monitored again when it is restored. With Monit you'd need to do those steps manually, as sketched below. Considering how much easier Monit is overall, it's no deal breaker.
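
The manual equivalent is only a couple of commands on the Monit host, using the service names from the config files above:

# stop alerting while the problem is worked on
monit unmonitor ACA_HTTP

# resume monitoring once the service is back
monit monitor ACA_HTTP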

Attached is my version of cervlet.c, the file which spits out the HTML etc, and which needs to be edited for the above to happen.  Monit compiled easily with these changes under OpenSUSE 15.2 on a Pi 4.

10. April 2021

Lighttpd and Nextcloud 21+

An irritating but apparently harmless notice when updating to Nextcloud 21 needs a slight change to the web server configuration under Lighttpd

Continue reading

22. February 2021

OpenSUSE as a Raspberry Pi 4 Server

A work-in-progress series of notes on a migration to OpenSUSE as a server on a Raspberry Pi 4.  These notes are somewhat rambling, as they were written during the development process of the new server, but are left here, warts and all.


Migration project status: The project has been completed successfully. All services are now running under OpenSUSE as of 6th March 2021. I moaned and grumbled my way through this post, adding to it as issues arose or little triumphs were achieved. The reality is that the little hacks, workarounds and customisations I had to do - tiny things, really - were probably no more than what had been keeping the previous system stable as well, but time and familiarity had made me forget that this is how sysadmins make things work.
This has been an interesting exercise, and I plan to leave these notes here, both for my own reference and possibly as some might be useful to others.

One month later:  I remain pleased with the decision to migrate from Raspberry Pi OS to OpenSUSE for this server.  In fact, I wish I could easily migrate a hosted Raspberry Pi I have.  The most pressing issues I have faced were with regard to various weekly security and performance reports I have written over the years. I am still getting to grips with systemd's timers and logrotate, as some of the reports should run after the logs are rotated, but I am still not clear when rotation happens. One unexpected bonus is the degree to which btrfs compression works.  At the moment, there is a 25% space saving on the disk.  The next big test will be with the release of OpenSUSE 15.3. Previous experience on x86 suggests this update will be straightforward, so I am expecting more of the solidity I am currently enjoying.

Continue reading

Omeka under Lighttpd

Documented for those wanting to try this out

Continue reading

16. January 2021

Our own xmpp server, with all the trimmings

We can bypass all the data slurping and other issues, by running our own messaging service, including voice and video, and group chats.

Continue reading

15. September 2020

The Cycles of History

We are living part of history. How we relate to that may be a key coping strategy in the years to come.

Continue reading

6. June 2020

Automatic initrd/initramfs creation for Raspian/Raspberry Pi OS

A very techie post on my blog, so I don't expect this to be of general interest. Try browsing the categories for something more interesting...

The perennial problem of ensuring your initrd or initramfs file has been correctly generated for the appropriate Pi architecture

USE THESE IDEAS AT YOUR OWN RISK!

Edit: The original script was not smart enough to manage the requirements of the Pi Zero (armv6l), but a helpful Perl incantation found on a forum resolves this omission. The script has been in use for some time.

IMPORTANT - Please note that as of April 2021, Raspberry Pi OS has changed the way it generates initrds, so this script no longer works correctly.  I will document alterations here if I manage to understand the changes.

Continue reading

10. May 2020

A New Inverter for our Off-grid Power System


Thunderstorms at the beginning of the year prompted a desire to have a back-up inverter

Continue reading

4. April 2020

Nextcloud Talk and the missing piece of the puzzle

It worked, but wouldn't tell us, because of an obscure issue

Continue reading

31. January 2020

An Unexpected Day

A personal view of the decision to leave the Union of European Nations

Continue reading

12. January 2020

Power Problems - When Thunderstorms Strike

An off-grid life means taking responsibility when things go wrong

Continue reading

26. December 2019

Almost Synonyms - The linguistic families 6000 miles apart

Scots words and their Afrikaans equivalents have long fascinated me. Here are some.

Continue reading

3. December 2019

Tweeting the weather

Some digital glue automates tweeting the current weather conditions

Continue reading

21. September 2019

Peggy - 2003-2018

The life and times of a full-of-life Yorkie

Continue reading

18. July 2019

USB Scanner - scans a black page - Resolved

Under Linux, this scanner should Just Work, but doesn't. The fix is very simple

Continue reading

16. May 2019

Daily walks

Some recent pictures from our morning walks around Clachtoll and Stoer

Continue reading

25. February 2019

The Perennial Problem of the new Linux Laptop

Choices, choices, choices, some dead-ends, such as being far too expensive, some nice-to-haves, and finally some solutions.

Continue reading

1. December 2018

Nextcloud on a Raspberry Pi

There is a lot on the 'net about running your own sync and other cloud services with Nextcloud and the Raspberry Pi, much of it concluding it's not viable for continuous use, or losing interest after a while. In fact, with some thought-through choices, a production environment for, say, 5 or fewer users is perfectly feasible.


Continue reading

16. October 2018

Revisiting server-based antispam with Bogofilter

Some simple maintenance makes life interesting, but simplicity saves the day

Continue reading

2. October 2018

Latest update - off-grid Linux IT services

The latest (October 2018) update on hardware and software choices when your power supply is limited

This is a technical blog post which may not be of general interest, and assumes a certain level of technical understanding

Continue reading
