Monitoring Drupal with Nagios

So why would you monitor a Drupal site?

From a system engineer's point of view, Drupal is nothing more than a piece of software that needs regular updates. So what do you do? You treat it that way: update it whenever updates are available and patch it as soon as security updates are released. The trickier part is getting notified immediately when security updates are required, so you can patch all of your sites before shit hits the fan.

Drupal itself can notify a site admin about regular updates and/or security updates by sending an email to a specified account. If you are only responsible for a couple of Drupal sites, this is sufficient. However, if you have to keep track of lots of other things and have chosen Nagios as a central platform for monitoring different servers and services, it makes sense to handle Drupal the same way and keep everything in one place.

What about the Drupal Nagios module?

I know there is a Drupal Nagios module out there, but it has to be added to Drupal itself and enabled in the modules section. Unfortunately, I do not have access to every site's code, and for a few sites I have no permission to add modules, so I needed a different solution.

DIY if you have to

I was facing exactly these problems. In my company I use Nagios/Icinga to keep track of many servers with various services and states. The only remaining relics were the numerous Drupal sites sending emails from time to time to complain about security updates, which meant I had to keep an eye on two different systems. As I am a big fan of consolidation and could not find a suitable plugin for Drupal sites, I had to write my own.

The thoughts

As I had already written a few Nagios plugins, I was quite confident and set my goals high. I wanted the normal update notification as well as a security-update-only notification. These two checks alone would put me on par with Drupal's own functionality, so I looked deeper into what else can go wrong on a Drupal site:
* Pending database updates
* Incorrect file permissions on directories
* Problems with cron
* Basically everything the Drupal status report can complain about

The next question was which technology to use. Should I write it in PHP and use Drupal hooks to get all the required information? I quickly rejected this idea, as I currently run Drupal 6 and 7 systems and will have Drupal 8 ones in the future. So it had to be something that works across all versions.

The idea came to me while I was checking a site for errors using Drush. Drush is already a mature toolset for Drupal, so why not just write a wrapper around it that gives me all the desired information in Nagios plugin style output?
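A Nagios plugin only has to honour a simple contract: print a one-line status and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal POSIX sh sketch of the output side of such a wrapper could look like this. The function names and the `DRUPAL` label are hypothetical and not taken from check_drupal itself:

```shell
#!/bin/sh
# Sketch of the Nagios plugin output contract in POSIX sh.
# Exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
# Function names and the "DRUPAL" label are hypothetical.

# Map a numeric exit code to its Nagios state name.
nagios_state_name() {
    case "$1" in
        0) echo 'OK' ;;
        1) echo 'WARNING' ;;
        2) echo 'CRITICAL' ;;
        *) echo 'UNKNOWN' ;;
    esac
}

# Print "DRUPAL <STATE> - <message>" and return the given code,
# so a caller can simply do: nagios_report 1 "..."; exit $?
nagios_report() {
    _code="$1"
    shift
    printf 'DRUPAL %s - %s\n' "$(nagios_state_name "$_code")" "$*"
    return "$_code"
}
```

For example, `nagios_report 1 "2 updates available"` prints `DRUPAL WARNING - 2 updates available` and returns 1; whatever Drush reports would only need to be translated into such a code and message.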

The check_drupal plugin – first draft

The goals were set and the technology was decided, so I was ready to go. This is what I came up with, in pure POSIX-compliant Bourne shell (not Bash):

check_drupal -d <drupal root> [-n <name>] [-s <w|e>] [-u <w|e>] [-e <w|e>] [-w <w|e>] [-m <w|e>]

The checks are as follows:
* -s: security updates
* -u: normal updates
* -e: all core errors (what the status report shows as errors)
* -w: all core warnings (what the status report shows as warnings)
* -m: missing/pending database updates

In addition, the Nagios severity (w for warning, e for error) can be specified for every single check.
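To illustrate how an option set like this can be handled in POSIX sh, here is a minimal sketch using the getopts builtin. The variable names (such as `DRUPAL_ROOT` and `SEV_SEC`) and the error handling are hypothetical and not copied from check_drupal; an empty severity is assumed to mean "check disabled":

```shell
#!/bin/sh
# Sketch: parse the check_drupal options shown above with POSIX getopts.
# Variable names and error handling are hypothetical, not the real plugin's code.

# Succeed only for an empty (check disabled) or w/e severity.
valid_severity() {
    case "$1" in
        ''|w|e) return 0 ;;
        *)      return 1 ;;
    esac
}

# Parse "$@" into globals; return 3 (Nagios UNKNOWN) on bad usage.
parse_args() {
    DRUPAL_ROOT='' NAME='' SEV_SEC='' SEV_UPD='' SEV_ERR='' SEV_WARN='' SEV_DB='' LOGFILE=''
    OPTIND=1  # reset so the function can be called more than once
    while getopts 'd:n:s:u:e:w:m:l:' opt; do
        case "$opt" in
            d) DRUPAL_ROOT="$OPTARG" ;;
            n) NAME="$OPTARG" ;;
            s) SEV_SEC="$OPTARG" ;;   # security updates
            u) SEV_UPD="$OPTARG" ;;   # normal updates
            e) SEV_ERR="$OPTARG" ;;   # status report errors
            w) SEV_WARN="$OPTARG" ;;  # status report warnings
            m) SEV_DB="$OPTARG" ;;    # pending database updates
            l) LOGFILE="$OPTARG" ;;
            *) return 3 ;;            # getopts has already printed an error
        esac
    done
    if [ -z "$DRUPAL_ROOT" ]; then
        echo 'UNKNOWN - -d <drupal root> is required' >&2
        return 3
    fi
    for sev in "$SEV_SEC" "$SEV_UPD" "$SEV_ERR" "$SEV_WARN" "$SEV_DB"; do
        if ! valid_severity "$sev"; then
            echo "UNKNOWN - severity must be w or e, got '$sev'" >&2
            return 3
        fi
    done
    return 0
}
```

For example, `parse_args -d /var/www/drupal -s e -u w` would set `SEV_SEC=e` and `SEV_UPD=w` and leave the other checks disabled. Returning 3 on bad usage follows the Nagios convention of reporting UNKNOWN when a plugin is called with invalid arguments.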

Problems with the first draft

I tested it back and forth and everything went smoothly. However, after a few more days of testing and performance optimization of the wrapper script, I noticed that the check itself can take up to three seconds to execute on some Drupal instances (depending on the server and the size of the Drupal database).

The check_drupal plugin – final version

In my setup, Nagios runs each check every 5 minutes. Three seconds for a check that runs every 5 minutes is pretty long, so I had to reconsider in order not to waste too many resources on those servers. Besides, I realized that I do not need to check for these problems every five minutes anyway. So this is what I came up with:

check_drupal -d <drupal root> [-n <name>] [-s <w|e>] [-u <w|e>] [-e <w|e>] [-w <w|e>] [-m <w|e>] [-l <logfile>]
check_drupal_log -f <logfile>

The additional parameter -l logs all check results (including the Nagios exit codes) into a logfile. With this option, the check_drupal script can run on the Drupal machine via cron and update the logfile every 6, 12 or however many hours. The second plugin, check_drupal_log, is the one Nagios actually triggers; it only parses the logfile, which takes just milliseconds.
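The split between a slow cron-driven check and a fast log reader can be sketched in a few lines of POSIX sh. The log format below (exit code on the first line, output on the following lines) and all function names are assumptions for illustration only; check_drupal's actual logfile format may differ:

```shell
#!/bin/sh
# Sketch of the cron/Nagios split: one function writes a result log,
# another reads it back as a Nagios check.
# ASSUMED log format (hypothetical, not check_drupal's real format):
#   line 1:  Nagios exit code (0-3)
#   line 2+: plugin output to display in Nagios

# Writer side (cron): store the code and message. Writing to a temp file
# in the same directory and renaming it means a reader never sees a
# half-written log.
write_drupal_log() {
    _log="$1" _code="$2"
    shift 2
    { printf '%s\n' "$_code"; printf '%s\n' "$*"; } > "$_log.tmp" &&
        mv -f "$_log.tmp" "$_log"
}

# Reader side (Nagios): print the stored output and return the stored code.
read_drupal_log() {
    _log="$1"
    if [ ! -r "$_log" ]; then
        echo "UNKNOWN - cannot read logfile $_log"
        return 3
    fi
    _code=$(sed -n '1p' "$_log")
    case "$_code" in
        0|1|2|3) ;;
        *) echo "UNKNOWN - malformed logfile $_log"; return 3 ;;
    esac
    sed '1d' "$_log"   # everything after the exit code line
    return "$_code"
}
```

In this setup, cron would run check_drupal with -l (for example every 6 hours, `0 */6 * * *`), while Nagios only calls the cheap reader. A production version would probably also want to flag a logfile that has not been updated for too long, so that a broken cron job does not silently keep reporting OK.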

The choice

As you may have noticed, the -l option is optional, so you can use the plugin either way: either run check_drupal on its own, or combine both plugins to save some CPU cycles.
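A rough back-of-the-envelope estimate shows how large the saving can be per site. It assumes the worst case of 3 seconds per check_drupal run, a 5-minute Nagios interval and a 6-hour cron interval, and ignores the few milliseconds check_drupal_log needs per run:

```latex
\text{Nagios only: } \frac{24 \cdot 60\ \text{min}}{5\ \text{min}} = 288\ \text{runs/day}, \qquad 288 \times 3\ \text{s} = 864\ \text{s} \approx 14.4\ \text{min/day}

\text{cron + log: } \frac{24\ \text{h}}{6\ \text{h}} = 4\ \text{runs/day}, \qquad 4 \times 3\ \text{s} = 12\ \text{s/day}
```

That is roughly a factor of 72 less time spent running the expensive check on each Drupal server.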

The end

This is my first contribution to Drupal. Even though it is not directly related to any Drupal module, I think it is a nice addition. I hope you enjoy it; if you find a bug, report it and I will be happy to fix it.

Find the source with install instructions on GitHub:

cytopia/check_drupal

Update

Officially added to:
* Icinga Exchange
* Nagios Exchange

Everything CLI
