Notes about open source software, computers, other stuff.

Month: September 2010

Nagios event handlers for services on remote machines

Part of my work consists of managing the servers on which we do our data analysis. At the moment we’ve got two servers and one virtual machine running. The VM is used as a management server; it runs things like Nagios, Cacti, and Subversion.

Today I implemented Nagios event handlers in this setup. The idea behind an event handler is the following: If e.g. a service goes down, Nagios should try to solve this problem itself before notifying the administrator (me). It should, in this case, simply try to restart the service.

The Nagios documentation [1] describes how to do this for a service that runs on the same machine as the Nagios service. In my case, however, the services are running on the two real servers. To me it seemed logical to use NRPE to execute the necessary commands on the remote hosts (since NRPE was already running on those machines anyway).
In order to adapt the scheme from the Nagios docs to work on remote servers as well, three things need to be done:

  • The command that is executed by the event handler script should be changed to use NRPE
  • On the remote machine the nagios user (under which the NRPE service is running) should be given some sudo rights so that it is actually allowed to start a service.
  • The NRPE configuration on the remote machine should of course be changed to include the new command(s) for starting services.

So here we go! First, the Nagios configuration on the management host. In the service definition file I added one line for the event handler to each service. The definition of one service now looks like this (the last line was added):

define service {
       use                      generic-service
       hostgroup_name           sge-exec-servers
       service_description      SGE execd
       check_command            check_nrpe_1arg!check_sge_execd
       notification_interval    0 ; set > 0 if you want to be renotified
       event_handler            restart-service!sge-execd
}

Next, the restart-service command must be defined. I did that in a file that I called /etc/nagios3/conf.d/event-handlers.cfg:

define command {
       command_name     restart-service
       command_line     /etc/nagios3/conf.d/event_handler_script.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ARG1$ $SERVICEDESC$
}

The variable $ARG1$ here is the name of the service that needs to be restarted. In this example it is sge-execd from the event_handler line in the service definition. The $HOSTADDRESS$ macro will be used in the event handler script to send the right host name to NRPE.
The event_handler_script.sh referenced here is almost identical to the one in the Nagios documentation. As mentioned in the plan above, I changed it slightly so that it uses NRPE.

#!/bin/sh                                                                                            
#
# Event handler script for restarting a service on a remote machine via NRPE
# Taken from the Nagios documentation and
# http://www.techadre.com/sites/techadre.com/files/event_handler_script_0.txt
# Adapted by L.C. Karssen
# Time-stamp: <2010-09-14 15:24:33 (root)>
#
# Note: This script will only restart the service if the check has been
#       retried 3 times (in a "soft" state) or if the service somehow
#       manages to fall into a "hard" error state.
#
 
date=`date`
 
# What state is the monitored service in?
case "$1" in
OK)
        # The service just came back up, so don't do anything...
        ;;
WARNING)
        # We don't really care about warning states, since the service is probably still running...
        ;;
UNKNOWN)
        # We don't know what might be causing an unknown error, so don't do anything...
        ;;
CRITICAL)
        # Aha!  The service appears to have a problem - perhaps we should restart it...
 
        # Is this a "soft" or a "hard" state?
        case "$2" in
 
        # We're in a "soft" state, meaning that Nagios is in the middle of retrying the
        # check before it turns into a "hard" state and contacts get notified...
        SOFT)
                # What check attempt are we on?  We don't want to restart the service on the first
                # check, because it may just be a fluke!
                case "$3" in
 
                # Wait until the check has been tried 3 times before restarting the service.
                # If the check fails on the 4th time (after we restart the service), the state
                # type will turn to "hard" and contacts will be notified of the problem.
                # Hopefully this will restart the service successfully, so the 4th check will
                # result in a "soft" recovery.  If that happens no one gets notified because we
                # fixed the problem!
                3)
                        printf 'Restarting service %s (3rd soft critical state)...\n' "$6"
                        # Call NRPE to restart the service on the remote machine
                        /usr/lib/nagios/plugins/check_nrpe -H "$4" -c "restart-$5"
                        echo "$date - restart $6 - SOFT"  >> /tmp/eventhandlers
                        ;;
                        esac
                ;;
 
        # The service somehow managed to turn into a hard error without getting fixed.
        # It should have been restarted by the code above, but for some reason it didn't.
        # Let's give it one last try, shall we?
        # Note: Contacts have already been notified of a problem with the service at this
        # point (unless you disabled notifications for this service)
        HARD)
                case "$3" in
 
                4)
                        printf 'Restarting %s service...\n' "$6"
                        # Call NRPE to restart the service on the remote machine
                        echo "$date - restart $6 - HARD"  >> /tmp/eventhandlers
                        /usr/lib/nagios/plugins/check_nrpe -H "$4" -c "restart-$5"
                        ;;
                        esac
                ;;
        esac
        ;;
esac
exit 0
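
The nested case statements boil down to two situations in which a restart is attempted. As a rough sketch of just that decision (should_restart is a hypothetical helper name, not something from the Nagios documentation), the same logic can be written as one function that is easy to try out by hand:

```shell
#!/bin/sh
# Sketch: decide whether the event handler should trigger a restart.
# should_restart is a hypothetical helper, not part of Nagios.
# Arguments mirror the first three macros passed to the handler script:
#   $1 = $SERVICESTATE$      (OK, WARNING, UNKNOWN, CRITICAL)
#   $2 = $SERVICESTATETYPE$  (SOFT, HARD)
#   $3 = $SERVICEATTEMPT$    (check attempt number)
should_restart() {
    case "$1:$2:$3" in
        CRITICAL:SOFT:3) return 0 ;;  # third soft failure: try a restart
        CRITICAL:HARD:4) return 0 ;;  # hard state reached: one last try
        *)               return 1 ;;  # everything else: do nothing
    esac
}

# Example: print the decision for a few state combinations.
for args in "CRITICAL SOFT 1" "CRITICAL SOFT 3" "CRITICAL HARD 4" "WARNING HARD 4"; do
    # Intentionally unquoted so the three words become three arguments.
    if should_restart $args; then
        echo "$args -> restart"
    else
        echo "$args -> no action"
    fi
done
```

Only the third soft CRITICAL check and the fourth (hard) CRITICAL check lead to a restart; every other combination is ignored, just as in the full script.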

Now Nagios can be restarted and should continue its work as usual. Time to make the changes on the remote hosts.

First, we’ll grant the necessary sudo rights to the nagios user. Run visudo and add these lines:

## Allow NRPE to restart services
User_Alias NAGIOS = nagios,nagcmd
Cmnd_Alias NAGIOSCOMMANDS = /usr/sbin/service
Defaults:NAGIOS !requiretty
NAGIOS    ALL=(ALL)    NOPASSWD: NAGIOSCOMMANDS

And finally add the required lines in the NRPE config file (/etc/nagios/nrpe.cfg):

command[restart-sge-execd]=/usr/bin/sudo /usr/sbin/service gridengine-exec start

Restart the NRPE daemon and it should all work. Test it by manually stopping the service.
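
One thing that is easy to get wrong is the naming: the event handler asks NRPE for restart-<name>, where <name> is whatever follows restart-service! in the service definition, so the NRPE config must define a command with exactly that name. A minimal sketch of a check for this follows; nrpe_has_restart_command is a hypothetical helper of my own, not something shipped with NRPE, and the example runs against a temporary file rather than a real config.

```shell
#!/bin/sh
# Sketch: check that an NRPE config file defines the restart command the
# event handler will ask for. The handler calls "restart-<name>", where
# <name> is the argument after "restart-service!" in the service definition.
# nrpe_has_restart_command is a hypothetical helper, not part of NRPE.
nrpe_has_restart_command() {
    cfg=$1      # path to the NRPE config file
    name=$2     # service name as passed in $ARG1$, e.g. sge-execd
    # -F: fixed-string match, so the brackets are not treated as a regex.
    # Caveat: a commented-out definition would match as well.
    grep -qF "command[restart-$name]=" "$cfg"
}

# Example against a temporary file containing the line from this post.
tmp=$(mktemp)
echo 'command[restart-sge-execd]=/usr/bin/sudo /usr/sbin/service gridengine-exec start' > "$tmp"
if nrpe_has_restart_command "$tmp" sge-execd; then
    echo "restart-sge-execd is defined"
else
    echo "restart-sge-execd is missing"
fi
rm -f "$tmp"
```

On a real host the first argument would be the path of the NRPE config file on that machine.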

[1] Nagios documentation on Event Handlers
[2] Two blog posts that describe a similar setup. I used these as a starting point for my own setup.


Lenovo Thinkpad X100e and Ubuntu 10.04

About a month ago I bought a Lenovo Thinkpad X100e laptop. Well, maybe laptop is a bit too big a word for it. Size-wise it’s more like a netbook with its screen diagonal of 11.6″. Performance-wise however, it’s much better. The one I’ve got has an AMD Turion Neo X2 L625 dual core processor running at a maximum of 1.6GHz and 2GB of RAM. It’s a nifty little machine that serves my needs: doing some work on the train to and from work, or while at conferences.

I took quite some time to look around for a laptop like this, and this Thinkpad seems to be the only one that satisfies my minimum requirements:
– Matte screen; no glossy screens for me, I’ve already got a mirror in my bathroom :-).
– Trackpoint; yep, that’s the red dot in between the G, H, and B keys.
– A processor that was more powerful than Intel’s Atom
– A decent keyboard, because for me, using Linux means using the command line and Emacs a lot.

After several weeks of use I’ve found only one drawback to this machine: its processor is not that efficient. It uses quite some power and therefore gets a bit hot. As a result the fan runs a lot (even though it’s not that audible) and battery life is not too good. I’m getting approximately 2 to 3 hours out of it if I reduce the screen brightness and turn wifi off. That could have been better (maybe Lenovo should have used an Intel CULV processor?), but it’s not too much of a limitation. It also came as no surprise, since most reviews on the web mention it.

After opening the box I quickly made an image of the Windows partitions that were on it and then proceeded to install Ubuntu 10.04. Most of the hardware was recognised by the 2.6.32 kernel included with Ubuntu’s 10.04 release. However, as several blogs (see links below) pointed out, there are a few bumps, e.g. with suspend and resume, or the wireless chip that is able to connect but doesn’t want to send or receive data. The bumps were smoothed out by installing a newer kernel (2.6.35-12-generic) from the Ubuntu kernel PPA. The 2.6.35 kernel is the one that will be used in the next Ubuntu release and the PPA contains packages that make this kernel run in the present release as well. With that kernel, suspend and hibernate work well, as do most Fn function keys. In fact, the only one that doesn’t seem to work is Fn+F3 for microphone mute. I had to turn on the bluetooth module in Windows before it showed up in Ubuntu (as noted by several blogs). At the moment, the things that don’t work correctly are:
– The microphone doesn’t record (neither in the sound recorder, nor when using Skype). Sometimes it shows some activity if the mic-volume slider is moved to about 25%, but I couldn’t get that to work reliably.
– The combined mic/headphone jack doesn’t mute the speakers if a pair of headphones is plugged in (neither is any sound heard through the headphones).
Maybe a newer ALSA release in the upcoming Ubuntu 10.10 will remedy these problems.

I was pleasantly surprised by the fact that using the open source radeon driver (installed by default) for the AMD/ATI graphics card worked out of the box, including Compiz 3D desktop fancy stuff. The VGA out also worked perfectly when I hooked it up to my Sony Bravia TV. Xorg’s RandR detected it and I could choose between an extended desktop or a clone setup.

As I already mentioned, I’m a trackpoint user, so I wanted to disable the touchpad, especially since the two buttons for it are located at the front edge of the laptop and are easily pressed when the device sits on your lap and you’ve got your knees pulled up.
Secondly I enabled wheel emulation for the trackpoint. Now, if I click and hold the middle ‘mouse’ button and push the trackpoint in a certain direction it acts as a scroll wheel. To achieve this I created the file /usr/lib/X11/xorg.conf.d/20-thinkpad.conf (EDIT: for Ubuntu 10.10 this file should be located in /usr/share/X11/xorg.conf.d/) with the following contents:

Section "InputClass"
	Identifier "Trackpoint Wheel Emulation"
	MatchProduct "Trackpoint"
	MatchDevicePath "/dev/input/event*"
	Driver "evdev"
	Option "EmulateWheel" "true"
	Option "EmulateWheelButton" "2"
	Option "Emulate3Buttons" "3"
	Option "XAxisMapping" "6 7"
	Option "YAxisMapping" "4 5"
EndSection	

All in all I’m very happy with the X100e. It’s a small but sturdy laptop with an excellent screen and a great keyboard.

Some links:
An excellent review of the Lenovo Thinkpad X100e
A recent review at AnandTech
Ubuntu kernel PPA
ThinkWiki page for the X100e, has lots of info on running Linux on this laptop.
A blog about installing Ubuntu Linux on the X100e, the problems mentioned in that post and its comments have now been solved (if you install the 2.6.35 kernel from the PPA). I tried the gpointing-device-settings package for some time (to get the trackpoint scroll functionality to work), but its settings didn’t survive across reboots or even after hibernating, so I removed it again.


Linux, the Logitech Trackman Marble and emulating a scroll wheel

At work I recently came across a trackball. It was about to be thrown away and since I’d never really used one I decided to take it home and try it out. It’s a Logitech Trackman Marble, still for sale on Logitech’s website.

The trackball features four buttons: two large ones for left and right-clicking and two smaller ones that work as back and forward buttons in Firefox, for example.

After plugging it into my PC it was instantly recognised by X (I’m using Ubuntu 10.04). There’s no middle mouse button, but that can be emulated by clicking the left and right mouse buttons at the same time (something I’ve been used to on older laptops, and, well, even from the time that some of the mice I owned only had two buttons). However, I did miss my scroll wheel. A quick search on the Internet brought me to Rob Meerman’s website where he explains a lot about the Trackman and how it works in X. He even has a special section on Ubuntu 10.04. In short it comes down to these commands:

xinput set-int-prop "Logitech USB Trackball" "Evdev Wheel Emulation Button" 8 8
xinput set-int-prop "Logitech USB Trackball" "Evdev Wheel Emulation" 8 1

Unfortunately the changes made by these commands are not persistent across reboots. I’ll try to fix that later.

EDIT: To add middle mouse button emulation and horizontal scrolling (thanks to rejistania below) run:

xinput set-int-prop "Logitech USB Trackball" "Evdev Middle Button Emulation" 8 1
xinput set-prop "Logitech USB Trackball" "Evdev Wheel Emulation Axes" 6 7 4 5

END EDIT
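
Since the xinput settings are lost on reboot, one possible workaround is to collect the four commands above in a small script and run it at login. The following is only a sketch: apply_trackball_settings and the XINPUT override variable are names I made up for illustration, and the xinput calls are exactly the ones shown above.

```shell
#!/bin/sh
# Sketch: reapply the trackball settings from this post in one go.
# apply_trackball_settings and the XINPUT override are hypothetical names;
# the xinput calls themselves are the ones shown above.
apply_trackball_settings() {
    xinput_cmd=${XINPUT:-xinput}   # allows a dry run with XINPUT=echo
    dev="Logitech USB Trackball"

    # Scroll by holding button 8 (a small button) and rolling the ball.
    "$xinput_cmd" set-int-prop "$dev" "Evdev Wheel Emulation Button" 8 8
    "$xinput_cmd" set-int-prop "$dev" "Evdev Wheel Emulation" 8 1
    # Middle click by pressing left and right together; horizontal scrolling.
    "$xinput_cmd" set-int-prop "$dev" "Evdev Middle Button Emulation" 8 1
    "$xinput_cmd" set-prop "$dev" "Evdev Wheel Emulation Axes" 6 7 4 5
}

# Dry run: print each command's arguments instead of executing xinput.
XINPUT=echo apply_trackball_settings
```

In a real X session the function would be called without the XINPUT override, for instance from a script that runs at login; which file is read at login depends on the desktop setup, so that part is left open here.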

Regarding the use of a trackball compared to an ordinary mouse, my experiences so far have been very positive. It didn’t take me a lot of time to get used to it. Also, precision placement of the pointer doesn’t seem to be more difficult than with a regular mouse. So for now my wireless Logitech mouse can take a holiday :-). The nicest thing about the trackball is the fact that you don’t have to move the whole device. So it’s less ‘weight lifting’. Also, the fact that the ball (in combination with the small button) is the scroll wheel makes for a relatively heavy wheel without much friction, so scrolling large distances can simply be done by giving the ball a good spin. Nice!


© 2024 Lennart's weblog
