Friday, September 25, 2009

NTP event-handler in Nagios

Nagios, "Nagios Ain't Gonna Insist On Sainthood". What an awesome tool for systems administration. I have been using Nagios since the days of NetSaint, back when it was first released in 1999. In ten years' time, the program has matured and become a de facto standard in network and systems monitoring. In recent years there have been many spin-offs of Nagios, but I have chosen to stick with the core package.

One of the most overlooked, but very powerful, features of Nagios is its ability to use event handlers (Nagios Docs). An event handler allows a script to be executed when the state of a service or host changes. The possibilities are endless. Usually when Nagios alerts on a changed state (warning/critical), an administrator is emailed/paged/tweeted in response. That administrator then logs in, restarts services, checks network connections, finds out why disk space went critical, and so on. Event handlers can do this and so much more!

I have written numerous event-handler scripts that, together with NRPE, create a 'self-healing' environment for different servers. Some of the event handlers that I use in Nagios are the following:

*Disk Space Critical - When Nagios alerts on warning/critical disk space, NRPE executes a custom du-scan.sh script that sorts everything on the mount point by space used (largest first), writes the result to a log file in the /tmp directory, and emails the location of the log to the administrators

*CPU Load Critical - When Nagios alerts on a warning/critical CPU load, whether on Linux or Windows, a script is executed (Bash on Linux, VB on Windows) that emails the administrators the top 5 CPU-consuming processes on the server

*NTP (Network Time Protocol) time sync Critical - Sometimes when CPU load goes critical, the clock on a Linux machine drifts WAY out of sync (over 1000 seconds), causing the NTP daemon to exit. When the time is off on the server, the various services we use report different times, causing even more issues. The fix is to have Nagios restart the ntp service on the remote server via NRPE.
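
As a rough illustration of the disk-space handler described above, here is a minimal sketch of a usage scan. This is not the actual du-scan.sh; the function name du_scan and its arguments are made up for the example, and the emailing step is left out.

```shell
#!/bin/bash
# Hypothetical sketch, not the author's du-scan.sh.
# du_scan MOUNT_POINT [COUNT]
# Prints the COUNT largest directories under MOUNT_POINT, largest first (sizes in KB).
du_scan() {
    local mount_point="$1" count="${2:-20}"
    # -x keeps du on a single filesystem, -k reports sizes in kilobytes.
    du -xk "$mount_point" 2>/dev/null | sort -rn | head -n "$count"
}
```

Something like du_scan /var 20 > /tmp/du-scan.log would then produce the log file whose location gets mailed out.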

This little blog entry will detail how to set up an NTP event handler for Nagios. You can use this as a base for just about anything else event-handler related.

This is written with the assumption that you are already familiar with Nagios and have a Nagios server that uses NRPE for Linux clients, with nagios-plugins installed on the client. This isn't a basic 'Nagios howto' at all.

First things first: set up and configure the remote client to allow command arguments.

1. Navigate to your nrpe source directory (in this case: /root/Download/nrpe2-12/)
2. Reconfigure nrpe for command arguments
./configure --enable-command-args && make && make install
3. Modify the nrpe.cfg file
a) change: dont_blame_nrpe=0 to: dont_blame_nrpe=1
b) add the following command definitions:
## /usr/local/.... is the path to the check_ntp plugin that comes with nagios-plugins. Change 0.pool.ntp.org to the NTP server that your organization uses. Warning at 10 seconds, critical at 20 seconds.

command[check_ntp]=/usr/local/nagios/libexec/check_ntp -H 0.pool.ntp.org -w 10 -c 20

## /usr/local/.... is the path to the event-ntp handler, as seen below.

command[event-ntp]=/usr/local/nagios/libexec/event-ntp $ARG1$ $ARG2$ $ARG3$

4. Create a file called event-ntp in /usr/local/nagios/libexec, owned by nagios:nagios (user:group), and make it executable.
5. Drop this code into the event-ntp file:

#!/bin/bash

## Event handler executed on WARNING and CRITICAL alerts.
## WARNING:  run an ntp query and email the output to the specified admin.
## CRITICAL: run an ntp query, email the output, and restart ntpd to re-sync the clock.
## Arguments: $1 = service state, $2 = state type (SOFT/HARD), $3 = check attempt number.

case "$1" in
OK)
    ;;
WARNING)
    echo -e "Running NTP Query" "\n"
    ntpq -p | mailx -s "HOSTNAME - NTP Query" adminacct@example.com
    ;;
UNKNOWN)
    ;;
CRITICAL)
    case "$2" in
    SOFT)
        ## On a SOFT state, act only on the third check attempt.
        case "$3" in
        3)
            echo -e "Running NTP Query & Restarting NTP Service" "\n"
            ntpq -p | mailx -s "HOSTNAME - NTP Query - Restarted NTPD" adminacct@example.com
            ## Restart even if the mail could not be sent.
            /usr/bin/sudo /sbin/service ntpd restart
            ;;
        esac
        ;;
    HARD)
        echo -e "Running NTP Query & Restarting NTP Service" "\n"
        ntpq -p | mailx -s "HOSTNAME - NTP Query - Restarted NTPD" adminacct@example.com
        ## Restart even if the mail could not be sent.
        /usr/bin/sudo /sbin/service ntpd restart
        ;;
    esac
    ;;
esac
exit 0


Be sure to change HOSTNAME to the client's hostname, and adminacct@example.com to the admin account that you want the email to go to.
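
If you want to check which state combinations trigger a restart without actually touching ntpd, something like the following dry-run sketch may help. It mirrors the case logic of the handler but only prints the action that would be taken; the function name handler_action and the labels it prints are made up for this example.

```shell
#!/bin/bash
# Hypothetical dry-run helper mirroring the handler's case logic.
# It never sends mail or restarts anything; it only names the action.
handler_action() {
    local state="$1" state_type="$2" attempt="$3"
    case "$state" in
    WARNING)
        echo "query"
        ;;
    CRITICAL)
        case "$state_type" in
        SOFT)
            # SOFT states act on the third attempt only.
            if [ "$attempt" = "3" ]; then echo "query+restart"; else echo "none"; fi
            ;;
        HARD)
            echo "query+restart"
            ;;
        *)
            echo "none"
            ;;
        esac
        ;;
    *)
        # OK, UNKNOWN and anything unexpected: do nothing.
        echo "none"
        ;;
    esac
}

handler_action CRITICAL SOFT 1   # prints: none
handler_action CRITICAL SOFT 3   # prints: query+restart
handler_action CRITICAL HARD 4   # prints: query+restart
```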

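Now for the tricky part, and probably not the most secure way to do this: modify the sudoers file to allow the nagios user to run the service command. I'm sure there is a more 'secure' way of doing this, but this works for me. One tightening that should work is to list the exact command in the Cmnd_Alias (/sbin/service ntpd restart) rather than /sbin/service as a whole, since the alias as written lets nagios run any service action as root.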

1. visudo
2. add the following:
User_Alias NAGIOS = nagios,nagcmd
Cmnd_Alias NAGIOSCOMMANDS = /sbin/service
Defaults:NAGIOS !requiretty
NAGIOS ALL=(ALL) NOPASSWD: NAGIOSCOMMANDS

Be sure to restart the NRPE daemon on the client after all this has been done. Now on to the server end of things.
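
Before restarting NRPE, a quick sanity check of nrpe.cfg can save a round of debugging. The sketch below is only an illustration: the function name check_nrpe_cfg is made up, and the default path /usr/local/nagios/etc/nrpe.cfg is an assumption based on a typical source install, so pass your own path if it differs.

```shell
#!/bin/bash
# Hypothetical helper: confirm nrpe.cfg contains the three settings from the steps above.
# The default config path is an assumption; pass the real one as $1.
check_nrpe_cfg() {
    local cfg="${1:-/usr/local/nagios/etc/nrpe.cfg}" pattern
    for pattern in \
        '^[[:space:]]*dont_blame_nrpe=1[[:space:]]*$' \
        '^[[:space:]]*command\[check_ntp\]=' \
        '^[[:space:]]*command\[event-ntp\]='
    do
        if ! grep -Eq "$pattern" "$cfg"; then
            echo "missing in $cfg: $pattern" >&2
            return 1
        fi
    done
    echo "nrpe.cfg looks ready"
}
```

Running check_nrpe_cfg /path/to/nrpe.cfg prints a confirmation, or names the first setting it could not find.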



First thing you need to do is create an event-ntp command in the commands.cfg file:

define command{
        command_name    event-ntp
        command_line    /usr/local/nagios/libexec/check_nrpe -H $HOSTNAME$ -c event-ntp -a $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
        }

This will be called by the event_handler directive in your service definition.
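
To make the macro expansion concrete, the sketch below prints the command line Nagios would end up running for a given host and state. It is only an illustration: the function name expand_event_ntp and the host web01.example.com are made up, and nothing is executed.

```shell
#!/bin/bash
# Hypothetical illustration: show the event-ntp command line after macro expansion.
# Arguments stand in for $HOSTNAME$, $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$.
expand_event_ntp() {
    local hostname="$1" servicestate="$2" statetype="$3" attempt="$4"
    printf '%s -H %s -c event-ntp -a %s %s %s\n' \
        /usr/local/nagios/libexec/check_nrpe \
        "$hostname" "$servicestate" "$statetype" "$attempt"
}

# web01.example.com is a placeholder host name.
expand_event_ntp web01.example.com CRITICAL SOFT 3
```

On the client, NRPE then hands the three values after -a to the script as $ARG1$, $ARG2$ and $ARG3$.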


Now modify your service definition, wherever you configure your service/host definitions. In my case I have a separate configuration file called linux.cfg.

define service{
use YOUR-service-TEMPLATE
host_name HOSTNAME-HERE
service_description Time Sync Check
event_handler event-ntp
check_command check_nrpe!check_ntp
}

Now restart nagios (service nagios restart), and test the configuration from the server end:
/usr/local/nagios/libexec/check_nrpe -H REMOTEHOSTNAME -c event-ntp -a CRITICAL HARD

If all goes well, you should receive an email from your client with the output of ntpq -p, and the ntpd service will be restarted.

If you have any problems, such as not receiving the email or the script not executing, set debug=1 in nrpe.cfg, restart nrpe, run the event-ntp test above, and check your logs.
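
For the log check, a small filter like the one below can pull out the recent NRPE entries. This is a sketch under assumptions: it presumes NRPE logs through syslog into /var/log/messages (common on Red Hat style systems, but your distro may differ), and the function name nrpe_log_tail is made up.

```shell
#!/bin/bash
# Hypothetical helper: show the last COUNT log lines that mention nrpe.
# The default log path is an assumption; pass the real one as $1.
nrpe_log_tail() {
    local logfile="${1:-/var/log/messages}" count="${2:-20}"
    grep -i 'nrpe' "$logfile" | tail -n "$count"
}
```

For example, nrpe_log_tail /var/log/messages 50 shows the last 50 matching lines.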

As you can see, it's not too difficult to set up event-handler scripts, and it saves administrators time when Nagios can do the legwork on critical service/host alerts. This example 'self-heals' the ntpd service, but it can be modified to just report data when there is a problem. Any automated fix or diagnostic that Nagios runs before an administrator gets to the machine shaves time off of troubleshooting the problem.

Special thanks to keith4 on freenode.net's #nagios channel for catching a syntax error in the $HOSTNAME$ argument in my commands.cfg file, saving me many hours of hair pulling and name calling.

Monday, September 21, 2009

OpenSimulator

Last week I was tasked with attempting to create a 'virtual world' for a training environment: full-on PowerPoint slides, VoIP, etc. in a training-room type of setting. As I have never really played with SecondLife, or any other type of virtual world, I did some digging.

It looks like virtual world environments are the next big thing (and they still are, more than six years after SecondLife launched in 2003). More and more I read articles online about how big companies such as IBM, Sun and HP are using SecondLife (and their own internal virtual world servers) to host conferences, webcasts, and training sessions for their staff and the public.

As a matter of fact, companies are not only using virtual worlds to host various public functions, but also creating 'Virtual Data Centers'. Case in point: IBM's Virtual Data Center. IBM has special software that allows systems administrators to monitor and administer their data centers in a virtual environment.

So, while surfing around the internet for virtual world servers, I came across OpenSimulator. You can create a stand-alone server, or attach it to other virtual worlds through osgrid.org. As the developers do their testing in Ubuntu, I downloaded and installed the latest Ubuntu Server and started up OpenSimulator. The wiki page on their website is pretty self-explanatory, walking you step by step through setting up your own virtual world.

The one problem that I did come across was using either the Hippo Viewer or the SecondLife client to connect to the instance. Both showed my avatar as a 'ghost'. I found that other people had the same problem, and the solution was to create hair for your avatar, and the ghost would go away. Strange quirk, but it worked!

I wouldn't have been able to get this set up and troubleshot without the excellent help from freenode.net's #opensim channel. They were a big help, even at 2200hrs MST.

Anyway, now that I have my opensim server running in Ubuntu on VirtualBox, it's time to create some land!

Until next time.....

Tuesday, September 15, 2009

Linux Screen template

It's been a while since I have posted anything, but hopefully I will start posting little tidbits of helpful *nix stuff again.

This time, the post is about Linux screen templates. With more and more Linux distros going GUI and trying to get away from the CLI, the screen program has been mostly overlooked. As I have been working in and around Linux for the last 12 years or so, I have become quite fond of the screen application.

First things first: man screen
Everything you need to know about custom .screenrc files is located in the man page, yet many people don't realize this.

To create the .screenrc that I use every day:

touch /home/username/.screenrc (or /root/.screenrc for root's screen)
vi /home/username/.screenrc

Then add these lines to the file:
hardstatus alwayslastline "%{=b}%{G} Screen(s): %{b}%w %=%{kG}%C%A %D, %M/%d/%Y"
startup_message off
msgwait 1

Save and quit the editor, and fire up screen. The status line shows all of the screens that you have open (name each one you create with: Ctrl-a A), and puts a date/time stamp in the lower right-hand corner. I put this in because I get so busy that I hardly look up at the clock and end up missing lunch. This way I can easily look over to the right to check the time :-)
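
If you would rather script this than open an editor (handy when setting up several accounts), a sketch along these lines should do it. The function name write_screenrc is made up for the example; it writes the same three lines shown above and refuses to overwrite an existing file.

```shell
#!/bin/bash
# Hypothetical helper: write the .screenrc from this post without an editor.
# write_screenrc [TARGET]   (TARGET defaults to ~/.screenrc)
write_screenrc() {
    local target="${1:-$HOME/.screenrc}"
    # Do not clobber an existing configuration.
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    # Quoted delimiter keeps the % escapes and double quotes untouched.
    cat > "$target" <<'EOF'
hardstatus alwayslastline "%{=b}%{G} Screen(s): %{b}%w %=%{kG}%C%A %D, %M/%d/%Y"
startup_message off
msgwait 1
EOF
}
```

Call write_screenrc with no arguments for your own account, or pass a path such as /root/.screenrc.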

As I am using a GIANT monitor, the screen capture of my screen session looks all wacky here in Blogger. You should be able to click on the image to view it.




As I said, I live by screen for server work. I can create multiple screens, split them horizontally or vertically within one main screen, detach a screen when I go home, then VPN in and re-attach that screen from another location. And because I spend most of my time in a console, having a custom screenrc file just makes sense.