For the purposes of this article, I am going to talk about using Nagios to handle events, such as clearing out swap.
First, let us look at some Nagios configuration. We are going to define a command, then a service that uses that command. Let us assume that Nagios is installed in /usr/local/nagios.
Therefore, in /usr/local/nagios/ a few configuration files are key:
- /usr/local/nagios/etc/objects/commands.cfg - the command file where the checks are defined
- /usr/local/nagios/etc/hosts/*/hosts.cfg - the services file, where each check is attached to a host and scheduled according to the other directives in that file
A command:
# 'check_local_swap' command definition
define command{
    command_name    check_local_swap
    command_line    $USER1$/check_swap -w $ARG1$ -c $ARG2$
}
This says that check_local_swap executes check_swap with a warning threshold of $ARG1$ and a critical threshold of $ARG2$.
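To make the substitution concrete, here is a minimal Python sketch of how the macros get filled in. It is only an illustration, not how Nagios is implemented. The function name `expand_command`, the `check_local_swap!80%!55%` argument string, and the `$USER1$` value of /usr/local/nagios/libexec (normally set in resource.cfg) are assumptions made for the example.

```python
def expand_command(check_command, commands, user_macros):
    """Expand a 'name!arg1!arg2' check_command into a full command line.

    Simplified illustration only: real Nagios supports many more macros.
    """
    # The first field is the command name, the rest become $ARG1$, $ARG2$, ...
    name, *args = check_command.split("!")
    line = commands[name]

    # Substitute $USERn$ macros (assumed values, normally from resource.cfg).
    for key, value in user_macros.items():
        line = line.replace(f"${key}$", value)

    # Substitute positional arguments.
    for i, arg in enumerate(args, start=1):
        line = line.replace(f"$ARG{i}$", arg)
    return line


commands = {"check_local_swap": "$USER1$/check_swap -w $ARG1$ -c $ARG2$"}
user_macros = {"USER1": "/usr/local/nagios/libexec"}  # assumed plugin path

print(expand_command("check_local_swap!80%!55%", commands, user_macros))
```

With those assumed values, the fields after each `!` land in $ARG1$ and $ARG2$ in order, so the first becomes the warning threshold and the second the critical threshold.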
Next, define a service for a host:
define service{
    use                     generic-service ; name of the service template to use
    host_name               dbfacebook34b ; hostname
    service_description     SYS:Swap ; what shows up in alerts
    is_volatile             0
    check_period            24x7 ; time period during which checks run (all the time)
    max_check_attempts      4 ; check attempts before a non-OK state becomes a hard state
    event_handler           handle-swap ; command to run on state changes (another command)
    normal_check_interval   5 ; in time units: 5 minutes with the default interval_length of 60 seconds
    retry_check_interval    1 ; wait 1 time unit between rechecks while the state is still soft
    contact_groups          itops ; contact group to send notifications to
    notification_options    w,u,c,r ; notify on warning, unknown, critical and recovery
    notification_interval   600 ; re-notify every 600 time units (600 minutes with the default interval_length)
    notification_period     24x7 ; notifications can go out at any time
    check_command           check_nrpe!check_local_swap!80%!55% ; run the swap check through NRPE with the two thresholds
}
Lots of goodies, as you can see. Let us look at the event handler:
define command{
    command_name    handle-swap
    command_line    /home/scripts/handle_swap.pl
}
This tells Nagios to execute the script whenever the swap service changes state (I decided to keep this simple and not filter on the state or put a threshold on it).
What does handle_swap.pl do? It is a Perl script that looks at free memory and, if only a few hundred KB of swap is in use, runs swapoff -a; swapon -a.
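The script itself is not reproduced here. As a rough illustration of that logic, here is a minimal Python sketch (not the original Perl). The 1024 kB threshold, the function names, and the use of MemFree plus Cached as a crude estimate of the memory available to absorb the swapped pages are all assumptions made for the example.

```python
import subprocess


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def should_reset_swap(info, max_swap_used_kb=1024):
    """Decide whether it looks safe to cycle swap.

    max_swap_used_kb is a hypothetical cutoff for "only a little swap in use".
    """
    swap_used = info["SwapTotal"] - info["SwapFree"]
    # Assumption: MemFree + Cached roughly approximates memory that can
    # take back the pages currently sitting in swap.
    available = info["MemFree"] + info.get("Cached", 0)
    return 0 < swap_used <= max_swap_used_kb and available > swap_used


def run_handler(meminfo_path="/proc/meminfo"):
    """Read memory stats and, if the check passes, cycle swap (needs root)."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    if should_reset_swap(info):
        subprocess.run(["swapoff", "-a"], check=True)
        subprocess.run(["swapon", "-a"], check=True)
```

Calling `run_handler()` as root would perform the swapoff/swapon cycle. Keeping the decision in `should_reset_swap` means it can be exercised against sample data without touching the system.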
In this case, it is reasonably safe to do this. Why do this? Why not just turn off swap? I have talked about this subject in depth, but as a brief recap: in my experience, Linux needs swap, or else kswapd will freak out. Swapping on a database server is bad, so I clean it up automatically, since O_DIRECT on my SAN is not an option.
Why not just run a cron job? Nagios keeps a log, I like to review what is happening from a central location, and Nagios is freaking COOL.
4 comments:
Linux without swap works fine and always did. No idea where that "freak" came from.
How did you configure Linux without swap? Every time I turn off swap, kswapd chews up a ton of CPU resources, so much so that it puts mysql in a run queue.
Nice write up !
Linux without swap can work, but it'll depend on your workload, amount of memory, kernel version...
Apart from the cool factor, and the obvious educational interest, this solution looks a bit complex for a production system. Indeed, using cron seems (to me) like a better (simpler, fewer dependencies) choice. If you want central "reviewability" (did I just make that word up?), use syslog (which in production should always be configured to log to a central node anyway, right?)!
Related also: the --memlock option and the /proc/sys/vm/swappiness tunable.
/proc/sys/vm/swappiness is just a "suggestion" to the kernel, setting it to 0 doesn't prevent the OS from swapping... memlock produces very unpredictable results; it's a roll of the dice :) Large pages might be something interesting to try next ...