For the purpose of this article, I am going to talk about handling events such as a clearing up swap.
First, let us look at some configuration of Nagios. We are going to define a command, then service acting on that command. Let us assume that the nagios install is in /usr/local/nagios.
Therefore, in /usr/local/nagios/ a few configuration files are key:
- /usr/local/nagios/etc/objects/commands.cfg - the command file where the checks are defined
- /usr/local/nagios/etc/hosts/*/hosts.cfg - the services file where the checks are defined for execution based on other directives in this file.
A command:
# 'check_local_swap' command definition
define command{
command_name check_local_swap
command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
}
This says that check_local_swap executes check_swap with a warning threshold of $ARG1 and a critical threshold or $ARG2
Next when defining a service for a host
define service{
use generic-service; Name of service template to use
host_name dbfacebook34b ; hostname
service_description SYS:Swap ; what shows up in alerts
is_volatile 0
check_period 24x7 ; threshold when to check (all the time)
max_check_attempts 4 ; threshold to check before marking state
event_handler handle-swap ; handle an event (another command)
normal_check_interval 5 ; in seconds
retry_check_interval 1 ; only try once before reporting the state
contact_groups itops ; contact group to send notifications to
notification_options w,u,c,r ; need to look this up for all defs
notification_interval 600 ; retry sending notifs every 8 mins
notification_period 24x7 ; keep sending them
check_command check_nrpe!check_local_swap!80%!55% ; execute the event handler and warn like hell
}
Lots of goodies as you can see. Let us look at the event handler
define command{
command_name handle-swap
command_line /home/scripts/handle_swap.pl
}
This means execute this script whenever any event for swap occurs (I decided to make this simple and not put a threshold on this).
What does handle_swap.pl do - well it’s a perl script that looks at free memory and if only a few 100K of swap is in use, swapoff -a; swapon -a;
In this case, it is a bit safe to do this. Why do this? Why not just turn of swap. I have talked in depth about this subject-but for a minor recap. Linux needs swap else, kswapd will freak out. Swap in DB's is bad so I clean it up automatically since O_DIRECT on my SAN is not an option.
Why not just run a cron job? Nagios keeps a log, I like to review what is happening from a central location, and nagios is freaking COOL.
Linux without swap works fine and always did. No idea where that "freak" came out.
ReplyDeleteHow did you configure linux without swap. Every time I turn off swap, kswapd chews up a ton of CPU resources, so much so it puts mysql in a run queue.
ReplyDeleteNice write up !
ReplyDeleteLinux without swap can work, but it'll depend on your workload, amount of memory, kernel version...
Apart from the cool factor, and the obvious educational interest, this solution looks a bit complex for a production system - indeed using cron seems (to me) like a better (simpler, less dependencies) choice. If you want central "reviewability" (did I just made that word up ?) use syslog (which in production should always be configured to log to a central node anyway, right ?) !
Related also: the --memlock option and the /proc/sys/vm/swappiness tunable.
/proc/sys/vm/swappiness is just a "suggestion" to the kernel, setting it to 0 doesn't prevent the OS from swapping... memlock produces very unpredictable results; it's a roll of the dice :) Large pages might be something interesting to try next ...
ReplyDelete