Friday, September 03, 2010

Cassandra and Ganglia

cassandra_tpstats_row_read_stage_completed

I finally got some time to do some housecleaning. One of my nagging low-hanging-fruit jobs was to stop using jconsole as my monitor, so I wrote a Ganglia script to produce the graph above. It shows, for every Cassandra server, the total row read stages completed over the last hour as a gauge. In essence, I am graphing the delta between Ganglia script runs.
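To make the "delta between runs" idea concrete, here is a minimal sketch in Python (the real scripts are Perl, so this is not the actual code). The state file and the function name delta_since_last_run are made up for illustration; the only point is that each run remembers the last counter value and reports the difference.

```python
import json
import os


def delta_since_last_run(state_path, name, current):
    """Return how much a counter grew since the previous run.

    state_path is a hypothetical JSON file used to remember the last
    sample between script runs.
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as fh:
            state = json.load(fh)

    previous = state.get(name)

    # Remember the current sample for the next run.
    state[name] = current
    with open(state_path, "w") as fh:
        json.dump(state, fh)

    # First run, or the counter went backwards (e.g. the node restarted):
    # there is no meaningful delta, so report 0.
    if previous is None or current < previous:
        return 0
    return current - previous
```

Treating a counter that goes backwards as a reset is an assumption on my part; it keeps a node restart from showing up as a huge negative spike on the graph.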

How I have it set up is:

All data exposed by JMX to produce tpstats and cfstats is graphed via Ganglia. The pattern for each graph name is as follows:

cass_{stat_class}_{key}

stat_class - tpc, tpp, or tpa, meaning completed, pending, and active respectively
key - the stat being graphed, for instance message deserialization.
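The naming pattern above could be sketched like this in Python. The function metric_name and the lowercase-with-underscores normalization of the key are my assumptions for illustration, not necessarily what the Perl script does.

```python
import re

# Short codes from the pattern above.
STAT_CLASSES = {"completed": "tpc", "pending": "tpp", "active": "tpa"}


def metric_name(counter, key):
    """Build a cass_{stat_class}_{key} graph name.

    Assumption: the key is lowercased and any run of characters that
    is not a letter or digit becomes a single underscore.
    """
    normalized = re.sub(r"[^a-z0-9]+", "_", key.lower()).strip("_")
    return "cass_{}_{}".format(STAT_CLASSES[counter], normalized)
```

For example, metric_name("pending", "Message Deserialization") gives cass_tpp_message_deserialization.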

For column family stats I graph the keyspace stats as well as the specific column family stats exposed by cfstats. For example:

Cassandra cfstats with ganglia

If you're interested in the scripts, I'll send them to you or put them up on code.google.com. They are written in object-oriented Perl and take the same packaging approach as the Maatkit toolkit for MySQL by Xaprb and crew (all the "classes" live in the same file as the application).

GmetricDelegate is the parent package
GmetricCassandra extends GmetricDelegate, overrides getData, and defines which stats are absolute versus gauges.
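Roughly, the parent/child split looks like the following Python sketch. The real code is Perl; the class names come from the script, but get_data's return shape, gauge_keys, values_to_report, and the sample numbers are all hypothetical stand-ins.

```python
class GmetricDelegate:
    """Parent package: turns raw samples into values to report."""

    # Keys reported as the change since the last run; everything
    # else is reported as an absolute value.
    gauge_keys = frozenset()

    def get_data(self):
        raise NotImplementedError("each module supplies its own samples")

    def values_to_report(self, previous):
        """previous maps key -> value seen on the last run."""
        out = {}
        for key, value in self.get_data().items():
            if key in self.gauge_keys:
                # No earlier sample means a delta of 0.
                out[key] = value - previous.get(key, value)
            else:
                out[key] = value
        return out


class GmetricCassandra(GmetricDelegate):
    gauge_keys = frozenset({"cass_tpc_row_read_stage"})

    def get_data(self):
        # Placeholder numbers; a real module would read these from JMX.
        return {"cass_tpc_row_read_stage": 1500, "cass_tpp_row_read_stage": 3}
```

A new module only has to override get_data and say which of its keys are gauges; the reporting logic stays in the parent.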

Following the same pattern, I also have
GmetricInnoDB
GmetricMySQL

and so on.

Then on each server I run

/usr/bin/perl -w /home/scripts/ganglia_gmetric.pl --module=GmetricCassandra

This then talks to Ganglia through gmetric to report the stats.
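For a feel of the gmetric hand-off, here is a small Python sketch that only builds the command line (it does not run it). The helper gmetric_command and its defaults for units and type are assumptions; --name, --value, --type, and --units are standard gmetric options.

```python
def gmetric_command(name, value, units="count", metric_type="uint32"):
    """Return the argv list for one gmetric call.

    The defaults here are illustrative; pick the type and units
    that suit each stat.
    """
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", metric_type,
        "--units", units,
    ]
```

Passing the resulting list to subprocess.run on a host with gmond configured would publish the value; building argv as a list avoids any shell quoting issues with metric names.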

Update: I uploaded an alpha version to http://code.google.com/p/gangliastats/. Be warned, the comments are sparse; I'll have another check-in with documentation soon.
