The image above graphs all the exceptions produced by Cassandra. The two big lines are:
Transport Exceptions (te) - meaning that Cassandra could not answer the request at all. Think of this as the equivalent of max-connection errors in MySQL.
Unavailable Exceptions (ue) - meaning that Cassandra could accept the request but the "storage engine" could not act on it, typically because the nodes it needs are not responding: they are busy with something else, such as communicating with other nodes or maintenance like a node cleanup.
So how did I get the graph to drop to 0? After looking at the error logs, I saw that Cassandra was getting flooded with SYN requests. The kernel took this for a SYN flood and logged:
possible SYN flooding on port 9160. Sending cookies.
To stop this, the Puppet profile was changed to apply:
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
sysctl -w net.ipv4.tcp_syncookies=0
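A quick way to confirm the change took effect is to read the live values back and count how many flood warnings the kernel has logged. The sketch below is only an illustration: the helper names sysctl_proc_path and count_syn_flood_warnings are made up for this example, and it assumes the usual Linux layout where a dotted sysctl key maps to a path under /proc/sys.

```shell
#!/bin/sh
# Hypothetical helper: map a dotted sysctl key to its /proc/sys path.
# (This simple dot-to-slash mapping holds for the two keys used here.)
sysctl_proc_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Hypothetical helper: count kernel "possible SYN flooding" warnings
# for port $1 in the log text supplied on stdin.
count_syn_flood_warnings() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    grep -c "possible SYN flooding on port $1\." || true
}

# Print the values the kernel is actually using right now.
for key in net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_syncookies; do
    path=$(sysctl_proc_path "$key")
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$key" "$(cat "$path")"
    else
        printf '%s: not readable on this system\n' "$key"
    fi
done
```

To check the kernel log for the Thrift port, pipe it in: dmesg | count_syn_flood_warnings 9160. Keep in mind that sysctl -w only changes the running kernel, so the values need to be reapplied at boot (here Puppet takes care of that).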
Next, I looked at the Cassandra log, which I had configured to be written to /var/log/cassandra/system.log:
WARN [TCP Selector Manager] 2010-03-26 02:46:31,619 TcpConnectionHandler.java (line 53) Exception was generated at : 03/26/2010 02:
Too many open files
java.io.IOException: Too many open files
Then I noticed that
ulimit -n == 1024
so I changed
/etc/security/limits.conf to raise the limit to a server-level setting by adding this:
* - nofile 8000
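The new limit only applies to sessions started after the change, so it is worth checking it from the shell that launches Cassandra. A minimal sketch, assuming a POSIX shell; the function name nofile_at_least is made up for this example, and 8000 is simply the value from the limits.conf line above:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the current soft open-file limit
# (what "ulimit -n" reports) is at least $1.
nofile_at_least() {
    current=$(ulimit -n)
    [ "$current" = "unlimited" ] || [ "$current" -ge "$1" ]
}

if nofile_at_least 8000; then
    echo "open-file limit OK: $(ulimit -n)"
else
    echo "open-file limit too low: $(ulimit -n) (wanted >= 8000)"
fi
```

If this still reports 1024, log out and back in (limits.conf is read when the session starts) and restart Cassandra from the new session. For a process that is already running, /proc/<pid>/limits shows the limits it actually has, where <pid> is a placeholder for the Cassandra process ID.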
Now my Transport Exceptions and Unavailable Exceptions are gone and data is being written to Cassandra consistently.
There are other ways of doing the same thing: I could have modified my init script, for example, but I chose this way. Default distro installs set the kernel and limits values too low for a server; the defaults are sized for desktop use.