Friday, March 26, 2010

Some kernel tweaks to aid Cassandra under a high concurrency environment

For the past couple of weeks I have been troubleshooting some Cassandra issues where data would not make it to Cassandra.

Graph of various tracked Exceptions

The image above graphs all the exceptions produced by Cassandra. The two big lines are:

Transport Exceptions (te) - meaning that Cassandra could not answer the request; think of these as the equivalent of max-connection errors in MySQL.

Unavailable Exceptions (ue) - meaning that Cassandra could answer the request but the "storage engine" cannot do anything with it, because it's busy doing something like communicating with other nodes, or maintenance such as a node cleanup.

So how did I get the graph to drop to 0? Looking at the error logs, I saw that Cassandra was getting flooded with SYN requests; the kernel interpreted this as a SYN flood and logged:

possible SYN flooding on port 9160. Sending cookies.
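Before changing anything, the values the kernel is currently using can be read straight out of /proc (the defaults on many distros of that era were 1024 and 1, respectively):

```shell
# Current half-open connection backlog depth
cat /proc/sys/net/ipv4/tcp_max_syn_backlog
# Whether SYN cookies are enabled (1) or disabled (0)
cat /proc/sys/net/ipv4/tcp_syncookies
```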

To stop this, the puppet profile was changed to run:

sysctl -w net.ipv4.tcp_max_syn_backlog=4096
sysctl -w net.ipv4.tcp_syncookies=0
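Note that `sysctl -w` only changes the running kernel; the settings are lost on reboot. To make them persistent, the same keys can go into /etc/sysctl.conf (a puppet-managed fragment works the same way), for example:

```shell
# /etc/sysctl.conf -- applied at boot, or on demand with `sysctl -p`
# Allow a deeper queue of half-open connections before the kernel
# starts treating new SYNs as a flood.
net.ipv4.tcp_max_syn_backlog = 4096
# Disable SYN cookies so bursts of legitimate connects are not
# degraded; only sensible for hosts behind a firewall/loadbalancer.
net.ipv4.tcp_syncookies = 0
```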

Next, looking into the Cassandra log, which I configured to live in /var/log/cassandra/system.log:

WARN [TCP Selector Manager] 2010-03-26 02:46:31,619 (line 53) Exception was generated at : 03/26/2010 02:
Too many open files Too many open files

Then I noticed that
ulimit -n == 1024

so I changed /etc/security/limits.conf to a server-appropriate setting by adding this:

* - nofile 8000
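Since limits.conf is applied at login (via PAM), a quick sanity check after re-logging in is to confirm the new limits for the shell and, for an already-running Cassandra, read its limits straight from /proc. The `CassandraDaemon` pattern below is a guess at how the JVM appears in the process list; adjust it to your setup:

```shell
# Soft and hard open-file limits for the current shell session
ulimit -Sn
ulimit -Hn

# A running process's actual limits are visible in /proc; e.g.:
# grep 'open files' /proc/"$(pgrep -f CassandraDaemon | head -1)"/limits
```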

Now my Transport Exceptions and Unavailable Exceptions are gone, and data is being written consistently.

There are many other ways to do the same thing; I could have modified my init script, among other options, but I chose this approach. Default distros set these kernel and limits values too low: they are tuned for desktops, not servers.


Jonathan Ellis said...

Are you pooling your Thrift connections client-side? It sounds like you're not. Connection-per-request is going to give you really bad performance.

Dathan Vance Pattishall said...

An F5 loadbalancer is in front of it. I can turn on persistent connections, but the connect itself is less than 3 ms. Plus, with a persistent connection I don't get an even distribution of requests among the nodes. I'll give it a try, yet connect-fetch-close seems to work really well. With a 20 ms overhead on range reads, I'm really not going to see a reduction in R(t).

Mark Robson said...

Presumably this is in a test environment, right?

How are you generating the test loads? Can we see the load-test client code, or get some idea of how the client works?

Dathan Vance Pattishall said...

This is a production environment.
On every api call and page load the client does the following

1. Make a new Connection to the F5 loadbalancer

2. The F5 loadbalancer distributes the load to the least-connected node

3. The server makes the call to its own data structure or to the server that has the hash

4. The server responds to the request

5. The client returns the data and closes the connection.

From connect to query takes 3 ms

From connect to data return takes around 100ms

Most of the time is spent on the read and sort.

The tweaks in this article are what make this volume of frequent, short requests possible.