TCP: drop open request from 10.209.23.142/43407
Well, let's start with a more specific example:
Memcache is tightly coupled to your code: every request caches the response from the database, so a lot of quick calls to memcache are made. Then you start adding full HTML to memcache instead of just caching the raw data, so now your load pattern is bigger blobs of data, still at a high request rate.
Now suddenly the memcache port hangs. You verify this by ssh-ing to the box and then telnetting to the memcache box on port 11211, and you see that ssh works (port 22) yet 11211 does not. As a result, all your front ends fall over because they are hanging on the memcache port.
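If you want to script that check instead of telnetting by hand, here is a rough sketch. It swaps telnet for bash's /dev/tcp redirection plus the coreutils timeout command, so a hung port fails fast instead of blocking your terminal. The function names port_open and check_box and the 3 second default are my own picks for illustration, not anything memcache ships.

```shell
#!/usr/bin/env bash
# Sketch only: port_open and check_box are hypothetical helper names.

# Return 0 if host:port accepts a TCP connection within the timeout.
# Uses bash's /dev/tcp pseudo-path; the 3 second default is an assumption.
port_open() {
    local host=$1 port=$2 secs=${3:-3}
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Compare the ssh port (22) with the memcache port (11211) on one box.
check_box() {
    local host=$1 port
    for port in 22 11211; do
        if port_open "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port not answering"
        fi
    done
}
```

Run check_box against the memcache box's hostname. If port 22 reports open while 11211 is not answering, you are looking at the situation described above.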
THIS IS NOT A MEMCACHE PROBLEM. It's a kernel problem. Default installs of Linux set the TCP window buffer sizes for a desktop, not for a server.
So I run this script.
sysctl -w net/core/rmem_max=8738000
sysctl -w net/core/wmem_max=6553600
sysctl -w net/ipv4/tcp_rmem="8192 873800 8738000"
sysctl -w net/ipv4/tcp_wmem="4096 655360 6553600"
sysctl -w vm/min_free_kbytes=65536
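Settings made with sysctl -w are gone after a reboot, so you probably want them in /etc/sysctl.conf too. A minimal sketch, assuming the same values as the script above: emit_sysctl_conf prints them in sysctl.conf syntax, and show_current reads the live values straight from /proc/sys so you can compare. Both function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: emit_sysctl_conf and show_current are hypothetical helpers.

# Print the settings from the script above in sysctl.conf syntax.
emit_sysctl_conf() {
    cat <<'EOF'
# TCP buffer sizes for a busy memcache box
net.core.rmem_max = 8738000
net.core.wmem_max = 6553600
net.ipv4.tcp_rmem = 8192 873800 8738000
net.ipv4.tcp_wmem = 4096 655360 6553600
vm.min_free_kbytes = 65536
EOF
}

# Print the value the running kernel currently has for each key.
show_current() {
    local key path
    for key in net/core/rmem_max net/core/wmem_max \
               net/ipv4/tcp_rmem net/ipv4/tcp_wmem vm/min_free_kbytes; do
        path=/proc/sys/$key
        if [ -r "$path" ]; then
            # tcp_rmem and tcp_wmem are tab separated in /proc; show spaces.
            printf '%s = %s\n' "$key" "$(tr '\t' ' ' < "$path")"
        else
            printf '%s = (not readable)\n' "$key"
        fi
    done
}
```

Append the output of emit_sysctl_conf to /etc/sysctl.conf as root and load it with sysctl -p. Run show_current before and after to confirm the kernel picked the values up.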
I found this out by going here
This is the first kernel setting that I have seen make a really big difference.
So my list of changes so far to the kernel default settings is (getting lazy in detail):
run the deadline scheduler
On the filesystem side
mount the datadir noatime
if you have cache on a hardware RAID card, set the cache for writes (make sure you have a BBC, i.e. a battery-backed cache)
use RAID-10 or, if you have the money and can take a hit on I/O, RAID-6
stripe size 128-256K
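To check the first two items on a running box, something like the sketch below should do. It assumes the kernel exposes the scheduler at /sys/block/<dev>/queue/scheduler with the active one in square brackets, and that mount options show up in the fourth column of /proc/mounts. The helper names, the device sda and the mount point /data are placeholders I picked, so swap in your own.

```shell
#!/usr/bin/env bash
# Sketch only: active_scheduler and mount_has_noatime are hypothetical helpers.

# Given the contents of /sys/block/<dev>/queue/scheduler, for example
# "noop [deadline] cfq", print the active scheduler (the bracketed name).
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Return 0 if the given mount point is mounted with noatime.
# The second argument defaults to /proc/mounts; it can be overridden for testing.
mount_has_noatime() {
    local mountpoint=$1 mounts=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '$2 == mp { print $4 }' "$mounts" \
        | tr ',' '\n' | grep -qx noatime
}
```

For example, active_scheduler "$(cat /sys/block/sda/queue/scheduler)" should print deadline once the change is in place, and mount_has_noatime /data succeeds if the datadir mount has the option. To make the changes as root: echo deadline > /sys/block/sda/queue/scheduler, and mount -o remount,noatime /data. Newer multi-queue kernels call the scheduler mq-deadline instead.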
I have some other tweaks that I'm forgetting, but when I find them I'll post them.