Tuesday, May 13, 2008

Net Settings: MySQL & Memcache

Ever see this in your kernel log?
TCP: drop open request from 10.209.23.142/43407
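A quick way to check whether a box is logging this, and which clients are affected, is to count the messages per source IP. This is a minimal sketch. The function name dropped_by_source is hypothetical, and the code assumes the message format shown above:

```shell
#!/bin/bash
# Sketch: count "TCP: drop open request" kernel messages per source IP.
# Reads kernel log text on stdin and prints "COUNT IP", busiest source first.
dropped_by_source() {
    grep -o 'drop open request from [0-9.]*' |
        awk '{ n[$5]++ } END { for (ip in n) print n[ip], ip }' |
        sort -rn
}

# Usage:
#   dmesg | dropped_by_source
```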


Well, let's start with a more specific example:
Memcache is tightly coupled to your code: every request caches the response from the database, so a lot of quick calls are made to memcache. Then you start storing full HTML in memcache instead of just the raw data, so your load pattern becomes bigger blobs of data, still at a high request rate.

Now suddenly the memcache port hangs. You verify this by ssh-ing to the memcache box and telnetting to its port 11211: ssh (port 22) works, yet 11211 does not. As a result, all your front ends fall over because they are hanging on the memcache port.
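The same check can be scripted without telnet. Below is a minimal sketch that uses bash's built-in /dev/tcp redirection with coreutils `timeout`, so a hung port is reported as a failure instead of blocking the shell. The function name probe_port and the host name memcache-host are hypothetical:

```shell
#!/bin/bash
# Sketch: test whether HOST:PORT accepts a TCP connection within SECONDS.
# Prints "HOST:PORT open" (returns 0) or "HOST:PORT closed or hung" (returns 1).
probe_port() {
    local host=$1 port=$2 secs=${3:-3}
    # The inner bash opens fd 3 on the socket; timeout kills it if connect() hangs.
    if timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "$host:$port open"
        return 0
    fi
    echo "$host:$port closed or hung"
    return 1
}

# Usage (memcache-host is a placeholder):
#   probe_port memcache-host 22      # ssh
#   probe_port memcache-host 11211   # memcached
```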

THIS IS NOT A MEMCACHE PROBLEM. It's a kernel problem. Default Linux installs size the TCP window buffers for a desktop, not for a server.

So I run this script:


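#!/bin/bash
# Raise the kernel's TCP buffer limits from desktop defaults to server values.
# Run as root; sysctl -w changes do not persist across reboots.

# Largest receive/send buffer (bytes) any socket may request
sysctl -w net/core/rmem_max=8738000
sysctl -w net/core/wmem_max=6553600

# Per-connection TCP buffers: "min default max" (bytes)
sysctl -w net/ipv4/tcp_rmem="8192 873800 8738000"
sysctl -w net/ipv4/tcp_wmem="4096 655360 6553600"

# Memory (KB) the VM keeps free for kernel allocations such as network buffers
sysctl -w vm/min_free_kbytes=65536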



I found this out by going here

This is the first kernel setting I have seen make a really big difference.
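Because `sysctl -w` only changes the running kernel, the values are lost on reboot. Here is a minimal sketch that writes them to a sysctl config file instead. The file name 60-memcache-net.conf and the function name write_sysctl_conf are hypothetical choices. On systems without /etc/sysctl.d, the same lines can go in /etc/sysctl.conf.

```shell
#!/bin/bash
# Sketch: persist the buffer settings above in a sysctl config file.
# Takes an optional output path (useful for testing); defaults to /etc/sysctl.d.
write_sysctl_conf() {
    local out=${1:-/etc/sysctl.d/60-memcache-net.conf}
    cat > "$out" <<'EOF'
# Server-sized TCP buffers (bytes) and free-memory reserve (KB)
net.core.rmem_max = 8738000
net.core.wmem_max = 6553600
net.ipv4.tcp_rmem = 8192 873800 8738000
net.ipv4.tcp_wmem = 4096 655360 6553600
vm.min_free_kbytes = 65536
EOF
}

# Usage, as root:
#   write_sysctl_conf && sysctl -p /etc/sysctl.d/60-memcache-net.conf
```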


So my list of changes to the kernel's default settings so far (getting lazy on detail):

vm.swappiness=0
run the deadline I/O scheduler
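Both kernel settings can be applied at runtime. This is a minimal sketch that also reports which scheduler a disk is actually using. The disk name sda is a placeholder, and the helper active_scheduler is hypothetical. On newer multi-queue kernels the scheduler is named mq-deadline. Older kernels also accept elevator=deadline on the boot command line.

```shell
#!/bin/bash
# Sketch: report the active I/O scheduler.
# Reads a line like "noop anticipatory [deadline] cfq" on stdin and prints
# the bracketed entry, which is the scheduler currently in use.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage, as root:
#   sysctl -w vm.swappiness=0
#   echo deadline > /sys/block/sda/queue/scheduler
#   active_scheduler < /sys/block/sda/queue/scheduler
```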

On the filesystem side:

mount the datadir noatime
use O_DIRECT (innodb_flush_method=O_DIRECT)
if your hardware RAID card has a cache, enable it for writes (make sure it is a battery-backed cache, BBC)
use RAID-10, or RAID-6 if you have the money and can take the I/O hit
stripe size of 128-256K
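To confirm the noatime item actually took effect, you can read the mount table. This is a minimal sketch. It assumes the datadir is its own mount point at /var/lib/mysql (adjust to yours), and the function name mount_has_opt is hypothetical. The O_DIRECT item is set on the MySQL side rather than in the mount options.

```shell
#!/bin/bash
# Sketch: exit 0 if MOUNTPOINT is mounted with OPTION, per /proc/mounts format
# ("device mountpoint fstype opt1,opt2,... dump pass").
mount_has_opt() {
    local mnt=$1 opt=$2 mounts=${3:-/proc/mounts}
    awk -v m="$mnt" -v o="$opt" '
        $2 == m { n = split($4, a, ","); for (i = 1; i <= n; i++) if (a[i] == o) found = 1 }
        END { exit !found }
    ' "$mounts"
}

# Usage, as root:
#   mount_has_opt /var/lib/mysql noatime || mount -o remount,noatime /var/lib/mysql
# O_DIRECT goes in my.cnf under [mysqld]:
#   innodb_flush_method = O_DIRECT
```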

I have some other tweaks that I'm forgetting, but I'll post them when I find them.

4 comments:

Steffen Weber said...

An alternative way to solve the memcached problem would be to use UNIX domain sockets instead of TCP/IP.

For example, in PHP you can use:

$memcache = new Memcache;
$memcache->connect('unix:///var/run/memcached/memcached.sock', 0);

Unknown said...

For an added bonus, put your datadir on an unjournalled FS. Since the InnoDB transaction log already acts as a journal, you don't /really/ need another one at the FS level.

If you do keep an FS journal, you can rebuild the FS with a much bigger journal and flush it less often.

Dathan Pattishall said...

I always wondered about that. I've been very cautious about running ext2 (no FS journal), yet doing this does make sense.

I'm glad to see others running their systems like this.

Anonymous said...

The one catch with your setup, I believe, is that there's a known problem with swappiness=0. If you have a lot of memory (e.g. 16 GB), the box will start swapping even when memory is only partially used. swappiness=10 is a better option.