Wednesday, January 24, 2007

x86_64 + INNODB + 4.1 + throwing a server in to live traffic

As many of you know 4.1 does not have the thread bug fix as mentioned in this post. So, throwing a new mySQL server into production can cause a spike of threads that exposes this bug. Why? Well, for the most part INNODB has not filled its buffer pool for the most part.


For example, assume you have a box with 16 GB of ram, running 64-bit Linux. Let's look at MySQL when it first starts.

Here is some sample output from top


top - 18:46:22 up 400 days, 22:30, 3 users, load average: 0.65, 1.08, 0.73
Tasks: 132 total, 2 running, 130 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.7% us, 0.5% sy, 0.0% ni, 92.2% id, 6.5% wa, 0.0% hi, 0.2% si
Mem: 16253552k total, 3406008k used, 12847544k free, 79364k buffers
Swap: 8388600k total, 160k used, 8388440k free, 1708900k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14567 mysql 16 0 14.1g 1.4g 3324 S 1.3 8.9 0:09.02 mysqld
14562 mysql 16 0 14.1g 1.4g 3324 S 0.3 8.9 0:00.01 mysqld
14566 mysql 15 0 14.1g 1.4g 3324 S 0.3 8.9 0:02.26 mysqld



Even though innodb_buffer_pool_size=13G, the mysql process (using LINUX Threads) is not 14G it's 1.4g. RES in top is the non-swapped physical memory a task is using. This is very important.

Now, if I put this server into production now, INNODB would have to do extra work to figure out what tables and such need to be in the buffer pool as well as serve the onslaught of traffic.

Assume that we just put the server into production at this level. Your request rate of 100 connections per second require 1000 selects a second. Within the 1st second of deploy 100 threads will all request data unbuffered. Causing a huge slowdown, that is very hard to recover from.


Now a way around this:

mysql -uuser -p DB -e'SELECT * FROM Table WHERE id > 10 million - MAX(id)' > /dev/null

Now mysql RES is taking 4.0g as the process runs


14556 mysql 16 0 14.1g 4.0g 3392 S 0.0 25.8 0:06.19 mysqld
14557 mysql 16 0 14.1g 4.0g 3392 S 0.0 25.8 0:00.00 mysqld
14558 mysql 20 0 14.1g 4.0g 3392 S 0.0 25.8 0:00.00 mysqld



But, still this is not good, you want to fill the buffer pool with data that will likely be accessed for your application, and get RES to the size of your buffer pull. So, keep doing this for your most highly accessed tabled ORDER BY the column that you primarily sort by.

Another Query


top - 20:33:35 up 401 days, 3 users, load average: 1.64, 1.94, 1.12
Tasks: 140 total, 2 running, 137 sleeping, 0 stopped, 1 zombie
Cpu(s): 7.5% us, 18.4% sy, 0.3% ni, 52.6% id, 10.8% wa, 1.0% hi, 9.5% si
Mem: 16253552k total, 16226832k used, 26720k free, 13768k buffers
Swap: 8388600k total, 160k used, 8388440k free, 9153788k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14556 mysql 16 0 14.1g 6.6g 3396 S 0.0 42.4 0:06.19 mysqld
14557 mysql 16 0 14.1g 6.6g 3396 S 0.0 42.4 0:00.00 mysqld
14558 mysql 20 0 14.1g 6.6g 3396 S 0.0 42.4 0:00.00 mysqld
14559 mysql 16 0 14.1g 6.6g 3396 S 0.0 42.4 0:00.05 mysqld


To finally


top - 20:33:35 up 401 days, 3 users, load average: 1.65, 2.12, 1.38
Tasks: 195 total, 4 running, 191 sleeping, 0 stopped, 0 zombie
Cpu(s): 43.1% us, 7.6% sy, 0.0% ni, 17.0% id, 30.5% wa, 0.2% hi, 1.7% si
Mem: 16359928k total, 16346856k used, 13072k free, 45176k buffers
Swap: 8296104k total, 132392k used, 8163712k free, 547404k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26259 mysql 15 0 14.9g 14g 3752 S 8.6 95.2 0:00.35 mysqld
26260 mysql 16 0 14.9g 14g 3752 S 0.0 95.2 0:00.17 mysqld
26261 mysql 16 0 14.9g 14g 3752 S 2.0 95.2 0:00.19 mysqld

3 comments:

Tom said...

How about merging it into production by initially setting the max_connections very low and gradually increasing the value?

Dathan said...

Yes that works great. I use that all the time as well, this is for throwing the server in for 100% capacity at start, ie when the app cannot handle failed connections.

Anonymous said...

Does it really work?

mysql> SELECT * FROM thread WHERE threadid > 10 million - MAX(threadid);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'million - MAX(threadid)' at line 1

mysql> SELECT * FROM thread WHERE threadid > 10000000 - MAX(threadid);
ERROR 1111 (HY000): Invalid use of group function
mysql>