<body><iframe src="http://www.blogger.com/navbar.g?targetBlogID=31421954&amp;blogName=mySQL+DBA&amp;publishMode=PUBLISH_MODE_BLOGSPOT&amp;navbarType=BLUE&amp;layoutType=CLASSIC&amp;homepageUrl=http%3A%2F%2Fmysqldba.blogspot.com%2F&amp;searchRoot=http%3A%2F%2Fmysqldba.blogspot.com%2Fsearch" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" height="30px" width="100%" id="navbar-iframe" title="Blogger Navigation and Search"></iframe> <div id="space-for-ie"></div>

mySQL DBA

Tuesday, May 20, 2008

How to cause deadlocks in INNODB and fix them

So imagine you have a table and for every row it contains an INT that represents a user, a medium int that represents an activity number, and a ENUM that represents state.



CREATE TABLE UserStack (
id int unsigned not null default 0,
activity medium int unsigned not null default 0,
state ENUM('ready','processing','processed', 'error'),
PRIMARY KEY(id),
INDEX (state, activity)
) ENGINE = INNODB;




Now you have an application that is spread across many servers which pops a set of users off the stack and sets them to a processing state.


START TRANSACTION

SELECT * FROM UserStack WHERE state='ready' ORDER BY activity DESC LIMIT 10 FOR UPDATE

foreach id

Mark them Processing





If multiple threads do this at the same time; the 1st thread will get the 10 ids while 99% of the others will fail with "Deadlock Detected try restarting transaction"

The reason: from innodb's perspective many different clients are asking to perform actions on the same data all at the same time-thus a deadlock is detected. To exacerbate the problem the data is ordered differently from the cluster index-so in essence the entire table is scanned for this example. The table has 30 million rows.


I've tested transaction isolation levels:

READ COMMITED
REPEATABLE READ
SERIALIZED

How to get around this:

Lets add a column to make the rows unique for the calling process; lets add pid (add server int unsigned if you want to run the process from many servers).

The access pattern for the table is now going to involve pid and state. The column pid nor the combo with state is unique, so the user id which defines the user.

ALTER TABLE UserStack DROP PRIMARY KEY, ADD pid int unsigned NOT NULL DEFAULT 0, ADD PRIMARY KEY(state,pid,id), DROP INDEX state;

Then change SELECT ... FOR UPDATE to an UPDATE statement. UPDATE operations for some reason are better at concurrency then SELECT FOR UPDATE. The update scans the PRIMARY index and updates the selected amount of rows defined by the LIMIT.

UPDATE UserStack SET pid=getmypid(), state='processing' WHERE pid = 0 AND state='ready' LIMIT 10;

SELECT * FROM UserStack WHERE pid = getmypid() AND state='processing';

foreach user
process
mark as completed

Rinse and repeat.

The desired affect is complete. Each thread can grab its own work and each thread is guaranteed a unique user or block of users to process. The act of marking rows (marking their territory) and grabbing the marked rows takes a fraction of seconds as it should.


A good write ups on deadlocks

a-little-known-way-to-cause-a-database-deadlock

StumbleUpon Digg!

Tuesday, May 13, 2008

Net Settings mySQL & Memcache

Ever see this
TCP: drop open request from 10.209.23.142/43407


Well lets start with a more specific example:
Memcache is tightly coupled in your code: Every request caches the response from the database so a lot of quick calls to memcache is made. Then you start adding full HTML to memcache instead of just caching the raw data; so now your load pattern is bigger blobs of data still at a high request rate.

Now suddenly the memcache port hangs-you verify this by ssh to the box and then telneting the the memcache box port 11211 and see that ssh works (port 22) yet 11211 does not. As a result all your front ends fall over because they are hanging on the memcache port.

THIS IS NOT A MEMCACHE PROBLEM. Its a kernel problem. Default installs of Linux set the TCP window buffer size to a desktop setting and not a server setting.

So I run this script.


#!/bin/bash

sysctl -w net/core/rmem_max=8738000
sysctl -w net/core/wmem_max=6553600

sysctl -w net/ipv4/tcp_rmem="8192 873800 8738000"
sysctl -w net/ipv4/tcp_wmem="4096 655360 6553600"
sysctl -w vm/min_free_kbytes=65536



I found this out by going here

This is the first kernel setting that I have seen make a real big difference.


So my list of changes so far to the kernel default settings are (getting lazy in detail)

vm.swappiness=0
run the deadline scheduler

On the filesystem side

mount the datadir noatime
use O_DIRECT
if you have cache on a hardware raid card set the cache for writes (make sure you have a BBC)
use Raid-10 or if you have the money + can take a hit on I/O RAID-6
stripe size 128-256K

I have some other tweaks that I'm forgetting but when I find them I'll post them.

StumbleUpon Digg!

Thursday, May 08, 2008

UAE Broken Proxys - how to work around them

In the UAE there is a goverment enforcement of all traffic to go through goverment proxy servers. All requests are proxied and cached. What does this mean in the mysql world? Imagine you have a web application that records comments to a media object. Then a single person from the UAE makes a comment to that media object. The UAE Cache Proxy caches the HTTP-POST. Now every person in the country that visits your domain causes a HTTP-POST to that same media object. From a outside view it looks like content is being added to the system and the numbers are good. But ALL THE CONTENT is the SAME which is not valuable to the media owner nor the DBA who has to manage that data due to the BROKEN-CACHE-PROXY.

So how do you bust the BROKEN-CACHE-PROXY?

A simple method is to look at the contents of the last few posts and see if the same data is being added prior to insert-but this can be expensive at high scale-the scale that I deal with (billions of transactions per day).

A simple approach is to just add a hidden variable to the POST with your webservers time. If you webserver time on the post >= hidden variable time + 120 seconds then you know its a bad post.

StumbleUpon Digg!

Monday, May 05, 2008

I just pre-ordered my High Performance MySQL: Optimization, Backups, Replication, and More

Did you pre-order yours?

Pre-Order yours today

Peter and Byron are really smart guys and very methodical in their tests to make sure the conclusions produced are rock solid. I don't know whats in the book, but if these guys made it, its going to be good-that's how much faith I have.

StumbleUpon Digg!

Thursday, May 01, 2008

Linux 64-bit, MySQL, Swap and Memory

The VM for Linux prefers system cache over application memory. What does this mean? The best way I can explain is by example.

Imagine you have 32 GB of RAM
MySQL is set to take 20 GB of RAM for a process based buffer and up to 6M for the various thread buffers.

Over a period of time the box swaps. The only thing that is running is mysql and its memory size is around 21GB for resident memory. Why does swap grow when there is plenty of memory? The reason is when a memory alloc is needed (thread based buffer is tickled) the VM will choose to use swap over allocating from the system cache, when there is not enough free memory.

DO NOT TURN OFF SWAP to prevent this. Your box will crawl, kswapd will chew up a lot of the processor, Linux needs swap enabled, lets just hope its not used.

So how do you stop Nagios pages because of swap usage? Well if you have a few choices.

reboot the box

or

stop mysql && swapoff -a;swapon -a;

or just

swapoff -a;swapon -a;

Doing the latter command is rather scary and fun at the same time. Because you can either crash mysql or not. I just did the swap* commands live, I was very certain nothing was using swap and it worked. YAY no more pages and I didn't have to shut down the service!

Labels: , , ,

StumbleUpon Digg!

Wednesday, April 23, 2008

Avoid storing Markup (HTML) in the database

I see this to often; Storing HTML in the database. Then UI wants to change the HTML, but the data grew to 100 GB, so the really only feasible way to change the HTML is via a post process after the database fetch. This post process produces a huge list of preg_replace statements to rebuild the HTML on display. This consumes a lot of memory over time and slows down the APP, plus its time consuming to debug.


Store URLS if the apps need to. Or better yet build the schema to store the bare minimum the app needs to generate the HTML.

Here is a compelling reason why not to store HTML:

The data needed to generate the HTML is 10 bytes, but the TEXT field consumes 1024 bytes, all of which is the same text.

Thus for an app that should cost very little to maintain now costs 10 times as much.

If your building a search engine strip out the HTML and store the text. If its a quick app, take this saying into account

The is nothing more Permanent then a temporary solution


A exception that I don't mind: php serialize - this is markup that (rarely) does not change.

StumbleUpon Digg!

Tuesday, April 22, 2008

mySQL uc2008 presentation

Labels: , , ,

StumbleUpon Digg!