Tuesday, May 20, 2008

How to cause deadlocks in INNODB and fix them

So imagine you have a table and for every row it contains an INT that represents a user, a medium int that represents an activity number, and a ENUM that represents state.



CREATE TABLE UserStack (
id int unsigned not null default 0,
activity medium int unsigned not null default 0,
state ENUM('ready','processing','processed', 'error'),
PRIMARY KEY(id),
INDEX (state, activity)
) ENGINE = INNODB;




Now you have an application that is spread across many servers which pops a set of users off the stack and sets them to a processing state.


START TRANSACTION

SELECT * FROM UserStack WHERE state='ready' ORDER BY activity DESC LIMIT 10 FOR UPDATE

foreach id

Mark them Processing





If multiple threads do this at the same time; the 1st thread will get the 10 ids while 99% of the others will fail with "Deadlock Detected try restarting transaction"

The reason: from innodb's perspective many different clients are asking to perform actions on the same data all at the same time-thus a deadlock is detected. To exacerbate the problem the data is ordered differently from the cluster index-so in essence the entire table is scanned for this example. The table has 30 million rows.


I've tested transaction isolation levels:

READ COMMITED
REPEATABLE READ
SERIALIZED

How to get around this:

Lets add a column to make the rows unique for the calling process; lets add pid (add server int unsigned if you want to run the process from many servers).

The access pattern for the table is now going to involve pid and state. The column pid nor the combo with state is unique, so the user id which defines the user.

ALTER TABLE UserStack DROP PRIMARY KEY, ADD pid int unsigned NOT NULL DEFAULT 0, ADD PRIMARY KEY(state,pid,id), DROP INDEX state;

Then change SELECT ... FOR UPDATE to an UPDATE statement. UPDATE operations for some reason are better at concurrency then SELECT FOR UPDATE. The update scans the PRIMARY index and updates the selected amount of rows defined by the LIMIT.

UPDATE UserStack SET pid=getmypid(), state='processing' WHERE pid = 0 AND state='ready' LIMIT 10;

SELECT * FROM UserStack WHERE pid = getmypid() AND state='processing';

foreach user
process
mark as completed

Rinse and repeat.

The desired affect is complete. Each thread can grab its own work and each thread is guaranteed a unique user or block of users to process. The act of marking rows (marking their territory) and grabbing the marked rows takes a fraction of seconds as it should.


A good write ups on deadlocks

a-little-known-way-to-cause-a-database-deadlock

1 comment:

Anonymous said...

Seriously I don't understand it why it performs better.
Shouldn't SELECT ... FOR UPDATE not prevent the deadlock in the first place by locking the wanted rows without blocking concurrent similar queries?