mySQL DBA, Architecture, Dev, Scale, HA, Code : INNODB Shared Locks, Exclusive Locks and INSERT INTO SELECT FROM

Wednesday, October 29, 2008

INNODB Shared Locks, Exclusive Locks and INSERT INTO SELECT FROM

I wrote an application that is able to send out 3-8 million messages an hour with only 10 CPU's. This application is a part of an Offline Task system that scales linearly.

How is this done, I'll go into detail hopefully at the mySQL conference if they accept my proposal.

The scope of this blog post is to go over building the "Task Queues". Currently I have 13 Queues, one queue for each Shard that I run. The data is federated by user or randomly federated with a GUID that lives as long as the job. A request came in to add 20 million jobs to the queue all at once. The problem is with this list, will I cause deadlocks in innodb as I add the jobs to the queue as one transaction? Can live traffic still write to this queue?

To verify that Deadlocks will not occur - having an understanding about how locks work in INNODB is key. I suggest reading this page.

To build the queue I dumped the data source by


SELECT identifier, 16 INTO OUTFILE "/data/mysql/BuildQueue.log" FROM SOURCE_TABLE WHERE CONDITION.

The isolation level is REPEATABLE-READ; I'm setting a shared lock to get the most current version of the data. Writes are not blocked.

Next:

I create a table on each shard where the queue is located.


CREATE TABLE IF NOT EXISTS OfflineTasksHold (
    object_id bigint(20) NOT NULL DEFAULT 0,
    object_type smallint unsigned NOT NULL DEFAULT 0,
    PRIMARY KEY(object_id,object_type)
) ENGINE=INNODB;

Then I issue a command on each Shard


LOAD DATA INFILE "/data/mysql/BuildQueue.log" INSERT INTO TABLE OfflineTasksHold;
START TRANSACTION;
INSERT INTO OfflineTasks (object_id, object_type) SELECT * FROM OfflineTasksHold;
COMMIT;

Each shard is getting around 4-5 million rows, while accepting real-time traffic of 20-60 tasks a second to the OfflineTasks table. The OfflineTasksHold table does not have any real-time requirements and is solely used to keep the queue in Primary Key order, plus there is a CHANCE that LOAD DATA could set an exclusive lock on the OfflineTasks table-shutting down adding data to the table by the live site. The INSERT sets an exclusive lock on the rows that are being added, so the Offline Task Sheppard - the process that pops tasks off the queue is blocked for a small period - which is acceptable. Why are they blocked? Well, the massive insert sets an Exclusive Lock. The Sheppard is trying to grab the rows that are locked waiting up to 50 seconds, until innodb_wait_timeout is reached. This condition is acceptable. All other inserts are able to go into the queue without a slow down.

In summary, the job queue is built and can be automated with confidence knowing that death to the various app will not occur. Processing slows down for a bit but speed right back up.

2 comments:

hồng anh said...: hi.... happy day !!!; Sun Nov 02, 03:50:00 AM
Ben Margolin said...: dathan, mail or call me...
-ben!; Fri Nov 21, 10:54:00 PM