Thursday, March 12, 2009

Walking an InnoDB Table Fast

Walking a table means traversing each row; it is commonly done when building queues, fixing data, or dumping a table. I recently ran into a problem, caused by an assumption, where walking a table was taking way too long using this method:

$pos = 0;
do {
    $result = query("SELECT col FROM TABLE LIMIT $pos, 1000");
    $pos += 1000;
} while ($result);

The assumption was that since InnoDB uses a clustered index, this would traverse the table using the PRIMARY key. This is not the case. It's not a problem in InnoDB but a bad assumption that I fell victim to: a scan up to each $pos occurs, producing a Big-O of N^2. So, when the query:

SELECT col FROM TABLE LIMIT 1000000, 1000 is executed, MySQL will scan all the rows up to row position 1001000, and it does so again for each subsequent iteration.
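The quadratic cost can be sketched with a quick back-of-the-envelope calculation (the batch size and table size here are illustrative, not from the original workload):

```python
def rows_scanned_with_offset(n, batch=1000):
    # LIMIT pos, batch makes the server read pos + batch rows per query,
    # discarding everything before the offset.
    return sum(pos + batch for pos in range(0, n, batch))

def rows_scanned_with_keyset(n, batch=1000):
    # WHERE pkey > last_id seeks straight to the batch, reading only
    # batch rows per query.
    return sum(batch for _ in range(0, n, batch))

print(rows_scanned_with_offset(1000000))  # 500500000 rows touched
print(rows_scanned_with_keyset(1000000))  # 1000000 rows touched
```

For a million-row table, the offset walk touches roughly 500 times as many rows as the keyset walk, and the gap grows linearly with table size.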

This is SLOOOOW. IMHO, since the table is sorted by the primary key, MySQL should optimize this case - but it does not and will not. So, to walk an InnoDB table fast and keep linear time, a Big-O of N, an alternative is

$last_id = 0;
do {
    $result = query("SELECT col, pkey_part FROM TABLE USE INDEX (PRIMARY) WHERE pkey_part > $last_id LIMIT 1000");
    if ($result) {
        $last_id = $result[count($result) - 1]->pkey_part;
    }
} while ($result);


This dumps a table very fast, almost as fast as doing a count(*) on the PRIMARY KEY.
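Both walks can be sketched in Python against SQLite, which stands in for MySQL here and conveniently accepts the same LIMIT offset, count syntax; the table name, column names, and batch size are all hypothetical:

```python
import sqlite3

# Toy stand-in for the InnoDB table: an integer primary key plus one column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO t (id, col) VALUES (?, ?)",
                 [(i, "row %d" % i) for i in range(1, 10001)])

def walk_offset(batch=1000):
    # Slow pattern: the engine re-reads every row before each offset.
    pos, rows = 0, []
    while True:
        chunk = conn.execute("SELECT col FROM t LIMIT ?, ?",
                             (pos, batch)).fetchall()
        if not chunk:
            return rows
        rows.extend(c for (c,) in chunk)
        pos += batch

def walk_keyset(batch=1000):
    # Fast pattern: seek to the last primary-key value, then range-scan.
    last_id, rows = 0, []
    while True:
        chunk = conn.execute(
            "SELECT id, col FROM t WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch)).fetchall()
        if not chunk:
            return rows
        rows.extend(c for (_, c) in chunk)
        last_id = chunk[-1][0]
```

Both functions return the same set of rows; the difference is only in how many rows the engine has to touch to produce each batch.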

Another method is to dump the table server-side with SELECT ... INTO OUTFILE, but the data is local to the database server - thus the need for the application to grab the data. Another drawback of this method is that the dump produces more disk IO than walking the table off of a key, slowing down access to the table.

In conclusion, even if the storage engine keeps the table order consistent as InnoDB does, do not assume that LIMIT 1000000, 1000 is equivalent to seeking to position 1000000 in a file, without telling the optimizer to use an index.


Nils said...

Have you tried the HANDLER statement?

Dathan said...

No, because the HANDLER method doesn't provide a consistent view of the database, i.e. in InnoDB you can get dirty reads.

arjenAU said...

The two approaches are now not similar at all.
For the first chunk, it doesn't matter since it needs all those rows anyway.
For the rest, could you try the "WHERE id > lastid" construct without "USE/FORCE INDEX(PRIMARY KEY)" and see how that goes? I expect it'll go for a range scan on the primary key, which is exactly what you want.

Dathan said...

Sorry, I am not following you. Iterating through a table with

SELECT col FROM TBL LIMIT 1000000, 1000;

is the same as

SELECT col FROM TBL WHERE pkey_part > 0 LIMIT 1000;

except that pkey_part forces an index scan while the first option forces a full table scan. This is exposed when your application needs to hit offsets in the million range.

Since InnoDB uses a clustered index, the data is IN PKEY order.

arjenAU said...

IIRC... the optimiser does not take the LIMIT offset into account when deciding on index use for retrieval. It only optimises the cutoff point (i.e. stops retrieving rows if there's no group/order by and there's a limit).
So a LIMIT N,M is not the same as a WHERE clause; from the optimiser's perspective, look at your query without the LIMIT; then you'd be looking at the entire table, which makes tablescan the most efficient. It's somewhat similar to using HAVING rather than WHERE, in this context.

And it's not an index scan, it's a range scan. It finds the starting point (PK lookup in this case) and only walks the index from there.

What I'm saying is, that does what you want, without USE/FORCE INDEX.

Dathan said...

Ah, I see. Yes, I meant to say range scan. The reason I use FORCE/USE INDEX is that the optimizer, especially with InnoDB, will sometimes pick an index that I did not intend to use.

arjenAU said...

USE/FORCE should not be necessary in this case, and it's a bad habit to get into using these constructs as they prevent the optimiser from making choices; your dataset and query profile will change over time.

I tend to use USE/FORCE only to track down a problem, and then rephrase the query so it's no longer required.

Dathan said...

Dathan >>> The mySQL optimizer :)

In reality the optimizer causes more problems for me than it fixes. In this case the USE INDEX is not needed, yet in other cases the optimizer will choose, 7 times out of 10, an index that causes a filesort or temp table. Thus, I usually build a client-side optimizer that is much better than MySQL's.

Domas Mituzas said...

excuse me, what is wrong with pure 'SELECT * FROM table'? :) if you're using 'unbuffered' mode, there would still be quite a few buffers to rely on (like, socket), ...

Mark Callaghan said...

What is wrong with InnoDB and MySQL? A b-tree is an index from keys to values. For your sample code to be efficient it would have to be an index from row number to values.

The sample code you have listed is broken. This scans O(N*N) rows to fetch O(N) rows.

You should update the blog post to avoid confusing others. It reads as though your team has discovered something broken in MySQL.

Dathan said...

You're right, Mark. The goal was not to say that there is a problem with MySQL, but with the assumption that walking a table without an index will default to the primary key, since InnoDB sorts the table by the PRIMARY key or insert order (using a row-id). I've seen this method in a few open source programs, and believed it myself until it was actually investigated.

I've changed the post to make sure to highlight the bad assumption.