Friday, September 14, 2007

The mySQL Optimizer and your applications

What is the mySQL Optimizer? Well it's the part of the mysql query engine that determines what index to use, based on given sql. It sucks in every version that I tested, which is

3.23
4.0
4.1
5.0
5.1


I find that in nearly every app that I've end up writing I need to create my own optimizer to determine what index to use.

Why?


Well, imagine this. Your table has 30 indexes and many of the prefixes of compound indexes are the same. The reason is because given a question you want to sort the data quickly in different views. The optimizer sucks at figuring out which index to use under these cases (and many others)

For example:

SELECT * FROM SomeTable WHERE owner_id = ? AND perms IN (0,1) AND karma IN (0,1) ORDER by date_create DESC, photo_id DESC limit 10


Now the table contains an index on

owner_id, perms, karma, date_create,photo_id
owner_id, date_create, photo_id
and 28 other index.



Which index should mysql pick?

Well since perms is a range and karma is a range then using the index (owner_id, perms, karma, date_create, photo_id) to avoid a filesort from an ORDER BY is not possible. The reason to avoid filesorts especially in cases of large amounts of data, is because a filesort is 60% of the query time. So, the course of action is to pick the index where a filesort is avoided, thus

owner_id, date_create, photo_id should be picked.


But in fact mysql doesn't pick the correct index, ever. Innodb especially from query to query execution of the same type query the index changes over and over due to how innodb determines cardinality.


Now, you may be thinking why not just run analyze table? Still mysql will pick the wrong index, and Analyze table for INNODB only dumps the innodb_buffer_pool which sucks.



How does one get around this?


Write your own optimizer! Here is some PHP snippits




function sometable_get_index($where, $sort) {
if(strpos($where, 'owner_id') !== FALSE) {
if (strpos($where, 'karma') !== FALSE) {
if (strpos($sort,'date_create') !== FALSE &&
strpos($where, 'IN') !== FALSE) {

return 'USE INDEX (owner_datecreate_id)';
}

if(strpos($where, 'perms') !== FALSE && !$sort) {

return 'USE INDEX (owner_perms_karma)';
}

}

MORE Parser logic....


}





Why use strpos? Well, it's faster then preg_match and strstr and I only care about the existence of SQL fields in the where and sort clause, not the contents of the rvalue of each column.

3 comments:

Anonymous said...

Dathan,

Indeed this is a common issue and I wrote about it a few times.

MySQL seems to try to limit number of rows it needs to scan in worse case scenatio using order by/limit optimization only when it is possible without more rows examined.

This is obviously serious issue and I'd like to see MySQL Optimizer guys to fix it soon :)

Anonymous said...

Not really on topic, but I'm trying to find some reviews of MySQL GUI admin interfaces (I know, probably sacrilege, but it would be nice). If anyone can point me to a good place I'd appreciate it.

Dathan Pattishall said...

phpmysqladmin is a standard among people. Personally I just use the mysql client at the command line. If I need more complex stuff I'll code a specific application.