Monday, August 07, 2006

Indexes, the optimizer, and doing efficent selects

Basics

Btree-Indexes must be less then 1024 bytes, so if you have a utf8 column that is 255 characters mySQL assumes 728 bytes, if you combine that with other varchar(255) utf8 columns you'll get an error.


Compound, composite indexes:

This term is used allot. It basically means 1 index across multiple columns.

For instance you have a table with

A tinyint
B tinyint
C tinyint
D tinyint

and a index on A,B,C taking 3 bytes.

following the principles of left-most-prefix key lookups (assume AND between columns)

A,B,C is an index lookup using 3 bytes
A,B order by C is a index Lookup taking 3 bytes 1 is used for a efficent sort.
A,B is an index lookup using 2 bytes
A is an index lookup using 1 byte

A, C is an index lookup using 1 byte. Notice that A,C does not use 2 bytes of the index making the lookup scan the result set of A, since A,C doesn't follow left-most in the compound index C comes after B ;))


B,C is not a index lookup causing a table scan.

A order by D is a index lookup using 1 byte with a filesort sorting the result set after the 2nd pass of finding the results.


Above we assumed that all the index lookups where based off of an 'AND' Clause. OR sucks in mySQL

A OR B uses a 1 byte index lookup scaning all of B making the query very slow.

How do we get around this?

USE UNION

SELECT * FROM ABCD_TABLE WHERE A=1 UNION SELECT * FROM ABCD_TABLE WHERE B=1

assuming that another index exists on B in the leftmost prefix of the compound index.

This is very fast.


More to come

3 comments:

Anonymous said...

You should look here http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html

Anonymous said...

I found this out last week - I have a lookup that involves a three way OR - each element of the or is a 2-element and [IE (A & B) OR (C & D) OR (E & F) ]

B,D and F are different lookups to the same table - call it 'halt code' and A, C, and E are each different. the Optimizer couldn't figure out how to use the indexes that way - split each condition up into a chunk of a union and speed improved from 35s/row lookup to 5s/row lookup on a 1.6m row dataset changed the table from myISAM to innoDB (I only have 750GB of harddrive to spare for the bigger indexes :P) and speed improved from 5s/row to .5s/row

Dathan Pattishall said...

I should of really qualify this, that this is in the context of 4.X series. What I use in production. So now you know :)