Tuesday, May 19, 2009

Multi Direction Sorts and avoiding a file sort

There are two PRIMARY directions to sort data in SQL: Ascending (ASC) and Descending DESC.
When these two sort definitions are put together in a single statement a filesort is produced.

Why do we want to avoid filesorts?

Filesorts are bad. 1st they tickle a thread based buffer called sort_buffer_size. Additionally filesorts reads the data twice, unless max_length_for_sort_data limit is reached and as a result the Filesort runs slower to reduce disk I/O. If you want filesorts to run faster at the expense of the disk increase the default max_length_for_sort_data. You can read the filesort algorithm here.

So, here is an example

CREATE TABLE `ABCD` (
`A` int(10) unsigned NOT NULL default '0',
`B` int(10) unsigned NOT NULL default '0',
`C` int(10) unsigned NOT NULL default '0',
`D` int(10) unsigned NOT NULL default '0',
PRIMARY KEY (`a`,`b`,`c`,`d`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1



mysql> explain SELECT * FROM ABCD WHERE a=1 AND b=1 ORDER BY c DESC, d ASC\G

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ABCD
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 8
ref: const,const
rows: 2
Extra: Using where; Using index; Using filesort
1 row in set (0.00 sec)



Notice the filesort? So how does one get around this filesort?

Well

Let's define some roles for columns C and D. C is the parent while D is the child.

  • We want all the latest parents (C)
  • We want all the oldest children (D)

    We require pagination of all the PARENTS (show 10 parents per page) so Queries like this is PRODUCED

    SELECT * FROM ABCD WHERE A=? AND B=? ORDER BY C DESC
    explain SELECT * FROM ABCD WHERE a=1 AND b=1 ORDER BY c DESC LIMIT 10\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: ABCD
    type: ref
    possible_keys: PRIMARY
    key: PRIMARY
    key_len: 8
    ref: const,const
    rows: 2
    Extra: Using where; Using index
    1 row in set (0.00 sec)




    Now

    FOREACH($C_parent as $i => $c_id) {

    $C_parent[$i] = SELECT SQL_CALC_FOUND_ROWS * FROM ABCD WHERE A=? AND B=? AND C=$c_id ORDER BY D ASC LIMIT 1;


    }

    So, we changed 1 query into 11 queries (10 parents per page) to make the page load happen faster, by getting rid of the filesort.

    What 11 queries is faster then 1? Yes, for this case it is. The reason is because filesorts are SLOOOOW, they chew up a lot of limited resources and they should be avoided. I've see filesorts take close to 50-60% of the query time.
  • 2 comments:

    Steven Roussey said...

    Oh, for pete's sake, make a column E that is the largest # for C's type minus the value of C and have the index (or create an additional index) on that (a,b,e,d).

    Dathan Pattishall said...

    Yea that would work for ints, but utf8 strings not so much.