Monday, July 07, 2008

GROUP BY / ORDER BY Optimization Part II

In my previous blog post I talked about GROUP BY and ORDER BY optimizations. A reader asked a great question that I'd like to share with everyone.


But what if the query was:

SELECT c1, c2, c3, SUM(c4) FROM T WHERE c1 = ? GROUP BY c2 ORDER BY c3 DESC LIMIT 10;




That query would produce a temp table and a filesort.
explain SELECT c1, c2, c3, SUM(c4) FROM column_test WHERE c1 = 1 GROUP BY c2 ORDER BY c3 DESC LIMIT 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: column_test
type: ref
possible_keys: c1
key: c1
key_len: 5
ref: const
rows: 1
Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)




The reason: the index on the table is (c1, c2, c3, c4).
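For reference, a table definition along these lines would line up with the EXPLAIN output above. The exact definition of column_test is an assumption on my part; only the (c1, c2, c3, c4) index is given in the post, and c1 is left nullable so that the key_len of 5 (4-byte INT plus the NULL flag) matches.

CREATE TABLE column_test (
  id INT NOT NULL AUTO_INCREMENT,
  c1 INT,                           -- nullable INT: matches key_len: 5 above
  c2 INT NOT NULL,
  c3 INT NOT NULL,
  c4 INT NOT NULL,
  PRIMARY KEY (id),
  KEY c1 (c1, c2, c3, c4)           -- the covering index the queries use
) ENGINE=InnoDB;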

So the WHERE c1 = ? and the GROUP BY c2 can use that index, but to keep the ordering on the index you would need to write:

WHERE c1 = ? GROUP BY c2, c3 ORDER BY c1 DESC, c2 DESC, c3 DESC

explain SELECT c1, c2, c3, SUM(c4) FROM column_test WHERE c1 = 1 GROUP BY c2,c3 ORDER BY c1, c2\G

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: column_test
type: ref
possible_keys: c1
key: c1
key_len: 5
ref: const
rows: 1
Extra: Using where; Using index
1 row in set (0.00 sec)

to get rid of the temp table and filesort. Filesorts and temp tables take about 50% of the query time, so avoid them when the query is executed at a very high rate.
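One way to measure that split on your own server (a sketch; the query profiler ships with MySQL 5.0.37 and later) is SHOW PROFILE, which breaks the query time down into stages, including ones along the lines of "Copying to tmp table" and "Sorting result":

SET profiling = 1;

SELECT c1, c2, c3, SUM(c4) FROM column_test WHERE c1 = 1 GROUP BY c2 ORDER BY c3 DESC LIMIT 10;

SHOW PROFILES;                -- lists recent queries with their query ids
SHOW PROFILE FOR QUERY 1;     -- per-stage timings; compare the tmp table and sorting stages to the total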

Also note that to get rid of the temporary table and filesort, the query itself changed, so it no longer answers the original question without post-processing the data in PHP or some other layer.
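If you would rather keep that post-processing in SQL, one option (a sketch, not something from the original post) is to let the index-friendly query do the heavy lifting inside a derived table and then collapse it back to one row per c2. The original query leaves c3 ambiguous within each c2 group, so taking MAX(c3) here is my assumption; the outer GROUP BY and ORDER BY still use a temporary table and filesort, but only over the already-grouped rows rather than the full range.

SELECT c1, c2, MAX(c3) AS max_c3, SUM(s4) AS sum_c4
FROM (
    SELECT c1, c2, c3, SUM(c4) AS s4
    FROM column_test
    WHERE c1 = 1
    GROUP BY c2, c3            -- the index-friendly query from above
) AS grouped
GROUP BY c2                    -- collapse back to one row per c2
ORDER BY max_c3 DESC           -- sorts only the small grouped result
LIMIT 10;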




The reason: the data is ordered by the index. For InnoDB the entire table is ordered by the PRIMARY KEY, which every secondary index carries a reference to. The MySQL implementation of GROUP BY does not know how to traverse and sort the data in the first pass using only part of the key; the optimizer needs a lot of work here. So what MySQL does is automatically create a temporary table and sort that, instead of using the index it has already traversed.
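A quick way to confirm that the sort order is the deciding factor (a sketch to run against your own copy of the table, not output from the post): the same grouped query stays on the index as long as the ORDER BY follows the index columns in index order, and falls back to a temporary table the moment it sorts against them.

-- Expected Extra: Using where; Using index (the order follows the (c1, c2, c3, c4) index)
EXPLAIN SELECT c1, c2, c3, SUM(c4) FROM column_test
WHERE c1 = 1 GROUP BY c2, c3 ORDER BY c2, c3\G

-- Expected Extra: Using where; Using index; Using temporary; Using filesort (c3 DESC alone fights the index order)
EXPLAIN SELECT c1, c2, c3, SUM(c4) FROM column_test
WHERE c1 = 1 GROUP BY c2, c3 ORDER BY c3 DESC\G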

I believe that in 5.1 this case is being worked on at the optimizer level to get rid of this common slowdown.

1 comment:


womagrid said...

Suppose you have a lot of customer data that should always be sorted by time and id (primary key). A compound index on (time, id) would seem to be a good idea.

This works well when looking at data for all customers, but half the time you're only interested in data pertaining to one customer. How do you then avoid the need to create an almost-duplicate compound key (customer, time, id) to handle the addition of "WHERE customer = ..." to the query?