tag:blogger.com,1999:blog-31421954.post2104805576656335650..comments2023-10-30T08:23:12.960-07:00Comments on mySQL DBA, Architecture, Dev, Scale, HA, Code : How to pick indexes for order by and group by queriesDathan Pattishallhttp://www.blogger.com/profile/00356367514107959723noreply@blogger.comBlogger5125tag:blogger.com,1999:blog-31421954.post-42926045024305567422008-07-07T18:35:00.000-07:002008-07-07T18:35:00.000-07:00But what if the query was:SELECT c1, c2, c3, SUM(c...<I>But what if the query was:<BR/><BR/>SELECT c1, c2, c3, SUM(c4) FROM T WHERE c1 = ? GROUP BY c2 ORDER BY c3 DESC LIMIT 10;</I><BR/><BR/>That would produce a temp table and a filesort.<BR/><BR/>The reason the index is c1,c2,c3,c4<BR/>So where c1=? and the group by of c2 would use that index, but to order the data properly you would need to do<BR/><BR/>ORDER BY c1 DESC, c2 DESC, c3 DESC<BR/><BR/>to get rid of the temp table or filesort.<BR/><BR/>I believe in 5.1 that this case is being worked on in the optimizer level.Dathan Pattishallhttps://www.blogger.com/profile/00356367514107959723noreply@blogger.comtag:blogger.com,1999:blog-31421954.post-21234618296096256702008-07-06T15:05:00.000-07:002008-07-06T15:05:00.000-07:00Absolute best info I could find on picking indexes...Absolute best info I could find on picking indexes for group bys, but I'm still having a bit of trouble applying it.<BR/><BR/>But what if the query was:<BR/><BR/>SELECT c1, c2, c3, SUM(c4) FROM T WHERE c1 = ? GROUP BY c2 ORDER BY c3 DESC LIMIT 10;kellanhttps://www.blogger.com/profile/06154462724775873752noreply@blogger.comtag:blogger.com,1999:blog-31421954.post-26235947210601520692008-06-17T12:46:00.000-07:002008-06-17T12:46:00.000-07:00hehe, you see it's not easy to work out a good ind...hehe, you see it's not easy to work out a good index even for this relatively simple case. If you're interested in automated index verification/generation, I recommend you check out an open source tool I'm working on: http://ritmark.com<BR/><BR/>BR,<BR/>VladimirAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-31421954.post-69614938058267468372008-06-16T21:19:00.000-07:002008-06-16T21:19:00.000-07:00No your write, I have a typo fixing.No your write, I have a typo fixing.Dathan Pattishallhttps://www.blogger.com/profile/00356367514107959723noreply@blogger.comtag:blogger.com,1999:blog-31421954.post-38247419067606281012008-06-13T10:30:00.000-07:002008-06-13T10:30:00.000-07:00I may be missing something here, but the "before" ...I may be missing something here, but the "before" and "after" queries don't look equivalent to me. Try the following dataset:<BR/><BR/>http://legionsgs.com/xtra/garth/TestData.csv<BR/><BR/>As written, the queries don't return the same sets of records. If you remove the LIMIT clauses, they return the same records, but in a different order.<BR/><BR/>However, if you really don't care about the order, just remove the ORDER BY clause from the original query. If you do that, it already runs optimally under IDX(c1, c2, c3, c4); no need to do any reorganizing.Anonymousnoreply@blogger.com