Wednesday, September 06, 2006

4.1 Optimizer and Joins

The Optimizer in 4.1 is weird. When doing joins, I expect it to figure out how to pick which tables to lookup 1st and compare in a correct manor. My expectations is a bit to much.


For instance say you have table:

  • A with 1 million rows

  • B with 10 million rows

  • C with 100 million rows



So, doing a small range on table A and taking these results to filter out with the other tables I expect the join order to be

A, B, C

Yet, the mySQL optimizer in many cases will join the table in the order of

B, A, C.

This is wrong. I know that the range generated from A is smaller then the range generated from B.

To get around this I use STRAIGHT_JOIN in a global context


SELECT STRAIGHT_JOIN SQL_CALC_FOUND_ROWS A.*, B.*, C.* FROM A,B,C WHERE A.owner_id = 123 AND A.id=B.id AND C.range_id=3 AND C.a_id=B.id ORDER BY A.key_part_2 DESC LIMIT 0,100


How do I know this is the correct order?

Well, order is based on the smallest number of results that will return of the elements involved.

A by itself will return 10 rows
B by itself will return 100 rows
C by itself will return 1000 rows

So A's 10 rows reduces B to 10, then C will reduce A,B to 5 since out of A,B set only 5 rows intersect with the entire C set.


5.0 fix some major deficiencies with the optimizer, but it's far from perfect. Personally, I think its just marginally better then 4.1, although I have not played with it as much as 4.1's.

What would be cool is for mySQL to build a Machine Learning algorithm to make the optimizer learn this stuff over a period of time. Maybe this could be a new type of index that could be made to dynamically act like a BTREE for any desired question. Decision Trees may hold the answer, and new ways of keeping data organized. I think this would be a cool feature for mySQL, which they can use to get customers to pay a subscription fee for.

13 comments:

Nathan Ostgard said...

You're right about 5.0 not being much better in this regard. It often gives me this problem with complex joins. I'm just glad the STRAIGHT_JOIN directive gives us a tool to fix it until they figure out a way to make the optimizer smarter.

Nathan Ostgard said...

Would there ever be a case where it would be more intelligent to read a marginally larger result set first (e.g. 100 rows vs 10), if it meant that MySQL could read the rows consecutively, instead of having to jump around to match rows within a large table? Would it be faster to do the seeking around within the small table?

Sheeri said...

What are the indexes on A, B and C?

What's the difference if you use proper syntax, and get rid of your comma joins, using the join condition instead of the where clause?

SELECT SQL_CALC_FOUND_ROWS
A.*, B.*, C.*
FROM A inner join B using (id)
inner join C on (C.a_id=B.id)
WHERE A.owner_id = 123 AND C.range_id=3
ORDER BY A.key_part_2 DESC LIMIT 0,100

Is this an innodb table or myisam? If the former, when was the last time the statistics were updated using analyze table? The query optimizer looks at ALL the join plans, doing an exhaustive depth-first search of the tables in order.

That means that the query optimizer calculates the costs of all partial plans for table A, then calculates the costs of all partial plans for table A joining table B, and then the costs for A join B join C.

If table statistics are off, the query optimizer may not choose the best plan.

However, because you're using a comma join, MySQL does not apply *any* join constraint. It does:

A CROSS JOIN B CROSS JOIN C

and *then* filters out rows using the WHERE clause.

If you use the proper syntax as demonstrated above, the rows are filtered out BEFORE JOINing, and thus you'll have fewer rows to join. In other words, *you* "know that the range generated from A is smaller then [sic] the range generated from B", but you have *never told MySQL*.

MySQL literally does a cross join of 111 million rows, and then filters out rows via the WHERE clause. I'm surprised you're not complaining about the performance.

So, what are your results when you let MySQL know what's a JOIN condition (filter before table joining) versus a WHERE clause (filter after table cross join)

Toasty said...

MySQL literally does a cross join of 111 million rows, and then filters out rows via the WHERE clause. I'm surprised you're not complaining about the performance.

Is this really true? Trying out a few queries both ways on 5.0.20 I see identical explain plans and execution times. If the where clause was simply a filter after the cross join, then all ',' joined queries would run horrendously surely?

I'm not saying that there aren't cases where use of INNER JOIN doesn't result in better query plans, but I've never really noticed the 'cross join then filter' behaviour you mention for ',' joins.

Do you have any examples?

Cheers! :)


As an aside and not in reply to any particular comment, I was reading this earlier - highly recommend it -

http://dev.mysql.com/doc/internals/en/optimizer.html

I must admit I was a bit surprised to find that the optimiser isn't anywhere near as clever as I thought it was in some circumstances. (see the bit about index choice sometimes being based on the order that indexes were created in! So odd.)

Toasty

Toasty said...

The B,A,C join order is especially odd as you've no WHERE criteria limiting the rows returned from the B table, so I'm not sure why the DB would pick B to start with at all.

I suppose if you've got no indexes on A.owner_id or C.range_id then the DB might not think that they're vastly better choices to start with - but even so you'd expect the smaller table to be used first. Of course this assumes that overall table row count is indeed something mysql uses for join optimisation - and I'm not sure about this.

I'd start with running ANALYZE TABLE to update your index stats if you can as Sheeri suggests. (Not sure how much it will help if you've no indexes on A.owner_id or C.range_id though)


Personally I've only ever seen anything as odd as this with tables with index stats which are waaay out.

Oh, and what does the explain plan look like? It might well give a clue as to why MySQL has 'optimised' the query this way.

Toasty

Anonymous said...

A join is between two tables right?
How can there be a difference between joining A to B or B to A?
You are joining A AND B.

If I sound snotty, I don't mean to, I'm genuinly interested in the difference.

I'm also curious about a point sheeri brought up. I tend to put WHERE conditions in an ON clause for no particular reason. Is that a benefit?

Toasty said...

A join is between two tables right?
How can there be a difference between joining A to B or B to A?
You are joining A AND B.


Logically yeah they're the same, but in practice the order of joining can make a huge perf difference.

AFAIK MySQL always joins tables using a nested join, where it looks at one table (using an index scan say) and for each value separately looks up the corresponding value in the second table (using an index lookup say)

So if you have table A with 10 rows joining to table B with 1000000 rows it's likely to be much quicker to index scan A and do 10 index lookups on B (one for each row in A) than it is to index scan B and do 1000000 index lookups on A.

In addition, if you have further conditions/restrictions on a particular table (e.g. say your query contains WHERE B.somecol = 1 as well as the join) then AFAIK MySQL will, if somecol is indexed, have a rough idea of how many rows occur in table B where somecol=1, and so will be able to determine whether it makes sense to use table B as the 'driving' table instead of A.

e.g. if there is only 1 row where somecol=1 in table B, it probably makes sense to find that row in B first, then index lookup on table A with that row, because you need to do it just once.

Does that make any sense? It's early here :)

Toasty

Anonymous said...

Thanks Toasty, it makes sense now.

Dathan said...

In response to sheeri.

INNER JOIN and CROSS JOIN or the comma operator are equivelent. They all produce a cartesian product.

INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).


Join info


As for what table I'm using, it's INNODB, using myISAM in a high load production environment when doing billions of queires per day is not good. In terms of speed it's ok, but stability it's not.

Certain conditions involving deletes will cause the myISAM table to mark itself as corrupt, and require a repair.

Not to mention the maintenance needed to keep the stats on the table in order.

Thus the need for a storage engine like INNODB, for my types of problems.

When Falcon comes out and is rock solid, I hope to put that into production.

Toasty said...

Certain conditions involving deletes will cause the myISAM table to mark itself as corrupt, and require a repair.
Not to mention the maintenance needed to keep the stats on the table in order.
Thus the need for a storage engine like INNODB, for my types of problems.


Ooh I've never come across this problem, but we're not dealing with anywhere near as many queries or as much data as you have to. (probably around 50 million queries on a busy day, but with only maybe 50GB of data and relatively few deletes on the MyISAM stuff.)

Do you know the specifics or did it just happen sometimes? If you do know the specifics I'd be very interested to hear!


>When Falcon comes out and is rock solid, I hope to put that into production.

Yeah it sounds very promising. The presentation video by Jim Starkey is definitely an interesting watch.

Cheers,
Toasty

Anonymous said...

"What would be cool is for mySQL to build a Machine Learning algorithm to make the optimizer learn this stuff over a period of time."
Whar a great idea.

Chris Poust Eurotemp

sexy said...

情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣,情趣,情趣,情趣,情趣,情趣,情趣,情趣,按摩棒,跳蛋,充氣娃娃,情境坊歡愉用品,情趣用品,情人節禮物,情惑用品性易購

免費A片,AV女優,美女視訊,情色交友,免費AV,色情網站,辣妹視訊,美女交友,色情影片,成人影片,成人網站,A片,H漫,18成人,成人圖片,成人漫畫,情色網,日本A片,免費A片下載,性愛

A片,色情,成人,做愛,情色文學,A片下載,色情遊戲,色情影片,色情聊天室,情色電影,免費視訊,免費視訊聊天,免費視訊聊天室,一葉情貼圖片區,情色,情色視訊,免費成人影片,視訊交友,視訊聊天,視訊聊天室,言情小說,愛情小說,AIO,AV片,A漫,av dvd,聊天室,自拍,情色論壇,視訊美女,AV成人網,色情A片,SEX

情趣用品,A片,免費A片,AV女優,美女視訊,情色交友,色情網站,免費AV,辣妹視訊,美女交友,色情影片,成人網站,H漫,18成人,成人圖片,成人漫畫,成人影片,情色網


情趣用品,A片,免費A片,日本A片,A片下載,線上A片,成人電影,嘟嘟成人網,成人,成人貼圖,成人交友,成人圖片,18成人,成人小說,成人圖片區,微風成人區,成人文章,成人影城,情色,情色貼圖,色情聊天室,情色視訊,情色文學,色情小說,情色小說,臺灣情色網,色情,情色電影,色情遊戲,嘟嘟情人色網,麗的色遊戲,情色論壇,色情網站,一葉情貼圖片區,做愛,性愛,美女視訊,辣妹視訊,視訊聊天室,視訊交友網,免費視訊聊天,美女交友,做愛影片

av,情趣用品,a片,成人電影,微風成人,嘟嘟成人網,成人,成人貼圖,成人交友,成人圖片,18成人,成人小說,成人圖片區,成人文章,成人影城,愛情公寓,情色,情色貼圖,色情聊天室,情色視訊,情色文學,色情小說,情色小說,色情,寄情築園小遊戲,情色電影,aio,av女優,AV,免費A片,日本a片,美女視訊,辣妹視訊,聊天室,美女交友,成人光碟

情趣用品.A片,情色,情色貼圖,色情聊天室,情色視訊,情色文學,色情小說,情色小說,色情,寄情築園小遊戲,情色電影,色情遊戲,色情網站,聊天室,ut聊天室,豆豆聊天室,美女視訊,辣妹視訊,視訊聊天室,視訊交友網,免費視訊聊天,免費A片,日本a片,a片下載,線上a片,av女優,av,成人電影,成人,成人貼圖,成人交友,成人圖片,18成人,成人小說,成人圖片區,成人文章,成人影城,成人網站,自拍,尋夢園聊天室

sex said...

徵信社
情趣用品
情趣用品
情趣用品
情趣
情趣


SM
充氣娃娃


SM
性感睡衣


免費視訊聊天室
aio交友愛情館
愛情公寓
情色貼圖
情色文學
情色小說
情色電影
情色論壇
成人論壇
辣妹視訊
視訊聊天室
情色視訊
免費視訊

免費視訊聊天
視訊交友網
視訊聊天室
視訊美女
視訊交友
ut聊天室
聊天室
豆豆聊天室
尋夢園聊天室
聊天室尋夢園
080聊天室
080苗栗人聊天室
女同志聊天室

上班族聊天室
小高聊天室






免費視訊聊天
免費視訊聊天室
免費視訊
ut聊天室
聊天室
豆豆聊天室 聊天室尋夢園
影音視訊聊天室


色情遊戲
寄情築園小遊戲
情人視訊網
辣妹視訊
情色交友

成人論壇
情色論壇
愛情公寓
情色
色情聊天室
色情小說
做愛
做愛影片
性愛


一葉情貼圖片區
情趣用品


辣妹視訊
美女視訊
視訊交友網
視訊聊天室
視訊交友
視訊美女