Tuesday, November 27, 2007

When starting out building an application should I use Complex Joins?

I get questions on how to build mysql application from time to time, and I wanted to share the response to the question below.



In the beginning, should a developer try and stay away from complex JOINS? Is it better to use many smaller queries and cache the results (using something like memcached) rather than 1 larger query requiring multiple table JOINS?

Needing to scale for millions of users is a great problem to have, but should I worry about it in the beginning? Whats the best approach to db design for day 1?



Well IMHO build the application, build it to spec. When the scope changes (and this will always happen) flow with it. This is in no means a knock against Product-I actually look forward to this, because things are being discussed and if something needs to change lets do it!

Now how to build the application to scale: Look at the scope of a page and gauge the frequency of access of that page. For instance on a page displaying 1 photo-very few if any ranges and/or group-by and/or order-by and/or joins would be used to generate this page since a photo page needs to generate fast or loose the interest of the page viewer. On the other hand a page displaying the accounts holder account information thats accessed at a order of magnitude less rate-this page can have tougher queries. (Note: watch out for contention)

Once the feature set has been decided upon; take the application apart and find the bottle necks. Highlight the issues that will not scale.

This can be figured out by filling your dev-database with bogus information and running mysqlslap of common select queries to test stability, response time, speed etc.

Take into account how scale can be achieved before changing code.

Such as is the hardware ok to run all my services?
Can I split some services out like move WWW specific traffic to a separate box from the database? (You will be surprised how many people combine the servers)
Am I allocating mysql resources properly? If not how can I tell?
What is my system reporting as the lack of resources? Am I running out of CPU or IO?


The real performance gains come from laying out the data properly and reducing the amounts of sorts, ranges, joins needed to generate your dynamic pages.

In some cases I've had to add tables where data in said tables represent start points to range in other tables. Then on top of that spread that core-logic across MANY individual machines just to get a page to load in less then 300 ms.


Finally add Nagios to alert you when things go wrong such as too many procs running on this database server and Ganglia to chart system resources over time.


    So, to answer the question should I use complex joins at 1st?
  • Sure, but here is a rule of thumb-the harder the query is the less it can be used.

  • If traffic ramps up, look to see what the hardest queries are and their frequency of use. Try to reduce these execution times by spreading the data out to more tables or more servers, only if benchmarks show that this is the way to go.


  • If traffic really ramps up because the application is the next best thing since Flickr, then look at the larger picture of what should be done to scale.

5 comments:

Dale L. said...

I always say to let the database do the heavy lifting when it comes to managing data - that's what it was built for. If you don't put the complexity in your join then you will put it somewhere else within your app. If you believe that you will have millions of users, so be it, I can guarantee you the DB will handle the selection of the data you need much, much better than you can do it with several queries and application logic.

David H said...

I have found in *all* of my experience is that the number one point of contention on websites is I/O, and JOINs are a wonderful way to increase I/O. The question is -- to what extent? And... what are possible alternatives (as putting data into arrays can run memory low).

Thus, this is a semi-indicator of what one can expect in the future and make adjustments ahead of time before you start spending 23 hour days fixing issues that could have been resolved earlier:

First Step: You write your code in dev.

Second Step: Create monster-size tables/databases on a test server (not the development server nor the production server) and run a benchmark script all day long.

Third Step: Make adjustments and repeat.

This will ease some future worries and will avoid the "running around like a chicken with its head cut off" syndrome in the case of unanticipated success.

-David

toby said...

I would endorse David H's approach, as we have just been through this cycle in our application. Create loads of test data - more than you expect to occur in reality - and try your queries. We found a variety of unexpected issues this way. A slow query can cause a cascade of issues including deadlock timeouts, etc.

mitch said...

thank you!

sex said...

徵信社
情趣用品
情趣用品
情趣用品
情趣
情趣


SM
充氣娃娃


SM
性感睡衣


免費視訊聊天室
aio交友愛情館
愛情公寓
情色貼圖
情色文學
情色小說
情色電影
情色論壇
成人論壇
辣妹視訊
視訊聊天室
情色視訊
免費視訊

免費視訊聊天
視訊交友網
視訊聊天室
視訊美女
視訊交友
ut聊天室
聊天室
豆豆聊天室
尋夢園聊天室
聊天室尋夢園
080聊天室
080苗栗人聊天室
女同志聊天室

上班族聊天室
小高聊天室






免費視訊聊天
免費視訊聊天室
免費視訊
ut聊天室
聊天室
豆豆聊天室 聊天室尋夢園
影音視訊聊天室


色情遊戲
寄情築園小遊戲
情人視訊網
辣妹視訊
情色交友

成人論壇
情色論壇
愛情公寓
情色
色情聊天室
色情小說
做愛
做愛影片
性愛


一葉情貼圖片區
情趣用品


辣妹視訊
美女視訊
視訊交友網
視訊聊天室
視訊交友
視訊美女