mySQL DBA, Architecture, Dev, Scale, HA, Code : Unorthodox approach to database design

Saturday, August 19, 2006

Unorthodox approach to database design

There are whole books on building a great design that is scalable and portable among developers and/or administrators.

Then there are whole books on the subject of capacity and scalability for the database layer.

Then there are novels from developers who in many cases really don't know the tricks of the DBMS they are working with, and who create elaborate abstraction layers that automatically generate SQL for the DB in question from objects and such.

But of all these people who tell you how to do it, can they actually prove that it works under a constant high workload, for many people, all at the same time?

I can boast this. Flickr does 4 BILLION+ queries per day, 2 BILLION of which are SELECTS. Most of our data is served by REAL TIME queries against the database layer. We don't do any fancy tricks like dedicating certain servers to API calls; API calls hit the SAME servers that the Flickr users do.

You may be thinking to yourself: yeah right, say you can do 20K+ transactions per second; that must be a crap load of expensive hardware, with all the data served out of memory. Nope. We run our stuff on RHEL-4.0 with mySQL version 4.1.20-flickr (my little special tweaks for x86_64), and the data is only 3% HOT (meaning that out of the 1 TB of user data, less than 3% is in memory).
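As a back-of-the-envelope check on those figures, the daily totals quoted above can be turned into average per-second rates. This is only a sketch: it assumes the load is spread evenly over the day (real peaks would be higher), takes 1 TB as 1000 GB, and the names `SECONDS_PER_DAY` and `per_second` are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Average per-second rate for a daily total, assuming an even spread."""
    return per_day / SECONDS_PER_DAY


total_qps = per_second(4_000_000_000)   # all queries per day, from the post
select_qps = per_second(2_000_000_000)  # SELECTs per day, from the post
hot_gb = 0.03 * 1000                    # 3% of 1 TB, assuming 1 TB = 1000 GB

print(f"all queries: ~{total_qps:,.0f} per second on average")
print(f"SELECTs:     ~{select_qps:,.0f} per second on average")
print(f"hot data:    under ~{hot_gb:.0f} GB in memory")
```

On these assumptions the averages come out at roughly 46,000 queries per second overall and roughly 23,000 SELECTs per second, which is in line with the "20K+" figure, and the hot set is under about 30 GB.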

Still hard to swallow? Let me add a little more info to blow your mind and wipe away everything you may have read about what can't be done.

All of our database connections are real time. Our load balancer for the database is written in 13 lines of PHP code.
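The post does not show those 13 lines of PHP, so what follows is not Flickr's code. It is a rough Python sketch of how small an application-level database load balancer can be, assuming the simplest strategy one could pick: try the candidate hosts in random order and use the first one that accepts a connection. The function `connect_any`, the `connect` callback, the host names, and `fake_connect` are all hypothetical.

```python
import random
from typing import Callable, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_any(hosts: Sequence[str], connect: Callable[[str], Conn]) -> Conn:
    """Try hosts in random order and return the first successful connection."""
    last_error = None
    # random.sample with k=len(hosts) yields a shuffled copy, so load is
    # spread across hosts without mutating the caller's list.
    for host in random.sample(list(hosts), k=len(hosts)):
        try:
            return connect(host)
        except ConnectionError as exc:
            last_error = exc  # host is down or refusing; fall through to the next
    raise ConnectionError(f"no database host reachable out of {len(hosts)}") from last_error


# Hypothetical stand-in for a real database driver's connect call.
def fake_connect(host: str) -> str:
    if host == "db2.example":
        raise ConnectionError(f"{host} is down")
    return f"connection to {host}"


HOSTS = ["db1.example", "db2.example", "db3.example"]
print(connect_any(HOSTS, fake_connect))
```

Because the order is random on every call, connections spread across the healthy hosts, and a dead host only costs one failed attempt before the next candidate is tried.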

How can this be? How does Flickr scale? How did the Flickr Engineering Team (6 people) do this?

If you're interested, let me know: post a comment, and I'll write up the design that I proposed to Flickr around July 2005, which we use today. It scales linearly as a function of users, not of content.

50 comments:

Anonymous said...: Yes, I am interested in that.

I've often wondered if the API calls were handled differently, interesting to find they aren't. Guess that's an example of "don't optimize something 'til you have to".

Tom; Sat Aug 19, 04:18:00 PM
John Speno said...: Spill it please. :-); Sat Aug 19, 04:41:00 PM
Anonymous said...: You have to be kidding... of COURSE we're interested!; Sat Aug 19, 05:25:00 PM
Anonymous said...: please do let us know your secrets to performance; Sat Aug 19, 05:25:00 PM
Anonymous said...: This is some attempt to see if 200 people will post the same comment? Okay, then...

Please post the details. :-); Sun Aug 20, 12:05:00 AM
Anonymous said...: Please, do tell.; Sun Aug 20, 04:30:00 AM
danofames said...: yes, go for it.; Sun Aug 20, 05:54:00 AM
Anonymous said...: Yes please.

I bought a couple of Cal's book and while it explains a few things that I hadn't considered it doesn't mention much regarding the API.; Sun Aug 20, 06:05:00 AM
Anonymous said...: I'm in. :); Sun Aug 20, 09:02:00 AM
Anonymous said...: This is one of those things you have to share with the rest of us!

Come on, what's your secret sauce?; Sun Aug 20, 01:05:00 PM
Anonymous said...: Yea, come on, spill it!; Sun Aug 20, 03:47:00 PM
Dathan Pattishall said...: Cool, I'll post diagrams, code snippets and a few other things that will make you all go "wow, that's it?"

Give me about a week though.; Sun Aug 20, 03:52:00 PM
Anonymous said...: I'm highly interested in Mysql Scalability and High Availability topics. I would really appreciate if you can publish the relevant documents with complete details. That would help everyone looking for such solutions.

With best regards,
S.Mugunthan; Sun Aug 20, 10:14:00 PM
Anonymous said...: I would love to see the load balancing code you did in php.; Mon Aug 21, 12:18:00 AM
Anonymous said...: Of course: Yes please! :o); Mon Aug 21, 02:44:00 AM
Anonymous said...: Hi! I made a few comments in my blog, but sure, I'd like to see your design documents! :-); Mon Aug 21, 10:52:00 AM
Anonymous said...: Color me intrigued.

I would love to read about your design ideas. And if you don't mind, will you post a followup comment on this post with a link to the writeup if/when you do so?; Mon Aug 21, 11:41:00 AM
Anonymous said...: Chalk up another under the "yes please" column.; Mon Aug 21, 11:12:00 PM
Anonymous said...: I want to know!; Thu Aug 24, 05:33:00 PM
Anonymous said...: pleeeeeeese?; Sat Aug 26, 10:11:00 PM
Anonymous said...: +1 interested, give give!; Tue Aug 29, 05:10:00 AM
Anonymous said...: Definitely interested...looking forward to finding out the details!; Tue Aug 29, 05:48:00 AM
Anonymous said...: I would love to hear your secret; Tue Aug 29, 08:41:00 AM
Anonymous said...: Been thinking of hash based balancing across multiple slave servers. (see Cache Array Routing Protocol)

So each mysql service would only be serving 1/N of the data (where N is the number of slaves).

This effectively combines all the mysql's own query caches into a single cache, rather than having N query caches with roughly the same thing.

But definitely would be interested in the method flickr has implemented :); Tue Aug 29, 09:24:00 AM
Anonymous said...: yep - another one interested to see the "Unorthodox approach to database design" details......

+1; Tue Aug 29, 09:40:00 AM
Dathan Pattishall said...: Writing the stuff up in my spare time, but with the launch of maps the query load increased by 43%; I will need more time to get this out. It's not a big deal with the system in place, but I need to figure out if we are doing more queries than necessary. So, stay tuned.; Tue Aug 29, 10:24:00 AM
Ruturaj Vartak said...: I'm interested.
I'm doing a lot myself to make MySQL scalable. Thanks to MySQL, it's still cheaper and more efficient than other DBs, yet there are 100s of tricks that can help you elevate MySQL's performance.; Tue Aug 29, 09:55:00 PM
Anonymous said...: me too :); Wed Aug 30, 05:03:00 AM
Anonymous said...: +1 :); Wed Aug 30, 01:22:00 PM
Anonymous said...: Definitely looking forward to this. I will take production examples over pseudo any time.; Fri Sep 01, 06:16:00 AM
Anonymous said...: dathan, you may be having one challenging time what with surpassing 1 million geo-tags in a day compared to the expected 1 million a month. And ur DB is still standing. Good stuff.

Waiting on the followup article. Thanks.; Fri Sep 01, 07:11:00 AM
James Lynn said...: and theeeeeeeeeennn????; Sun Sep 03, 06:53:00 PM
Anonymous said...: well...what's the holy secret, oh great guru of MySQL?...; Tue Sep 05, 05:07:00 AM
Anonymous said...: Yes, please, I'd be thrilled to read that!; Wed Sep 06, 02:56:00 AM
Anonymous said...: I'm interested, too. Please send me the secret.

odoaker@freemail.hu; Thu Sep 07, 05:18:00 AM
Anonymous said...: I'm interested, we don't have anywhere near the queries and I have some performance issues.

Thanks; Wed Sep 13, 12:59:00 PM
Anonymous said...: I'm interested too, please send me the info to dude30m@hotmail.com; Sun Sep 17, 07:31:00 AM
Anonymous said...: Still nothing???; Tue Sep 19, 05:44:00 AM
Dathan Pattishall said...: This is on a bit of a hold.

I have a lot of work to get through, and I need to run my blog post past my manager to make sure that I don't step on Yahoo's toes.; Tue Sep 19, 10:09:00 AM
Anonymous said...: Please tell us anything you can. Some of us need to convince those VC types that this isn't all just voodoo. We understand you can't give out trade secrets.

I guess it would cost too much to poach you now ;); Tue Sep 19, 06:25:00 PM
Anonymous said...: Yes I guess it was too good to be true :( darn managers !

dude30m@hotmail.com; Fri Sep 22, 02:43:00 PM
Anonymous said...: Let's hear the magic behind the voodoo.

Did you ever end up sharing this?; Tue Oct 03, 03:16:00 PM
Anonymous said...: Hi,

Could you publish the details? That would help me a lot in my work. I would highly appreciate your timely help.

Regards,
S.Mugunthan; Wed Oct 04, 05:52:00 AM
Anonymous said...: I take it this is going nowhere...; Thu Oct 12, 01:53:00 PM
Dathan Pattishall said...: Hello all, I'm writing this up now. It will be reviewed by my manager some time next week, so check back soon.; Thu Oct 12, 02:48:00 PM
Dathan Pattishall said...: Start of it here; Tue Oct 17, 03:24:00 PM
Anonymous said...: I am interested too!; Mon Dec 18, 11:18:00 AM
Anonymous said...: Great site, I am bookmarking it! Keep it up!
With the best regards!
David; Sun Dec 24, 10:14:00 AM