During this time in 1999, we had two database servers named Godzilla and Mothra. I believe they were Sun E4000s with a super-expensive disk pack in RAID-5. Since MySQL was only in gamma back then and replication was just a concept, we couldn't use replication to handle the onslaught of load from our users. Just 200 queries per second would saturate AuctionWatch back then.
So, what was done? The first approach was more database instances on more very expensive Sun boxes, each handling a different part of the site. The problem with this was that if any MySQL instance crashed or an ISAM table went corrupt (yes, we used ISAM back then), the entire site went down. Also, more boxes meant more points of failure.
So, I came up with a plan and ran it past my managers. What if we could do ACP in a box? Meaning, what if we could split up all sellers' data across many very inexpensive Linux boxes? I sold the idea by producing this reasoning (a rough sketch of the scheme follows the list):
- If one box went down, only a small population of users would be affected.
- The application could query data faster, since each box held less of it.
- The application could run more updaters, since it would have multiple write points.
- The application could launch auctions faster, since the backend could support many more instances.
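To make that concrete, here is a minimal sketch of a seller-based split, assuming a simple modulo mapping from seller ID to host. The host names, shard count, and routing scheme are illustrative only; the post doesn't describe the actual AuctionWatch implementation.

```python
# Minimal sketch of "split sellers across many cheap boxes".
# Host names and the modulo scheme are hypothetical stand-ins.

SHARD_HOSTS = [
    "db-shard-01.example.com",
    "db-shard-02.example.com",
    "db-shard-03.example.com",
    "db-shard-04.example.com",
]

def shard_for_seller(seller_id: int) -> str:
    """Route a seller's reads and writes to one small database box.

    Every query for a given seller always lands on the same host, so each
    box only holds (and scans) a fraction of the total data, and a crash
    on one box only affects the sellers mapped to it.
    """
    return SHARD_HOSTS[seller_id % len(SHARD_HOSTS)]

if __name__ == "__main__":
    for seller_id in (101, 102, 103, 104):
        print(seller_id, "->", shard_for_seller(seller_id))
```

A fixed modulo mapping is the simplest possible scheme; a directory (lookup) table that maps each seller to a box is the more flexible choice if you ever need to move a heavy seller onto its own hardware or add boxes without reshuffling everyone.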
My boss loved the idea, and the team worked on ways to make it work for the site.
Then the dot-com bubble burst and I got laid off, so the project was only partially rolled out. I have no idea if they are still using this today, but if so, great!
This was the beginning of my unorthodox approach to database design (a rough sketch of the fan-out idea follows the list):
- Denormalized data.
- Spread the data across many physical instances.
- Keep data small.
- Allow for any question to be asked.
- Allow for parallelism.
- Allow for absolute redundancy so downtime does not exist or is minimal.
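The last three points, taken together, imply a scatter-gather pattern: to answer an arbitrary question, ask every shard the same question in parallel and merge the partial answers. Below is a minimal sketch of that pattern; `query_shard` is a placeholder (a real version would run SQL against each MySQL instance), and the host names are made up.

```python
# Rough sketch of fan-out across shards: scatter one question to every
# shard in parallel, then gather and merge the partial results.
from concurrent.futures import ThreadPoolExecutor

SHARD_HOSTS = ["db-shard-01", "db-shard-02", "db-shard-03", "db-shard-04"]

def query_shard(host):
    # Placeholder: pretend each shard returns its slice of the answer.
    return [{"host": host, "open_auctions": 42}]

def query_all_shards():
    """Scatter the same question to every shard and gather the partial results."""
    results = []
    with ThreadPoolExecutor(max_workers=len(SHARD_HOSTS)) as pool:
        for partial in pool.map(query_shard, SHARD_HOSTS):
            results.extend(partial)
    return results

if __name__ == "__main__":
    print(query_all_shards())
```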
Up next:
- Warp ahead to Friendster - what they didn't execute on
- Then to Flickr - after a 5 min talk with Cal and Serguei
- How Flickr Database Layer Scales
2 comments:
Hi Dathan,
I have a question on this concept in regard to my own situation, if you don't mind.
I am facing a similar situation, and one of my goals is to partition data by user/partner: when a cluster of servers reaches its limits, create a new cluster to store new partner data. The partner list will always remain relatively small (under 1000).
Are there any performance gains, in your opinion, in having partner-specific databases as well (e.g. stats_(partner_id), media_(partner_id), etc.) with a common set of tables in each, or will this just create a management nightmare in the long run?
My assumption is that this would be a way to vertically partition the data (each partner's data exists in its own set of databases, with the goal of keeping heavy-use partners from affecting less resource-intensive users, and of easily segregating existing partners onto new clusters if need be), on top of partitioning across different db clusters when one group of db servers storing a group of partners' data hits its limits. Is this a path worth pursuing (partner/user-specific dbs) in your opinion, or is it just overkill? Or would it be forward-thinking enough to simply intermingle the partner data in the same tables and partition the data onto new db servers when necessary?
Considering your experience, any thoughts you could give me would be greatly appreciated.
Thanks,
Carter A.
What happened? Did this thread end? Where can I find a link to the next post in this series?