Thursday, December 13, 2007

Flickr Stats how is it built

    Flickr Stats how is it built.
  • All Collection is done realtime
  • MYISAM and INNODB is used
  • The data is spread across 6 clusters (12 servers-6 used, 6 for fail over) mainly for data storage requirements
  • Memcache is not used at all in the core of the product.




In summary this was the longest project that I worked on, other then rebuilding the backend for Flickr when I first came on. The inner workings are very complex to achieve real-time collection-while not affecting page load times of a photo page. Most of my time was spent on creating a distributed lock once my DB design was solid.

Things that would really make life easier:

MYSQL AB gets rid of MYISAM and makes PBXT its replacement. I don't need all the great features of INNODB but I would like some. I'll go more into this later.

Additionally INSERT DELAYED worked with ON DUPLICATE UPDATE. Currently it does not.

Finally cross Engine Transactions would be cool but really not required.

Tuesday, December 11, 2007

Distributed Locking Solving it with mySQL

At Flickr we use a Federated Architecture, where the WHOLE of the data is spread across many servers. This means that when you load a photo page the dynamic data generated is hitting a server that holds that data, while another photo page view may hit another server for content.

Now to do complex operations to control the flow of what operates on data, the complex operation must lock the data. Let's call this a global lock, which is used to coordinate many jobs. Now to make these operations fast, lets run many processes across many servers. Essentially we needed to solve a hard problem in parallel computing: distributed locking.

This took a bunch of dedicated time to get done properly. We solved the issue by using 3 database clusters.

First I used the coolest mysql feature out there; GET_LOCK and IS_FREE_LOCK. These two mysql function are the heart of the distributed lock.

Why use them both?

GET_LOCK will lock a string in MEMORY on mysql, and anything else trying to get a lock on that string will have to wait until TIMEOUT is reached.

Since the TIMEOUT is in seconds, I don't want to wait and slow down everything, so we call IS_FREE_LOCK first which will return in a fraction of a ms


How do we use it?

We lock an entire Account classification by setting the lock on the owners Cluster.
An account classification is defined as ACCOUNT_123_TASK - which is the string for the lock, this is hashed to the same SERVER where the account is located.

Once the lock is established check another cluster to ensure that we have not performed the action already-this ensures that the event that entered the system doesn't corrupt data if the message to process made it into the system more then once.

Finally operate on the data, if any of the operations fail ROLLBACK all changes, log and continue.

The cool thing about this is that the locks and data are all in the same place. So, if the lock is not achievable then the data is unaffected.