Wednesday, November 03, 2010

Facebook Live: Running MySQL at Scale Tech Talk by Facebook

Thank you to Domas, Mark, the cool team of DB Dudes at Facebook, and to Facebook for hosting a spectacular event of beer, wine and, of course, MySQL.

Here are some notes that I'd like to share and here is the full video recording of the event.


Before I start, let me say that I'm super impressed with the tools that the Facebook DB teams put together. They are, to say the least, better than commercial grade. The tool set covers all the staples, like graphing SHOW GLOBAL STATUS and InnoDB stats, plus real-time trending to see what the norm is and how things fluctuate, all in a professionally designed interface with drill-downs to the actual root cause: the query that is messing things up.

Facebook has three (main?) database teams.

Operations who fix the problem right now.
Performance who fix the problem today or tomorrow.
Engineering who don't fix the problem fast enough *joke*

Even though these are separate disciplines, the database teams all gel.

Now for some stats:

Query response time R(t): 4ms reads, 5ms writes
Network bytes per second: 38GB @ peak
Queries per second: 13M
Rows read per second: 450M @ peak
Rows changed per second: 3.5M @ peak
InnoDB disk ops per second: 3.5M

What does it mean? WOW this is webscale.


For Facebook, network latency is a killer. Cross-country queries hurt, especially when done serially. For instance:

start a transaction
add a new row to one table
increment another table
end a transaction

Each command is a network trip. The network latency can be up to 100ms, yet the SQL command itself takes less than 10ms. Why put all that effort into tuned servers, queries, and code only to deliver a bad user experience because of 100ms of latency?
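To put rough numbers on this, here is a small back-of-the-envelope sketch in Python. It assumes the figures quoted above (100ms of network latency per trip, 10ms per statement) and four statements as in the example. The function names are mine, and the second function simply shows, for contrast, what the cost would be if all four statements shared a single network trip.

```python
def serial_latency_ms(statements, trip_ms, exec_ms):
    """Every statement pays its own network trip plus its execution time."""
    return statements * (trip_ms + exec_ms)


def single_trip_latency_ms(statements, trip_ms, exec_ms):
    """All statements share one network trip but still execute one after another."""
    return trip_ms + statements * exec_ms


# Assumed worst-case figures from the text: 100ms per trip, 10ms per statement.
serial = serial_latency_ms(4, 100, 10)
single_trip = single_trip_latency_ms(4, 100, 10)

print(serial, single_trip)  # 440 140
```

Under these assumptions the network, not the database, accounts for most of the time the user waits.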

Using a feature in the MySQL client API, they can send multiple statements in a single network trip, in roughly the time the first query alone would have taken. The API flag, passed at connect time (mysql_real_connect in the C API), is CLIENT_MULTI_STATEMENTS.
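Here is a minimal sketch of the idea in Python. It does not talk to a real MySQL server: CountingConnection is a hypothetical stand-in that only counts network trips, and the table and column names are invented for illustration. With a real client you would enable the multi-statement flag when connecting and then send the joined string in one call.

```python
class CountingConnection:
    """Hypothetical stand-in for a database connection; it only counts network trips."""

    def __init__(self):
        self.round_trips = 0
        self.received = []

    def send(self, payload):
        # One call models one client-to-server network trip.
        self.round_trips += 1
        self.received.append(payload)


# Table and column names are made up for this example.
STATEMENTS = [
    "START TRANSACTION",
    "INSERT INTO comments (post_id, body) VALUES (42, 'hello')",
    "UPDATE posts SET comment_count = comment_count + 1 WHERE id = 42",
    "COMMIT",
]


def run_serially(conn, statements):
    # One trip per statement: the slow pattern described above.
    for statement in statements:
        conn.send(statement)


def run_batched(conn, statements):
    # Join the statements with semicolons and send them in a single trip.
    conn.send("; ".join(statements))


serial_conn = CountingConnection()
run_serially(serial_conn, STATEMENTS)

batched_conn = CountingConnection()
run_batched(batched_conn, STATEMENTS)

print(serial_conn.round_trips, batched_conn.round_trips)  # 4 1
```

One thing to keep in mind with a real server: each statement in the batch produces its own result, so the client has to read them back in order and decide what to do if a statement in the middle fails.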

Why not just use triggers or stored procs? Because managing them is a nightmare. I've talked about this many times and I'm glad Facebook agrees. This is a great trick for reducing the cost of cross-datacenter DB calls: in effect, it produces dynamic stored procs. I doubt that triggers or stored procs are fully gone, though.

Then there is OSC (Online Schema Change). They talked about how much time this saved the company a few weeks ago, and it's staggering. What used to take days or longer is reduced to hours. I'm so impressed by this that I asked my team to make it into a web app that can execute these commands across an arbitrary set of servers. My newest team member, Einav, finished it and we now use it in production (screenshots and a follow-up post coming).

Facebook is surprisingly open and really is fostering / giving back to the community. Keep up the great work!
