Tuesday, January 27, 2009

Shard Types && innodb_io_capacity, innodb_write_io_threads, innodb_read_io_threads

Another project that I am doing introduces a new shard type. First, think of a Shard as RAID-10 on your database layer. The data is striped across N servers and mirrored for failure recovery.

Shard Types are Shards that serve specific purposes. For instance, I have an archive shard. This shard keeps data the is rarely ever read on even cheaper hardware-with slower disks and the data is naively compressed using the innodb_file_format=Barracuda
option. If and/or when a person requests data that is archived, my software layer detects this case and migrates the archive data from the super crappy RAID-5 servers to the less crappy RAID-10 servers with 32 GB of RAM, all within 5-10 seconds.


Another shard type is the sub-shard. I use this terminology to define a different way of federating data for a predefined global object_id. A predefined global object_id is what you are federating by, for instance userid.

userid X maps to Shard 3


What if you had an application that is expected to take up 8TB of data? Sticking all of userid X's data on a single shard is very costly since current size of the cluster + 8TB means more servers are needed and is costly (more data, larger ranges, slower the avg query speed). So why not put it on another set of servers that have big slow'ish disks, i.e. the Sub-Shard.

Now for userid X

userid X maps to Shard 3
userid X maps to SubShard 1


Since some shard profiles do not require 100% uptime, I can run builds of mySQL that is cutting edge.

Using the Percona 5.1 build, I ran across new tweaks for XtraDB. The patch is from the Google Patch made by Mark Callaghan's Google Team. Mark's architecture is purely disk I/O bound, and from his tests, INNODB does not use I/O effectively. He added a bunch of code dealing with the I/O performance bottlenecks that innodb native has. The three new tweaks addressing the I/O bottleneck specifically are

innodb_io_capacity = 100 // If running without the patch this is the equivalent default behavior
innodb_write_io_threads = 1
innodb_read_io_threads = 1


From what I gather, this means that DEFAULT INNODB uses one write thread 1 read thread and will only work on 100 pending iops at a time-which is probably why INNODB takes sooo long to shutdown since by default 90% of the buffer pool contains dirty pages.

If your servers have a large innodb_buffer_size, on the order of a few Gigs and the server has many spindles (greater then 1) then increasing these params may help your application purge dirty pages faster. Be warned; do not increase this to high. If your box has a lot of pending I/O and this new code does not have the bandwidth to flush based on your settings, the box will freeze. That means you will have to remove traffic off of the server, let the pending io finish, lower the settings and restart.

So, what is a good size to set this to?

Percona says 100*Number of spindles you have on the db server. That seems a bit high for me. I would say 100*Number of spindles / 2 - to start out with, then over time ramp it up.

Currently I am about to run 10 Shard or 20 servers with the Percona build. This will purely I/O bound load.

No comments: