Monday, October 20, 2008

mySQL Disk Benchmarks with Sysbench. When will mySQL / INNODB stop scaling?

Today's "commercial quality disks" are amazing but they follow the same limitations as yesterday's disks. mySQL scales very well, but disks do not. So if you're IO bound when will your expectation of speed fail?

Test setup:

DELL 2950 PERC-6 HWRaid BBC 6 DISK 15K RPM 3.5" RAID-10 256K stripe across two channels-using WRITE THROUGH CACHE on mkfs.ext3 -T largefile4 Linux Filesystem.


The theory is that the outer part of the spindles is the fastest, and the inner portion is slower - since the outer is where the data starts (thanks for the info Benjamin Schweizer). Thus one can conclude that the more disk space your application(s) use the slower the throughput, since the heads have to move more. Brad F. my co-worker did a benchmark to prove this. Our goal is to find out at what is the saturation point if our expectation is to have 22 MB / sec of random access.

Why do we want 22MB / sec of random access throughput? We want to guarantee a certain level of performance when adding new apps to a common backend-which is I/O bound: we need to know when things will break.

Here is what Brad found: Total disk size for our RAID-10 setup =~ 800G. What point does it FAIL to achieve our expectations of sustained 22MB/s?

rndrw test across 100G test / 750G LV =~ 35 MB/s # outer part of the spindles
rndrw test across 100G test / 300G LV =~ 32 MB/s # outer part of the spindles
rndrw test across 250G test / 300G LV =~ 24 MB/s # sweet spot
rndrw test across 350G test / 384G LV =~ 21 MB/s # saturation point
rndrw test across 750G test / 800G LV =~ 14 MB/s # waste of space


In conclusion these test show that even though a RAID-10 setup with 800G of space is available, the expected performance drops when data exceeds the sweet spot of 250G-300G of 800G usable-data array.

Disclaimers: There are many factors that can raise or lower the bar, like different file systems, different I/O schedulers, flushing. For my setups I like

Deadline I/O scheduler
256K Stripe
few inodes (don't need them)
ext3 since that’s what stable and available.

6 comments:

Anonymous said...

Two corrections; first, HDDs are written from outside to inside, making them faster at the beginning of the disk, not the end. This is due to zone block recording (ZBR) which packs more sectors on the outer tracks and results in more tracks passing the heads, per rotation.

Second, random IOs are way more important than linear IO for DBMSes. On traditional HDDs, this is affected by the number of spindles (HDDs) and the rotation speed. A 10k rpm disk makes a maximum of 166 seeks per second (10k/60=166) which is pretty slow. On small block sizes, this results in maximum transfer rates of about 85kb/s (166 rotations/second*512byte/rotation).

Of course, you can speed this up; I suggest loads of system memory, reduction of random IOs and one of these shiny new SSD disks (they are supposed to do up to 35,000 IOs/s).

Btw, you can check the random read rate using iotest (http://benjamin-schweizer.de/files/iotest/), which I've written recently.

Dathan Pattishall said...

@benjamin schweizer
Making corrections,
Thanks

Anonymous said...

Before you recommend anything you need to run multiple test varying one variable at time.
All you have shown is your random read/write performance just for this setup.
I suggest rerunning sysbench after changing the stripesize as alignment problems can be issue as well. First test, try 1M.
Bonnie++ and iotest are also good, in that you can vary the blocksize of the read/writes as well as the filesize.

Anonymous said...

if you have battery backed write cache then use write back rather than write through for some more performance for heavy writes.

Anonymous said...

I'm very curious about the effects of the battery backuped raid controller settings and their effects on MySQL performance:

Cache Policy = DirectIO or CachedIO
Write Policy = Write Through or Write Back
Read Policy = ADAPTIVE or READ-AHEAD or NORMAL

Anonymous said...

I'm going to figure a way to do a test run with the instructions provided. Thanks for the post.