Tuesday, December 04, 2012

RAID Stripe Size vs. RAID Stripe Segment Size: Definitions and SSD Considerations

For years I have confused stripe segment size with stripe size when configuring my RAID arrays. I always thought that stripe size was the number of KB of a file written to one disk before moving on to the next disk in the array. So, if I had a stripe size of 16KB and wrote a 20KB file, 16KB would land on the first disk and 4KB on the second. That amount written per data-bearing disk before moving to the next one is not the stripe size; it is the stripe segment size, and therein lies the confusion. The stripe size is the stripe segment size multiplied by the number of data-bearing disks, so a 16KB segment gives a stripe size of 16KB * number of data-bearing disks. Conversely, if you set a 16KB stripe size on the array, the stripe segment size is 16KB / number of data-bearing disks.
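To make the segment behaviour concrete, here is a minimal Python sketch of how a single write is spread across data-bearing disks. The function name `split_write` and the choice of 4 data-bearing disks are my own assumptions for illustration, and the write is assumed to start at the beginning of a stripe; a real controller also deals with parity and alignment.

```python
def split_write(write_kb, segment_kb, data_drives):
    """Return the KB landing on each data-bearing drive for one write.

    Hypothetical helper: assumes the write starts at the beginning of a
    stripe and ignores parity entirely.
    """
    per_drive = [0] * data_drives
    remaining = write_kb
    drive = 0
    while remaining > 0:
        # Fill at most one segment on this drive, then move to the next.
        chunk = min(segment_kb, remaining)
        per_drive[drive] += chunk
        remaining -= chunk
        drive = (drive + 1) % data_drives
    return per_drive


# 20KB write with a 16KB segment size (4 data-bearing drives assumed)
print(split_write(20, 16, 4))  # [16, 4, 0, 0]
```

The 16KB in this example is the segment size; the stripe size for the same array would be 16KB * 4 = 64KB.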

After reading a PDF and a web page that Richard Hesse sent me, I finally have the definition straight.

Let's define a stripe:
A set of contiguous segments spanning the member drives makes up a stripe.

For example, in a RAID 5 4 + 1 disk group with a stripe segment size of 128KB, the first 128KB of an I/O is written to the first drive, the next 128KB to the next drive, and so on, for a total stripe size of 512KB.
The stripe width is 4, since 4 spindles do the work in each stripe and 1 holds that stripe's parity. The 4 spindles doing the work are also known as data-bearing drives.

4 disks * 128KB stripe segment size = 512KB stripe size

or, rearranging:

512KB stripe size / 4 disks = 128KB stripe segment size

Thus, the stripe size you set on the array is the total of the contiguous stripe segments across all of its data-bearing drives.
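The same arithmetic as a small Python sketch, in case you want to plug in your own array. The function names are hypothetical; the only inputs are the drive counts and one of the two sizes.

```python
def data_bearing_drives(total_drives, parity_drives):
    """Drives in the group that hold data in each stripe (e.g. 4 for RAID 5 4+1)."""
    return total_drives - parity_drives


def stripe_size_kb(segment_kb, data_drives):
    """Stripe size = stripe segment size * data-bearing drives."""
    return segment_kb * data_drives


def segment_size_kb(stripe_kb, data_drives):
    """Stripe segment size = stripe size / data-bearing drives."""
    return stripe_kb / data_drives


drives = data_bearing_drives(total_drives=5, parity_drives=1)  # RAID 5 4+1
print(stripe_size_kb(128, drives))   # 512
print(segment_size_kb(512, drives))  # 128.0
```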

Why revisit this?

According to this ACM article on SSD anatomy, the program/erase block is 8KB. For me, optimizing RAID for sequential access speed is less of a concern than optimizing for longevity. SSDs by nature have a life span based on the number of writes to the drive, so every erase/program cycle eats into the drive's longevity. Thus, having a large stripe and modifying that stripe can cause a large number of erase/program cycles, shortening the drive's life.

Richard Hesse's current setting is a 32KB stripe across a RAID 5 SSD array of 8 disks, which is 7 data-bearing drives. This means that writing a 32KB stripe puts about 4.57KB on each data-bearing SSD, which fits within the 8KB block that is programmed/erased and improves longevity overall.

Why not use several RAID 5 4+1 groups with a 32KB stripe to get an 8KB stripe segment size per data-bearing disk? For this setup the data set is large and requires a lot of IOPS, so the single 8-disk array is the best configuration for the requirements while still keeping longevity somewhat in check.
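A rough way to compare these layouts is to count how many 8KB units a full-stripe write touches on each data-bearing SSD. The sketch below is only a back-of-the-envelope model: the 8KB figure is the one quoted above, the function name is hypothetical, and it ignores parity writes and whatever the SSD's own controller does internally.

```python
import math

FLASH_BLOCK_KB = 8  # figure quoted in this post; treat it as an assumption


def blocks_touched_per_drive(stripe_kb, data_drives, block_kb=FLASH_BLOCK_KB):
    """8KB units written on each data-bearing drive by one full-stripe write."""
    segment_kb = stripe_kb / data_drives
    return math.ceil(segment_kb / block_kb)


# (label, stripe size in KB, data-bearing drives)
layouts = [
    ("RAID 5 7+1, 32KB stripe", 32, 7),
    ("RAID 5 4+1, 32KB stripe", 32, 4),
    ("RAID 5 4+1, 512KB stripe", 512, 4),
]
for label, stripe_kb, drives in layouts:
    print(label, "->", blocks_touched_per_drive(stripe_kb, drives), "block(s) per drive")
```

Under this model both 32KB layouts touch one 8KB unit per drive, while the 512KB stripe from the earlier example touches 16.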

What about non-SSD drives? For InnoDB workloads, it's all about tuning for how your application typically uses the database and how IOPS are used. InnoDB disk IOPS are rather predictable compared with the IOPS usage of other engines. A page is 16KB. When selecting rows, MySQL pulls one or more pages. Many rows can fit in one page, or a row can span more than one page. InnoDB typically tries to align pages so that sequential disk access is more likely when pulling groups of rows. Therefore, for spinning metal drives a RAID stripe size of 64KB-512KB is certainly plausible and recommended. Typically for my web apps, 256KB has been a good stripe size. My disk setup is typically RAID 10 across 8 2.5" 15K RPM drives, so that is 4 data-bearing drives with a stripe segment size of 256/4 == 64KB, or roughly 4 pages per data-bearing disk. One thing of note: it's really hard to find a sweet spot, and typically you will notice better performance gains on spinning metal drives from XFS alignment with the su and sw options, or by following this guide by Jay.
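As a sketch of how the XFS alignment values fall out of the RAID geometry: su is the stripe unit (the stripe segment size) and sw is the number of data-bearing disks. The helper names below are hypothetical, and the segment size is taken as stripe size divided by data-bearing drives, per the definition earlier in this post. Check what your controller actually reports before formatting anything.

```python
INNODB_PAGE_KB = 16  # InnoDB page size used in this post


def segment_kb(stripe_kb, data_drives):
    """Stripe segment size, requiring an even split across data-bearing drives."""
    if stripe_kb % data_drives:
        raise ValueError("stripe size must divide evenly across data-bearing drives")
    return stripe_kb // data_drives


def xfs_stripe_options(stripe_kb, data_drives):
    """Build the mkfs.xfs data-section stripe options for this geometry."""
    return f"-d su={segment_kb(stripe_kb, data_drives)}k,sw={data_drives}"


def pages_per_segment(stripe_kb, data_drives, page_kb=INNODB_PAGE_KB):
    """Whole InnoDB pages that fit in one stripe segment."""
    return segment_kb(stripe_kb, data_drives) // page_kb


# RAID 10 across 8 drives: 4 data-bearing drives, 256KB stripe
print(xfs_stripe_options(256, 4))  # -d su=64k,sw=4
print(pages_per_segment(256, 4))   # 4
```

The resulting string would be passed to mkfs.xfs along with the device, for example `mkfs.xfs -d su=64k,sw=4 <device>`.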
