This is Part 2 in an ongoing series on disk performance. You can read the entire series by starting at the Introduction.
RAID 10 is the most commonly recommended type of configuration to house OLTP data files for SQL Server because of the balance of redundancy and speed that it offers. However, poorly chosen configuration options can result in lackluster performance. Today I’m going to look at how stripe size, partition offset, and allocation unit size can affect the performance of a hardware based RAID 10 drive. For reference I’m using the results from my Test Harness #1, a Dell PowerEdge 2950 (view the test harness specifics here). I’m assuming you’re familiar with the various RAID levels; if not, take a second to read the Wikipedia entry on it.
Stripe Size & Allocation Unit Size
When configuring a RAID 10 drive the first choice you’ll have to make is what stripe size to use. On most servers I’ve worked with this is 64 KB. For fun, in addition to the default 64 KB stripe I tested an 8 KB stripe to match the size of a page and a 128 KB stripe to see if a larger stripe made any difference.
To use your newly created array you’ll have to create a partition and format it. On Windows Server most people just use the Disk Management snap-in in the Computer Management console to do this. When you go this route in Windows 2000\2003 your partition is created with a default offset of 32 KB. I’ll cover this more in the next section, so for now I’ll stick with the default.
When you format a partition you have to choose an allocation unit size. In Windows Server 2000, 2003, & 2008 the default allocation unit size is 4 KB. I tested the default size and just like I did for the RAID stripe I also tested 8 KB and 64 KB allocation unit sizes.
Finally, since we’re talking about OLTP performance here, I’ll focus on the kinds of operations that SQL Server does on data files most of the time. According to this post by SQL Customer Advisory Team and backed up by Linchi Seah’s post on OLTP workloads that’s 8 KB for both reads and writes, and both tend to be random in nature.
Here’s what 8 KB random reads look like:
And here’s what 8 KB random writes look like:
It’s pretty clear that a 128 KB stripe size offers the best performance. It also appears that allocation unit size makes relatively little difference when using a 32 KB offset. Now let’s look at what effect a 64 KB partition offset has on performance.
Partition Offset
Remember that by default Windows 2000\2003 uses a 32 KB partition offset. There’s been a lot of talk lately about how using a 64 KB offset can result in major performance gains, so let’s take a look at the same test setup but this time with a 64 KB offset.
Here’s what 8 KB random reads look like:
And here’s what 8 KB random writes look like:
Notice how a 128 KB stripe size no longer results in the best performance; that honor now belongs to a 64 KB stripe size. It also looks like an 8 KB allocation unit size offers the best write performance. However, keeping in mind that most OLTP databases fall somewhere around 80\20 in favor of reads, it’s clear that a 64 KB allocation unit is still the best choice on a 64 KB partition offset.
To offer one more perspective on how partition offset affects performance, here’s a comparison of 32 KB vs. 64 KB offset on a 64 KB stripe with a 64 KB allocation unit:
8 KB random reads:
8 KB random writes:
If the graphs weren’t obvious enough, try this: on a 64 KB raid stripe, using a 64 KB offset vs.32 KB offset represented an 11% improvement in IOs/sec and MBs/sec, and a 10% improvement in average latency for 8 KB random reads. For 8 KB random writes using a 64 KB offset resulted in a 10.6% improvement in IOs/sec, MBs/sec, and average latency.
Conclusion
The results show that for OLTP databases on a RAID 10 array the optimal configuration is a 64 KB RAID stripe, 64 KB partition offset, and 64 KB allocation unit size. As always, mileage may vary according to the hardware you’re on, however the gist of these results should hold true for most systems using local disks or DASD enclosures.
In Part 3 I will take a look at how the same values affect performance on RAID 5.
Update – 2/17/09: After posting my initial correction I reviewed my RAID 10 results again and found another problem – all the numbers still reflected 64 KB reads. Although I said that adjusting for 8 KB reads didn’t change the results I was wrong. I have updated the graphs once again, triple checked everything, and promise that the numbers are right this time!
13 comments
Thanks for that. I can imagine that a lot of work has gone into getting these results, but it is going to save the likes of me a lot of guesswork.
Good post. I ran similar tests and came up with the same findings e.g. "optimal configuration is a 128 KB RAID stripe, a 64 KB partition offset, and a 64 KB allocation unit size."
BTW, my tests were done on an old compaq RA1000 DAS and a new HP MSA.
Out of curiousity, has anyone tried a 256K stripe and if so, what offsets were tried? MS mentions in their article "Physical Database Storage Design" --
We recommend a 64 KB or 256 KB stripe size for most workloads. When the workload includes table and index range scans on tables that are larger than 100 MB, a stripe size of 256 KB allows for more efficient read-ahead.
Great post, I´m going to try the same with 256 stripe size and different partition offset.
Thanks for your hard work on this! It really clarifies how I plan to setup my array.
When using a 128k stripe size, wouldn't your partition offset need to be 128k? In the last test I believe your performance with a 128k stripe is being negatively impacted by this.
Will you waste disk space if your allocation unit (cluster) size is 64KB and your stripe is 128KB?
Great information. Now if I can just convince my SA to rebuild our SQL disks...
Isn't the idea with the partition offset to align the partition with the stripe size so that your stripe reads are always aligned? Thus, it makes sense that the 128K stripe size is slower because you didn't do a fair comparison. The 128K stripe size should be fastest with a 128K partition offset. All of the stripe reads are unaligned with a 64K partition offset.
Has anyone done the same tests for a 64-bit MySQL database?
Hi Kendal Van Dyke
read part 2 and 3 and am fascinated about the content the articles are providing.
My question is, why does partition offset increasing result in higher performance (I/O etc.)
Isn't so that the only reason of partition offset is to prevent misalignement.
Kind Regards
Is this still relevant? Take a look at this:
http://technet.microsoft.com/en-us/library/dd758814(v=sql.100).aspx
Win2K8 and Win2K8R2 and probably Win2012 already align to 1024KB for maximal compatibilty with stripe sizes up to 1024K. If anything it'd be interesting to see the same tests done with that alignment to see how it affects things.
Doubtful that the same hardware is still around, years later, just waiting for additional tests, though.
Post a Comment