Jay, your points are understood. Before I continue below, let me state that I agree that smaller RAID groups are even better at protecting against data loss. If you like, we could take this thread offline.
"Mike" == Mike Smith mikesmit@netapp.com writes:
Mike> If you want 28 disks in your first raid group. I agree with
Mike> Aaron's comment about reconstruct speed. Also the larger the
Mike> number of disks in a single raid group the higher the odds
Mike> of data loss due to double disk failure. Please be aware that
Mike> if you have too few disks in a raid group you could also
Mike> negatively affect the performance. Check this white paper on
Mike> www.netapp.com
Mike> http://www.netapp.com/tech_library/3008.html
Mike> According to the "Table 4" in this technical paper 14 disks
Mike> per raid group provides optimal performance. --
Keep in mind a few things. That paper measures performance on older filers (F330's and earlier) using SCSI disks and smaller RAM sizes (256 MB or less). While the principles are obviously sound, you should do some testing in your own environment.
Understood. My response was simply from the soap-box and intended to steer any onlookers towards the basic guidelines.
You should also balance any performance gains against the increased risk of losing data by using larger RAID group sizes. Unless I needed every last ounce of performance or data capacity, I'd lean toward the smaller RAID group sizes. See:
I checked that paper out too before I sent the other. I felt the first paper I sent had a simpler illustration for getting the message across. The paper you cited contains the following passage:
"The RAID group with a single data disk in this configuration can cause serious performance problems. During a consistency point [TR-3001,Hitz1994], data for a given volume is written to just one RAID group in the volume. The RAID groups are selected in round-robin fashion, one per consistency point. (RAID groups which have zero free space are skipped, and if a RAID group becomes full before completion of a consistency point then processing continues with the next available RAID group.) Because of this write allocation policy, the filer will attempt to distribute writes evenly between the 2-disk RAID group and the 14- (or more) disk RAID group. Just as more disks on a filer can provide vastly better performance [TR-3008], more disks per RAID group gives better performance."
I am simply pointing out the last line. If that line is taken in context with the points made in the previously quoted paper, I believe any NetApp owner/admin can decide with some confidence how many disks should be in their RAID groups.
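For anyone who wants to see the effect of the write allocation policy quoted above, here is a rough Python sketch. It is only a toy model built from the wording of that quote, not Data ONTAP code: the names (RaidGroup, run_consistency_points), the block counts, and the fixed amount of data per consistency point are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RaidGroup:
    name: str
    data_disks: int      # disks holding data (parity disk not counted)
    free_blocks: int     # remaining free space, in blocks
    written: int = 0     # blocks written so far


def run_consistency_points(groups, cp_count, blocks_per_cp):
    """Toy model: one RAID group per consistency point, chosen round-robin.

    Full groups are skipped, and if a group fills up in the middle of a
    consistency point the rest of the data goes to the next group.
    """
    next_index = 0
    for _ in range(cp_count):
        remaining = blocks_per_cp
        tried = 0
        while remaining > 0 and tried < len(groups):
            group = groups[next_index % len(groups)]
            next_index += 1
            tried += 1
            take = min(remaining, group.free_blocks)  # 0 if the group is full
            group.free_blocks -= take
            group.written += take
            remaining -= take
    return groups


# Hypothetical volume: a 2-disk group (1 data disk) next to a 14-disk group.
small = RaidGroup("rg0 (2 disks)", data_disks=1, free_blocks=10_000)
large = RaidGroup("rg1 (14 disks)", data_disks=13, free_blocks=130_000)
run_consistency_points([small, large], cp_count=10, blocks_per_cp=100)

for g in (small, large):
    print(g.name, "blocks written:", g.written,
          "per data disk:", round(g.written / g.data_disks, 1))
```

In this model both groups receive the same number of blocks, so the single data disk in the small group absorbs 13 times the per-disk write load of the large group. That is the imbalance the quoted passage warns about.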
for a discussion of how to compute Mean Time to Data Loss with different size RAID groups.
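As a rough illustration of the trade-off, the textbook approximation for a single-parity RAID group is MTTDL = MTTF^2 / (N * (N - 1) * MTTR), where N is the number of disks in the group. The sketch below uses that general formula, which may differ from the exact method in the discussion referred to above. The MTTF and MTTR figures are made-up placeholders, and holding MTTR constant is a simplification, since larger groups generally take longer to reconstruct.

```python
def mttdl_hours(disks_per_group, mttf_hours, mttr_hours, groups=1):
    """Approximate Mean Time to Data Loss for single-parity RAID groups.

    Data is lost when a second disk in the same group fails while the
    first is still being reconstructed. With several independent groups
    the expected time to the first such loss is divided by their count.
    """
    n = disks_per_group
    if n < 2:
        raise ValueError("a RAID group needs at least 2 disks")
    return mttf_hours ** 2 / (groups * n * (n - 1) * mttr_hours)


# Placeholder inputs (assumptions, not vendor figures).
MTTF = 500_000   # mean time to failure of one disk, in hours
MTTR = 24        # time to replace and reconstruct a disk, in hours

for size in (6, 14, 28):
    years = mttdl_hours(size, MTTF, MTTR) / (24 * 365)
    print(f"group size {size:2d}: MTTDL ~ {years:,.0f} years per group")
```

Whatever inputs are used, the per-group ratio depends only on N * (N - 1): a size 6 group comes out about 6 times better than a size 14 group and about 25 times better than a size 28 group. Comparing whole filers with the same total disk count needs the groups argument, since smaller groups mean more of them.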
For our application, we chose to use a RAID group size of 6.
I've seen our F740's push 10K ops/sec with acceptable response times in a heavy-read, random-access-intensive (web serving) environment. Cache age is < 1, so we could definitely benefit by bumping our F740's from 512MB to 1GB of cache. That's with the size 6 RAID group.
Just a question: Have you tested your filers with larger raid groups, as the first paper that I sent suggested? 10K ops/sec is very good, but has it been confirmed that you have eked out every possible op/second? (Just a teaser, not a challenge.)
I do understand. You won't get any argument from me about smaller raid groups. I just wanted to speak from the aged TSE perspective. I have experienced situations where NetApp owners/admins did have a double disk failure in a raid group with more than 14 disks. In a few of those situations I firmly believed that, had the basic recommendation of a 14-disk raid group been followed, the double disk failure would have been avoided. Thanks. Having a response to my rather dry previous posting gave me an opportunity to thoroughly re-read the paper that you cited.