The NOW website recommends a raid group size of 12 when using dual parity on an R100 with 4 shelves of 12 disks each, presumably for performance reasons. However, with only 4 shelves, a raid group of 12 must put at least three disks on some shelf (the most even spread puts exactly three on every shelf), so even with dual parity, losing one shelf loses the entire raid group. In fact, if you build 4 groups of size 12 (well, OK, one of them would have 10, if you wanted to set aside two hot spares), then losing one shelf would mean losing all 4 raid groups!
We've had to replace a shelf on the same R100 twice in the last two years. Thankfully, we did so prior to a complete failure. My point is that such a failure is not necessarily rare.
To guard against a shelf failure, it seems prudent to use a raid group size of 8, with DP, and lay out the disks so that no more than two disks of any one raid group are on the same shelf.
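To make that layout concrete, here's a rough sketch in Python of what I have in mind (purely illustrative; the shelf/bay numbering is made up, not ONTAP syntax): give each group of 8 two bays on every shelf, so a dead shelf never takes more than two disks from any one group.

    # Rough layout sketch: spread raid groups of 8 across 4 shelves of 12
    # disks so that no raid group has more than 2 disks on any one shelf.
    SHELVES = 4
    DISKS_PER_SHELF = 12
    RG_SIZE = 8

    # Give each raid group 2 bays on every shelf: group 0 gets bays 0-1,
    # group 1 gets bays 2-3, and so on.  48 disks -> 6 groups of 8.
    raid_groups = {}
    for shelf in range(SHELVES):
        for bay in range(DISKS_PER_SHELF):
            g = bay // (RG_SIZE // SHELVES)      # 2 bays per shelf per group
            raid_groups.setdefault(g, []).append((shelf, bay))

    # Check: losing any single shelf costs each group at most 2 disks,
    # which dual parity (RAID-DP) can survive.
    for dead_shelf in range(SHELVES):
        for g, members in raid_groups.items():
            lost = sum(1 for (s, _) in members if s == dead_shelf)
            assert lost <= 2, (dead_shelf, g, lost)

    print(len(raid_groups), "groups of", RG_SIZE)    # 6 groups of 8
    # In practice you'd hold a couple of these disks back as hot spares,
    # which shrinks one group below 8.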
As soon as you have a single disk failure, which is much more likely than a shelf failure, the filer automatically reconstructs onto a hot spare. That will probably break your careful raid group layout, and I can't think of an easy way to put it back. You could physically rearrange the disks, but that requires downtime. It seems like a lot of effort for little gain.
But I'm hesitant to go against NetApp's recommendation, and concerned the performance hit will be too big. Currently we're only serving home dirs via CIFS and NFS, with CPU usage floating between 25 and 50% most of the time, but we're considering configuring a LUN for use by an Exchange server. Plus, I'd like to move to 7G (we're currently at 6.4.5) and use a large aggregate with flexvols, which will be yet another performance hit (due to the extra layer of software). Our R100 isn't disk bound right now, so I don't anticipate any performance wins from the extra spindles per volume. A raid group size of 8 uses 4 fewer disks for data (in a maximum disk usage configuration, with 2 hot spares) than the layout with an rg size of 12, which is palatable for our situation. And I like the idea of flexibly sized volumes.
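For reference, here's the back-of-the-envelope arithmetic behind that "4 fewer data disks" figure, assuming 48 disks total, 2 hot spares, RAID-DP (2 parity disks per group), and that the leftover disks simply form a smaller last group:

    # 46 usable disks after 2 spares; count data disks for each rg size.
    def data_disks(total, spares, rg_size, parity_per_group=2):
        usable = total - spares
        full, leftover = divmod(usable, rg_size)
        groups = full + (1 if leftover else 0)
        return usable - groups * parity_per_group, groups

    data12, groups12 = data_disks(48, 2, 12)   # -> 38 data disks in 4 groups
    data8,  groups8  = data_disks(48, 2, 8)    # -> 34 data disks in 6 groups
    print(data12, data8, data12 - data8)       # 38 34 4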
Comments?
I don't think that aggregates and flexvols are much of a performance hit. An aggregate behaves much like a traditional volume. Flexvols do add another layer, but I think that layer is negligible.
Larger raid groups are more efficient because on average they require fewer parity operations (and parity calculations require CPU). Aggregates and flexvols reduce parity operations even further. Writing to two different traditional volumes requires writing to at least two different raid stripes, which means at least two parity updates. Writes to two different flexvols in the same aggregate can land on the same raid stripe, with only one parity update.
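A toy way to see this (the stripe width and write sizes below are invented, just to show the shape of the argument, with one parity update per stripe written):

    import math

    DATA_DISKS_PER_STRIPE = 6        # assumed stripe width (data disks only)

    def stripes_needed(blocks):
        return math.ceil(blocks / DATA_DISKS_PER_STRIPE)

    # e.g. a 3-block write to one volume and a 2-block write to another
    writes = {"volA": 3, "volB": 2}

    # two traditional volumes: each volume sits on its own raid group,
    # so each write dirties at least one stripe of its own
    trad = sum(stripes_needed(b) for b in writes.values())        # 2 stripes

    # two flexvols in one aggregate: the blocks can share a stripe
    flex = stripes_needed(sum(writes.values()))                   # 1 stripe

    print(trad, "parity updates vs", flex)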
Larger raid groups are also more efficient in terms of disk storage because you have more data disks per parity disk. This is even more important when using double parity.
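In concrete terms (simple ratios, nothing filer-specific):

    # Fraction of each raid group spent on parity, single vs double parity.
    for rg_size in (8, 12):
        for parity in (1, 2):
            print(f"rg size {rg_size:2d}, {parity} parity disk(s): "
                  f"{parity / rg_size:.0%} of the group is parity")
    # rg size 8 with double parity spends 25% of its disks on parity;
    # rg size 12 spends about 17%.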
A downside of larger raid groups is increased reconstruct time after a disk failure. With single parity, a second disk failure during the reconstruct leads to data loss. Double parity prevents this.
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support