The NOW website recommends a raid group size of 12 when using dual parity on an R100 with 4 shelves of 12 disks each, presumably for performance reasons. However, with only 4 shelves, a raid group of 12 must put at least three disks on some shelf (the most even spread puts exactly three on every shelf), so even with dual parity, losing one shelf loses the entire raid group. In fact, if you build 4 groups of size 12 (well, OK, one of them would have 10, if you wanted to set aside two hot spares), then losing one shelf would mean losing all 4 raid groups!
We've had to replace a shelf on the same R100 twice in the last two years. Thankfully, we did so prior to a complete failure. My point is that such a failure is not necessarily rare.
To guard against a shelf failure, it seems prudent to use a raid group size of 8, with DP, and lay out the disks so that no more than two disks of any one raid group are on the same shelf.
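To make that layout concrete, here's a rough sketch in Python of what I have in mind (purely illustrative; the shelf/bay numbering is made up, not ONTAP syntax): give each group of 8 two bays on every shelf, so a dead shelf never takes more than two disks from any one group.

    # Rough layout sketch: spread raid groups of 8 across 4 shelves of 12
    # disks so that no raid group has more than 2 disks on any one shelf.
    SHELVES = 4
    DISKS_PER_SHELF = 12
    RG_SIZE = 8

    # Give each raid group 2 bays on every shelf: group 0 gets bays 0-1,
    # group 1 gets bays 2-3, and so on.  48 disks -> 6 groups of 8.
    raid_groups = {}
    for shelf in range(SHELVES):
        for bay in range(DISKS_PER_SHELF):
            g = bay // (RG_SIZE // SHELVES)      # 2 bays per shelf per group
            raid_groups.setdefault(g, []).append((shelf, bay))

    # Check: losing any single shelf costs each group at most 2 disks,
    # which dual parity (RAID-DP) can survive.
    for dead_shelf in range(SHELVES):
        for g, members in raid_groups.items():
            lost = sum(1 for (s, _) in members if s == dead_shelf)
            assert lost <= 2, (dead_shelf, g, lost)

    print(len(raid_groups), "groups of", RG_SIZE)    # 6 groups of 8
    # In practice you'd hold a couple of these disks back as hot spares,
    # which shrinks one group below 8.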
As soon as you have a single disk failure, which is much more likely than a shelf failure, the filer automatically reconstructs onto a hot spare. That will probably break your careful raid group layout, and I can't think of an easy way to put it back. You could physically rearrange the disks, but that requires downtime. It seems like a lot of effort for little gain.
But I'm hesitant to go against NetApp's recommendation, and concerned the performance hit will be too big. Currently we're only serving home dirs via CIFS and NFS, with CPU usage floating between 25 and 50% most of the time, but we're considering configuring a LUN for use by an Exchange server. Plus, I'd like to move to 7G (we're currently at 6.4.5) and use a large aggregate with flexvols, which will be yet another performance hit (due to the extra layer of software). Our R100 isn't disk bound right now, so I don't anticipate any performance wins from the extra spindles per volume. A raid group size of 8 uses 4 fewer disks for data (in a maximum disk usage configuration, with 2 hot spares) than the layout with an rg size of 12, which is palatable for our situation. And I like the idea of flexibly sized volumes.
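For reference, here's the back-of-the-envelope arithmetic behind that "4 fewer data disks" figure, assuming 48 disks total, 2 hot spares, RAID-DP (2 parity disks per group), and that the leftover disks simply form a smaller last group:

    # 46 usable disks after 2 spares; count data disks for each rg size.
    def data_disks(total, spares, rg_size, parity_per_group=2):
        usable = total - spares
        full, leftover = divmod(usable, rg_size)
        groups = full + (1 if leftover else 0)
        return usable - groups * parity_per_group, groups

    data12, groups12 = data_disks(48, 2, 12)   # -> 38 data disks in 4 groups
    data8,  groups8  = data_disks(48, 2, 8)    # -> 34 data disks in 6 groups
    print(data12, data8, data12 - data8)       # 38 34 4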
Comments?
I don't think that aggregates and flexvols are much of a performance hit. An aggregate behaves much like a traditional volume. Flexvols do add another layer, but I think that layer is negligible.
Larger raid groups are more efficient because on average they require fewer parity operations (and parity calculations require CPU). Aggregates and flexvols reduce parity operations even further. Writing to two different traditional volumes requires writing to at least two different raid stripes, which means at least two parity updates. Writes to two different flexvols in the same aggregate can land on the same raid stripe, with only one parity update.
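A toy way to see this (the stripe width and write sizes below are invented, just to show the shape of the argument, with one parity update per stripe written):

    import math

    DATA_DISKS_PER_STRIPE = 6        # assumed stripe width (data disks only)

    def stripes_needed(blocks):
        return math.ceil(blocks / DATA_DISKS_PER_STRIPE)

    # e.g. a 3-block write to one volume and a 2-block write to another
    writes = {"volA": 3, "volB": 2}

    # two traditional volumes: each volume sits on its own raid group,
    # so each write dirties at least one stripe of its own
    trad = sum(stripes_needed(b) for b in writes.values())        # 2 stripes

    # two flexvols in one aggregate: the blocks can share a stripe
    flex = stripes_needed(sum(writes.values()))                   # 1 stripe

    print(trad, "parity updates vs", flex)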
Larger raid groups are also more efficient in terms of disk storage because you have more data disks per parity disk. This is even more important when using double parity.
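In concrete terms (simple ratios, nothing filer-specific):

    # Fraction of each raid group spent on parity, single vs double parity.
    for rg_size in (8, 12):
        for parity in (1, 2):
            print(f"rg size {rg_size:2d}, {parity} parity disk(s): "
                  f"{parity / rg_size:.0%} of the group is parity")
    # rg size 8 with double parity spends 25% of its disks on parity;
    # rg size 12 spends about 17%.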
A downside of larger raid groups is increased reconstruct time after a disk failure. With single parity, a second disk failure during the reconstruct leads to data loss. Double parity prevents this.
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support