Does anyone have ballpark numbers (for any filer model) for how long a drive takes to reconstruct on a filer at the default raid.reconstruct_speed setting? For example, on an F85:
options raid.reconstruct_speed 4
I'm trying to make a ballpark comparison against a Sun 3310 array with 73 GB drives, where it took almost exactly 5 hours.
I realize it depends on filer load, but if the speed is linear with load and drive size then it is easy to extrapolate.
I couldn't find a table or any other sort of indication on NOW about this...
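For reference, the linear extrapolation described in the question can be sketched as below. This is only arithmetic under the poster's own assumption that reconstruct time scales linearly with drive size at a fixed load; the function name and the 146 GB target are hypothetical, and the 5 hour / 73 GB baseline is the Sun 3310 figure from the post, not a filer measurement.

```python
def extrapolate_hours(baseline_hours, baseline_gb, target_gb):
    """Scale a measured reconstruct time linearly with drive size.

    Assumes time is proportional to drive size and that load is unchanged,
    which is an assumption from the question, not a measured property.
    """
    if baseline_gb <= 0:
        raise ValueError("baseline_gb must be positive")
    return baseline_hours * target_gb / baseline_gb


# Baseline from the post: about 5 hours for a 73 GB drive on the Sun 3310.
# The 146 GB target drive is a made-up example.
print(extrapolate_hours(5.0, 73, 146))  # 10.0
```

Any real filer number would replace the baseline; the sketch says nothing about how load or raid group size change the result.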
Obviously you are interested in the worst case scenario where the failed disk is absolutely dead. I don't have numbers for you on that, but in addition to load and drive size, the number of drives in the raid group also matters because you must read a block from every surviving disk drive in the raid group to reconstruct each block on the failed drive. ONTAP uses RAID4, which has a variable raid group size. You can decrease reconstruct times by keeping raid groups small. But you need a parity drive for each raid group, so that leaves you with fewer data drives.
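The trade-off between raid group size and usable data drives can be made concrete with a small sketch. It assumes single parity (one parity drive per raid group), a drive count that divides evenly into groups, and that reconstructing one block means reading one block from every surviving drive in the group, as described above. The function name and the 12-drive example are hypothetical, and the read count is only a rough proxy for reconstruct work, not a timing prediction.

```python
def raid4_tradeoff(total_drives, group_size):
    """Return (groups, reads_per_block, data_drives) for single parity groups.

    reads_per_block: blocks read to rebuild one block of a dead drive
                     (one from each surviving drive in its raid group).
    data_drives:     drives left for data after one parity drive per group.
    """
    if group_size < 2:
        raise ValueError("a raid group needs at least 2 drives")
    if total_drives % group_size != 0:
        raise ValueError("total_drives must divide evenly into groups")
    groups = total_drives // group_size
    reads_per_block = group_size - 1
    data_drives = total_drives - groups
    return groups, reads_per_block, data_drives


# Hypothetical shelf of 12 drives split into groups of different sizes.
for size in (3, 4, 6, 12):
    print(size, raid4_tradeoff(12, size))
# 3 (4, 2, 8)
# 4 (3, 3, 9)
# 6 (2, 5, 10)
# 12 (1, 11, 11)
```

Smaller groups cut the reads needed per reconstructed block, but each extra group costs one more drive for parity.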
NetApp recently modified their disk reconstruct procedure to copy as much valid data as possible from the failed disk and only reconstruct blocks that cannot be read. Often a disk does not completely fail, so many blocks can be copied from it, which is much faster than reconstructing.
If you are concerned about the window of vulnerability after one disk fails, during which a second drive failure in the same raid group causes loss of data, NetApp has recently introduced double parity raid groups. You can configure a raid group with two parity drives (instead of the usual single parity drive) and then you can lose any two disks in the raid group without losing any data.
I think that if you have, say, 12 disk drives and the choice of making two single parity raid groups of 6 drives each or one double parity raid group of 12 drives, you are safer with the one double parity raid group. This is because any two of the 12 drives can fail and all your data is safe. With two raid groups, two drives in the same raid group could fail. Then you lose not only the raid group, but the entire volume that contains it. (Volumes may consist of multiple raid groups and if you lose one, you lose the whole volume.)
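The 12-drive comparison can be checked by brute force. The sketch below enumerates every combination of simultaneously failed drives and counts how many lose data, assuming a raid group is lost as soon as it has more failed drives than parity drives. The helper names are hypothetical, and the counts treat every combination as equally likely; they ignore reconstruct time and real failure rates.

```python
from itertools import combinations


def loses_data(failed, groups, parity_per_group):
    """True if any raid group has more failed drives than parity drives."""
    return any(len(failed & group) > parity_per_group for group in groups)


def fatal_count(groups, parity_per_group, failures):
    """Return (fatal combinations, total combinations) for a failure count."""
    drives = sorted(set().union(*groups))
    combos = [set(c) for c in combinations(drives, failures)]
    fatal = sum(loses_data(c, groups, parity_per_group) for c in combos)
    return fatal, len(combos)


# Drives are numbered 0..11 purely for illustration.
two_single = [set(range(0, 6)), set(range(6, 12))]  # two single parity groups
one_double = [set(range(12))]                       # one double parity group

print(fatal_count(two_single, 1, 2))  # (30, 66)
print(fatal_count(one_double, 2, 2))  # (0, 66)
print(fatal_count(two_single, 1, 3))  # (220, 220)
print(fatal_count(one_double, 2, 3))  # (220, 220)
```

Both layouts leave 10 data drives. With two failed drives, 30 of the 66 possible pairs fall in the same single parity group and lose the volume, while the double parity group survives all 66. With three failed drives both layouts lose data in every case, so under these assumptions the double parity layout is never worse.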
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support