Does anyone have ballpark numbers (for any filer model) for how long a drive would take to reconstruct on a filer at the default raid reconstruct_speed setting? e.g. on an F85:
options raid.reconstruct_speed 4
I'm trying a ballpark comparison against a Sun 3310 array with 73 GB drives, where it took almost exactly 5 hours.
I realize it depends on filer load, but if the speed scales linearly with load and drive size, then it is easy to extrapolate.
I couldn't find a table or any other sort of indication on NOW about this...
Adi
Obviously you are interested in the worst case scenario where the failed disk is absolutely dead. I don't have numbers for you on that, but in addition to load and drive size, the number of drives in the raid group also matters because you must read a block from every surviving disk drive in the raid group to reconstruct each block on the failed drive. ONTAP uses RAID4, which has a variable raid group size. You can decrease reconstruct times by keeping raid groups small. But you need a parity drive for each raid group, so that leaves you with fewer data drives.
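To picture why every surviving drive gets read, here is a tiny single-parity (XOR) sketch in Python. This is illustration only (my code, not ONTAP's; the block size and the in-memory "drives" are made up):

from functools import reduce

BLOCK_SIZE = 4096   # illustrative block size, not necessarily WAFL's

def xor_blocks(blocks):
    # XOR equal-sized blocks together, byte by byte
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Toy stripe: three data blocks plus their XOR parity
data = [bytes([i] * BLOCK_SIZE) for i in (1, 2, 3)]
parity = xor_blocks(data)

# If the drive holding data[1] dies, getting that block back means reading
# the same stripe from EVERY surviving drive (data plus parity) and XORing
# them together.  More drives in the group means more reading per block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

That is the whole reason raid group size shows up in reconstruct time.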
Netapp recently modified their disk reconstruct procedure to copy as much valid data as possible from the failed disk and only reconstruct blocks that cannot be read. Often a disk does not completely fail, so many blocks can be copied from it, which is much faster than reconstructing.
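Conceptually the recovery loop then becomes something like this (only my sketch of the idea, not NetApp's implementation; the drive objects and their read_block()/write_block() methods are hypothetical, and xor_blocks() is the helper from the sketch above):

def recover_drive(failing, survivors, spare, num_blocks):
    # Copy every block the failing drive can still serve; fall back to
    # parity reconstruction only for blocks it cannot read.
    copied = rebuilt = 0
    for blk in range(num_blocks):
        try:
            data = failing.read_block(blk)      # fast path: straight copy
            copied += 1
        except IOError:
            # slow path: XOR the stripe from every surviving drive
            data = xor_blocks([d.read_block(blk) for d in survivors])
            rebuilt += 1
        spare.write_block(blk, data)
    return copied, rebuilt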
If you are concerned about the window of vulnerability where after one disk fails a second drive failure causes loss of data, then netapp has recently introduced double parity raid groups. You can configure a raid group with two parity drives (instead of the usual single parity drive) and then you can lose any two disks in the raid group without losing any data.
I think that if you have, say, 12 disk drives and the choice of making two single parity raid groups of 6 drives each or one double parity raid group of 12 drives, you are safer with the one double parity raid group. This is because any two of the 12 drives can fail and all your data is safe. With two raid groups, two drives in one raid group could fail. Then you lose not only the raid group, but the entire volume that contains it. (Volumes may consist of multiple raid groups, and if you lose one, you lose the whole volume.)
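A quick sanity check on that (my arithmetic, not from any NetApp doc): count how many of the possible two-disk failures lose data in each layout. Note that both layouts leave you 10 data drives.

from itertools import combinations

pairs = list(combinations(range(12), 2))    # 66 possible two-disk failures

# Layout A: two single parity groups of 6 (disks 0-5 and 6-11).
# A pair of failures is fatal only if both land in the same group.
fatal_two_groups = sum(1 for a, b in pairs if (a < 6) == (b < 6))

# Layout B: one double parity group of 12: no two-disk failure is fatal.
fatal_one_dp_group = 0

print(len(pairs), fatal_two_groups, fatal_one_dp_group)   # 66 30 0

So roughly 30 of the 66 equally likely two-disk combinations take out a volume in the two-group layout, versus none with double parity.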
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support
Steve;
*****************
Netapp recently modified their disk reconstruct procedure to copy as much valid data as possible from the failed disk and only reconstruct blocks that cannot be read. Often a disk does not completely fail, so many blocks can be copied from it, which is much faster than reconstructing.
*****************
Can you please point us to a whitepaper on this?
JKB
James Brigman wrote:
Steve;
Netapp recently modified their disk reconstruct procedure to copy as much valid data as possible from the failed disk and only reconstruct blocks that cannot be read. Often a disk does not completely fail, so many blocks can be copied from it, which is much faster than reconstructing.
Can you please point us to a whitepaper on this?
JKB
I have to say that, if a disk is known to be failing, I'm not sure I'd want to be trusting the data one would copy from it... Also, if it's failing in a way that makes it take a lot of time to serve a block of data, does DOT adjust its strategy accordingly and work out at some point that it should just start recreating the data from the other disks?
Mark Simmons mds@gbnet.net writes;
James Brigman wrote:
Steve;
Netapp recently modified their disk reconstruct procedure to copy as much valid data as possible from the failed disk and only reconstruct blocks that cannot be read. Often a disk does not completely fail, so many blocks can be copied from it, which is much faster than reconstructing.
Can you please point us to a whitepaper on this?
[...]
I have to say that, if a disk is known to be failing, I'm not sure I'd want to be trusting the data one would copy from it...
Well, there is the "horizontal" data validation provided by the zone or block checksums to deal with that.
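To make that concrete, something like the following check would be applied to each block copied off the failing disk (a rough sketch only; CRC32 just stands in for whatever checksum the filer actually stores alongside each block):

import zlib

def block_is_trustworthy(data, stored_checksum):
    # A block copied from the failing disk is only used if its contents
    # still match the checksum stored for it; otherwise it would be
    # reconstructed from parity like any unreadable block.
    return zlib.crc32(data) == stored_checksum

block = b"\x00" * 4096
good = zlib.crc32(block)
assert block_is_trustworthy(block, good)
assert not block_is_trustworthy(block[:-1] + b"\x01", good)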
Also, if it's failing in a way that makes it take a lot of time to serve a block of data, does DOT adjust its strategy accordingly and work out at some point that it should just start recreating the data from the other disks?
That seems like a good question, and I would echo James' request for a whitepaper (or other form of technical detail).
Is the same procedure used when a disk is failed by operator command as well as when ONTAP decides to fail it on its own initiative? The case of failing a disk because its reported error rate is too high for comfort would seem to be one of the most likely scenarios in which "many blocks can be copied from it".
Also, if ONTAP is rebooted during the reconstruction, is the "half-failed" status of the disk preserved?
Chris Thompson Email: cet1@cam.ac.uk
Sorry folks, but I don't have a white paper reference. I got my information from a presentation by our Netapp representative. A big focus of his talk was the NearStore appliances, which have large numbers of large-capacity (but less reliable) disks.
Netapp realized that disk failures would happen more often on NearStores and that reconstruction times on a 250G drive would be very long, making the risk of a double drive failure unacceptable. That is why they recommend double parity raid groups on NearStores and why they try to speed up reconstruction by using whatever data can be taken from the failed disk.
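Back-of-the-envelope, the risk argument looks like this (my own numbers and arithmetic, purely illustrative): with independent, constant-rate failures, the chance of losing a second drive in the group before the rebuild finishes grows roughly linearly with the rebuild time.

import math

mtbf_hours = 500_000      # assumed per-drive MTBF, made up for illustration
surviving_drives = 13     # e.g. a 14-drive raid group with one drive down

def p_second_failure(rebuild_hours):
    rate = surviving_drives / mtbf_hours        # combined failure rate
    return 1 - math.exp(-rate * rebuild_hours)

for hours in (5, 24, 48):
    print(hours, "h:", round(p_second_failure(hours), 5))

Stretch the rebuild from 5 hours to 48 and you have increased that exposure almost tenfold, which is the argument for double parity on big, slow-rebuilding ATA drives.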
There is an excellent white paper, TR-3298, that explains how double parity works.
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support
Aditya;
****
Does anyone have ballpark numbers (for any filer model) for how long a drive would take to reconstruct on a filer at the default raid reconstruct_speed setting? e.g. on an F85:
options raid.reconstruct_speed 4
I'm trying a ballpark comparison against a Sun 3310 array with 73 GB drives, where it took almost exactly 5 hours.
I realize it depends on filer load, but if the speed scales linearly with load and drive size, then it is easy to extrapolate.
I couldn't find a table or any other sort of indication on NOW about this...
****
I'm trying to benchmark the same thing (restore time) on an R200 to figure out how to weight things like double parity against restore time.
I've found 6 hours for the approx. 250GB drives on an R200, dual-spindle volume. That's about the worst case I can come up with, and that's with no external reads from the volume at all. 72GB drives are going to be much faster and, I'll bet, linear in time with disk size for a given disk rotational speed. Remember, too, that the R200's drives are 320GB parallel ATA drives: NOT the blazing-fast U160 SCSI drives in the 3310.
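For what the linear assumption is worth, the scaling arithmetic is trivial (a rough estimate only; spindle speed, interface and load all differ between the R200's ATA drives and 72GB FC/SCSI drives):

observed_hours = 6.0      # measured: ~250GB ATA drive on an idle R200
observed_gb = 250.0

rate_gb_per_hour = observed_gb / observed_hours     # ~42 GB/hour

# Naive linear scale-down to a 72GB drive, all else (wrongly) held equal:
estimate_hours = 72.0 / rate_gb_per_hour
print(round(estimate_hours, 1))                      # ~1.7 hours

If anything the faster drives should beat that naive figure.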
I haven't concocted a "read while reconstructing" test: it's extremely difficult and time-consuming. But I will tell you that I do believe the speed of rebuild is NOT linear with load and drive size. For you to make any headway, you need to use identical drives in both the 3310 and the filer, the same number of spindles per volume (filer) or LUN (3310), and the same RAID technology.
What you're going to find is that no-read rebuild speeds on the NetApp will be faster, because WAFL kicks butt and the parity recalculations happen in the filer head's RAM at RAM speeds. It's the disk technology that will be the brick wall in any disk storage device. Disk operations take milliseconds, while RAM operations take nanoseconds.
But with any level of reads at all against either device, the 3310 is going to be a tad faster. On the other hand, at best you can connect a 3310 to two hosts without a SAN switch, while the NetApp can talk to anything on your network that speaks CIFS or NFS.
I can tell you that I've lost drives during my backup window and seen the backups get delayed by 30-50%, sometimes taking upwards of twice as long. My raid.reconstruct_speed is set to the default, 4. I'm sorry, I can't tell you whether or not the rebuild itself took longer.
One other thing: that Sun 3310 is a SAN-technology unit that connects to the host via FC. A fair comparison between a NetApp and a 3310 is going to require you to gather metrics with no host reads from the rebuilding volume (or LUN in the case of the 3310), because the NetApp is going to be strongly bound by the speed of your connecting network. It'll be an extremely unfair test if your NetApp client is reading over a 100 Mbit/s connection vs. the Sun's 1- or 2-Gbit/s FC-AL connection. (The technology is different enough that you could appear to be trolling for ammo for the old "SAN vs. NAS" war with your question.)
JKB