This is a feature of ONTAP 6.5.1+. It is called Rapid RAID Recovery (aka "sick disk copy").
Read the release notes on the NOW site.
-tom
-----Original Message-----
From: Mark Simmons [mailto:mds@gbnet.net]
Sent: Tuesday, June 01, 2004 10:06 AM
To: Busick, Chris
Cc: Chris Thompson; toasters@mathworks.com; jbrigman@nc.rr.com
Subject: Re: raid drive reconstruction time
Busick, Chris wrote:
Here is a whitepaper link on Diagonal Parity if you haven't already found it.
Ah, but <jedi> this is not the feature we're looking for </jedi>; it's how DOT handles copying data off failing disks that's of interest today.
The double parity stuff was done to death a few weeks back istr.
--Chris Busick
-----Original Message-----
From: Chris Thompson [mailto:cet1@cus.cam.ac.uk]
Sent: Tuesday, June 01, 2004 6:52 AM
To: toasters@mathworks.com
Cc: mds@gbnet.net; jbrigman@nc.rr.com
Subject: Re: raid drive reconstruction time
Mark Simmons <mds@gbnet.net> writes:
James Brigman wrote:
Steve;
Netapp recently modified their disk reconstruct procedure to copy as much valid data as possible from the failed disk and only reconstruct blocks that cannot be read. Often a disk does not completely fail, so many blocks can be copied from it, which is much faster than reconstructing.
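The copy-then-reconstruct idea described above can be sketched for a generic single-parity (RAID-4 style) group: readable blocks are copied straight off the failing disk, and only unreadable ones are rebuilt by XOR-ing the matching blocks on the surviving data disks and the parity disk. This is a minimal illustration under stated assumptions, not ONTAP's implementation; `sick_disk_copy`, `xor_blocks`, and the use of `None` to mark an unreadable block are all hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def sick_disk_copy(sick_disk, other_disks):
    """Rebuild a failing disk's contents onto a replacement (hypothetical helper).

    sick_disk:   list of blocks read from the failing disk; None marks a
                 block that could not be read (an assumption of this sketch).
    other_disks: the surviving data disks plus the parity disk of the same
                 single-parity group, each a list of blocks.
    Returns (rebuilt_blocks, copied_count, reconstructed_count).
    """
    rebuilt, copied, reconstructed = [], 0, 0
    for stripe, block in enumerate(sick_disk):
        if block is not None:
            # Fast path: the block is still readable, so copy it directly.
            rebuilt.append(block)
            copied += 1
        else:
            # Slow path: parity is the XOR of all data blocks in the stripe,
            # so the missing block is the XOR of every other block in it.
            rebuilt.append(xor_blocks([disk[stripe] for disk in other_disks]))
            reconstructed += 1
    return rebuilt, copied, reconstructed

# Tiny demo: two data disks and one parity disk, two stripes each.
d0 = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"]
d1 = [b"\xff\x00\xff\x00", b"\x0f\x0f\x0f\x0f"]
parity = [xor_blocks([a, b]) for a, b in zip(d0, d1)]
sick_d1 = [d1[0], None]  # the second block on d1 has gone bad
rebuilt, n_copied, n_reconstructed = sick_disk_copy(sick_d1, [d0, parity])
print(rebuilt == d1, n_copied, n_reconstructed)  # True 1 1
```

The speed difference comes from the fast path: a copied block costs one read from the sick disk, while a reconstructed block costs a read from every other disk in the group.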
Can you please point us to a whitepaper on this?
[...]
I have to say that, if a disk is known to be failing, I'm not sure I'd want to be trusting the data one would copy from it...
Well, there is the "horizontal" data validation provided by the zone or block checksums to deal with that.
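The checksum point above can be illustrated in miniature: a block copied from a failing disk is trusted only if it matches the checksum stored for it, and otherwise it is treated like an unreadable block and rebuilt from parity. This is a hedged sketch, not a description of how ONTAP's zone or block checksums actually work; `read_verified` is a hypothetical name, and CRC-32 is an assumed stand-in for whatever checksum the filer really uses.

```python
import zlib

def read_verified(raw_block, stored_checksum, reconstruct):
    """Trust a block from a failing disk only if it matches its checksum.

    raw_block:       bytes read from the sick disk, or None if the read failed.
    stored_checksum: checksum recorded when the block was written.
    reconstruct:     zero-argument callable that rebuilds the block from parity.
    zlib.crc32 is a stand-in; the real checksum algorithm is an assumption.
    """
    if raw_block is not None and zlib.crc32(raw_block) == stored_checksum:
        return raw_block, "copied"
    # Unreadable or silently corrupted: fall back to parity reconstruction.
    return reconstruct(), "reconstructed"

good = b"hello world!"
crc = zlib.crc32(good)
print(read_verified(good, crc, lambda: good))             # (b'hello world!', 'copied')
print(read_verified(b"hellx world!", crc, lambda: good))  # (b'hello world!', 'reconstructed')
print(read_verified(None, crc, lambda: good))             # (b'hello world!', 'reconstructed')
```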
Also, if it's failing in a way that makes it take a lot of time to serve a block of data, does DOT adjust its strategy accordingly and work out at some point that it should just start recreating the data from the other disks?
That seems like a good question, and I would echo James' request for a whitepaper (or other form of technical detail).
Is the same procedure used when a disk is failed by operator command as well as when ONTAP decides to fail it on its own initiative? The case of failing a disc because its reported error rate is too high for comfort would seem to be one of the most likely scenarios when "many blocks can be copied from it".
Also, if ONTAP is rebooted during the reconstruction, is the "half-failed" status of the disc preserved?
Chris Thompson Email: cet1@cam.ac.uk
Yarmas, Tom wrote:
This is a feature of ONTAP 6.5.1+. It is called Rapid RAID Recovery (aka "sick disk copy").
Read the release notes on the NOW site.
Two things: (1) I can't log in to NOW for some reason. (2) Are the questions already asked in this thread covered in the release notes? I didn't think they were.
-tom