----- Original Message -----
From: "Aiello, Tony" Tony.Aiello@netapp.com
To: "'Robert L. Millner'" rmillner@transmeta.com; "toasters" toasters@mathworks.com
Sent: Wednesday, May 03, 2000 11:31 AM
Subject: RE: raid failure
Hello,
I don't see any mention of the OnTap version being used, but perhaps I can offer some information.
GD didn't say exactly what the error message from the prior problem was, either.
Possibly what happened was a drive error that WAS recoverable. Netapp will log when it has trouble talking to a drive, but won't fail the drive as long as the operation eventually succeeds. It would be wrong to fail a drive simply because it temporarily took a long time to respond.
Also, I believe that in the past, when there was a read error, the block would not be reassigned; instead, it would be rewritten in place using data reconstructed from the parity information. However, in rare cases you could have a "weak" block, where writes appeared to succeed at first but subsequent reads would eventually fail.
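To illustrate what "rewritten using the parity information" involves, here is a minimal sketch, assuming a RAID 4-style stripe with a single XOR parity block. This is an assumption for illustration, not a description of OnTap internals, and the function names are hypothetical. The unreadable block is recovered by XORing the parity block with every surviving data block in the stripe, then written back at the same position rather than reassigned.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(data_blocks):
    """Parity block for a stripe: XOR of all data blocks (assumed RAID 4-style)."""
    return reduce(xor_blocks, data_blocks)


def reconstruct_block(surviving_blocks, parity):
    """XOR of the parity with every surviving data block yields the missing block."""
    return reduce(xor_blocks, surviving_blocks, parity)


def repair_read_error(stripe, parity, bad_index):
    """Rewrite the unreadable block at the same index (no reassignment)."""
    surviving = [blk for i, blk in enumerate(stripe) if i != bad_index]
    repaired = list(stripe)
    repaired[bad_index] = reconstruct_block(surviving, parity)
    return repaired


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = compute_parity(stripe)
    # Simulate an unreadable block 1, then repair it from parity.
    damaged = [stripe[0], b"\x00\x00", stripe[2]]
    print(repair_read_error(damaged, parity, 1) == stripe)
```

Note that this only restores the data. It does nothing about the "weak" block case, where the rewritten block reads back fine at first and fails later.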
In any case, I don't think it is necessarily Netapp's fault for not failing the drive. Transient disk errors can occur, and you can only program so many heuristics into the Netapp OS. It is entirely possible for such an event to happen without the customer ever having another disk failure, or without the problem resurfacing during reconstruction. But a previous poster asked what could be done to minimize the risk even further. Doing that means failing a drive as soon as anything looks like it might be wrong with it. The result, of course, is that you spend more money on drives.
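The trade-off can be sketched as a toy policy. This is a hypothetical monitor of my own, not OnTap's actual heuristics: with `max_errors=None` it behaves as described above (log recoverable errors, fail only on an unrecoverable one), while a small `max_errors` value fails the drive at the first signs of trouble. The class name, threshold, and counting scheme are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DriveMonitor:
    """Toy drive-failure policy (hypothetical; not OnTap's real heuristics)."""
    # Recoverable errors tolerated before failing the drive; None = never.
    max_errors: Optional[int]
    error_count: int = 0
    failed: bool = False
    log: List[str] = field(default_factory=list)

    def record_io(self, attempts_needed: int, succeeded: bool) -> None:
        """Record one I/O that took `attempts_needed` tries."""
        if self.failed:
            return
        if not succeeded:
            # Unrecoverable error: fail the drive under either policy.
            self.failed = True
            self.log.append("unrecoverable error: drive failed")
            return
        if attempts_needed > 1:
            # Recoverable (transient) error: always log and count it.
            self.error_count += 1
            self.log.append(f"recovered after {attempts_needed} attempts")
            if self.max_errors is not None and self.error_count >= self.max_errors:
                # Aggressive policy: treat repeated trouble as a failing drive.
                self.failed = True
                self.log.append("error threshold reached: drive failed")


if __name__ == "__main__":
    tolerant = DriveMonitor(max_errors=None)
    aggressive = DriveMonitor(max_errors=1)
    for monitor in (tolerant, aggressive):
        monitor.record_io(attempts_needed=3, succeeded=True)
    print(tolerant.failed, aggressive.failed)
```

The lower the threshold, the more drives are pulled that might have run fine for years, which is exactly the cost of minimizing the risk further.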
Bruce