Re: raid failure

3 May 2000


      ----- Original Message -----
From: "Aiello, Tony" Tony.Aiello@netapp.com
To: "'Robert L. Millner'" rmillner@transmeta.com; "toasters"
toasters@mathworks.com
Sent: Wednesday, May 03, 2000 11:31 AM
Subject: RE: raid failure
...
Hello,
I don't see the reference to the version of OnTap used but perhaps I can
relate some information.
GD didn't say exactly what the error message from the prior
problem was, either.
Possibly what happened was that it was a drive error that WAS
recoverable.  Netapp will log when it has trouble talking to a
drive but won't fail it so long as it eventually succeeds.  It
would be wrong to fail a drive simply because it temporarily
took a long time to respond.
Also, I believe in the past if there was a read error, the block
would not be reassigned but the block would be rewritten
using the parity information.  However, in rare cases you could
have a "weak" block where writes appeared to succeed at
first but subsequent reads would eventually fail.
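
To illustrate what "rewritten using the parity information" means, here is a minimal sketch that assumes plain single-parity XOR across one stripe. It is not NetApp's actual code, and the names xor_blocks and rebuild_block are made up for the example. The unreadable block is recomputed from the surviving data blocks plus the parity block, and the result is what would be written back to the drive.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving_data, parity):
    """Recompute the one missing data block in a stripe.

    With single XOR parity, the missing block equals the XOR of the
    parity block with every data block that can still be read.
    """
    return xor_blocks(surviving_data + [parity])


# A toy stripe: three data blocks plus their parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Pretend the middle block returned a read error and rebuild it.
recovered = rebuild_block([data[0], data[2]], parity)
```

This only works when a single block in the stripe is unreadable, which is why a second failure during reconstruction is the case to worry about.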
In any case, I don't think it is necessarily Netapp's fault for not
failing the drive.  Transient disk errors can occur, and you can
only program so many heuristics into the Netapp OS.  It is
entirely possible for such an event to happen and the customer
not have another disk failure or for the problem not to resurface
in reconstruction.  But a previous poster asked what they could
do to minimize the risk even further.  Doing that means
failing the drive as soon as anything looks like it might be wrong
with it.  The result, of course, is that you spend more money on
drives.
Bruce
