Re: Disk Controller Failures

27 Jun 1999


      sirbruce@ix.netcom.com wrote:
...
...
someone said:
...We recently encountered a disk controller failure on one of our
data servers (not on our Netapp). The problem was that this failure was
not a complete failure of the drive itself, but rather the controller
started to die slowly. ...
      ...should this occur on a Netapp ... would it cause the entire
filesystem to go corrupt and cause partial or complete data loss as in
this case? ...
I would have to say yes, it's *possible*, but the controller would have
to fail in a very odd way; ...
I'll give you odd. Two days after it was put into service, our F740
crashed, with a "WAFL hung" message. Later that morning, it threw a disk
and started to reconstruct it on another disk. Whenever the
reconstruction reached exactly 51%, the system would freeze and crash,
again due to "WAFL hung". This happened repeatedly, despite our and
NetApp's best efforts. After about 10 hours of this, NetApp said the
cores we were sending them showed too many problems with too many disks
for it to be a real disk problem, and they concluded that the fault lay
not in the disks but in the FC controller. We swapped the system board
and held our breath as it reached, and then passed, 51%. After a total of 12
hours down, we were back in business and the filer has been up ever
since (10.5 days as I write this). We've been very happy with the filer
since the incident, but I would say that that failure qualifies as
"odd", wouldn't you?
-ste
