On 06/25/99 12:50:17 you wrote:
I would have to say yes, it's *possible*, but the controller would have to fail in a very odd way: not simply failing to respond to some requests (that would be caught), but executing some and not others (while claiming it did), or misordering commands, or something like that. I would think this kind of error would be very rare. Perhaps the controller you had is particularly prone to those sorts of errors; one advantage of NetApp is that you're using controllers they have partially designed and tested themselves.
Bruce
sirbruce@ix.netcom.com wrote:
I'll give you odd. Two days after it was put into service, our F740 crashed with a "WAFL hung" message. Later that morning, it threw a disk and started to reconstruct it onto another disk. Whenever the reconstruction reached exactly 51%, the system would freeze and crash, again with "WAFL hung". This happened repeatedly, despite our and NetApp's best efforts. After about 10 hours of this, NetApp said the cores we were sending them reported too many problems with too many disks for it to be a real disk problem, and they concluded it wasn't the disks but the FC controller. We swapped the system board and held our breath as it reached, and passed, 51%. After a total of 12 hours down, we were back in business, and the filer has been up ever since (10.5 days as I write this). We've been very happy with the filer since the incident, but I would say that failure qualifies as "odd", wouldn't you?
-ste