On Sat, 15 May 1999 sirbruce@ix.netcom.com wrote:
As to what happened, the most *likely* scenario for any significant downtime of the Netapp is double disk failure. That is, one disk failed, and during reconstruction they lost another.
Disk shelf failure (which is really like a multi-drive failure) or NVRAM failure are also possibilities. I've now had two instances of a filer's NVRAM board going south in two years (the latest happening last week), but no multi-drive failures yet. Clustering will protect against an NVRAM failure, but the drives are still vulnerable. Any hope of having RAID 1 mirroring soon? Async mirroring is nice, but I'd rather have all writes going to pairs of FC-AL disk shelves (each connected to a pair of filers too, of course).
If they were running a cluster, most likely there would have been no interruption in service. You get what you pay for. I'm sure their Apache (or whatever) web servers have crashed more than their filers have.
To be fair, Apache is pretty damn solid, and it is much easier and cheaper to reduce MTTR to nearly zero for a web server than for a filer.
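
To put rough numbers on the MTTR point, here is a minimal sketch using the standard steady-state formula availability = MTBF / (MTBF + MTTR). The MTBF and MTTR figures below are made up purely for illustration; they are not measurements from any real web server or filer, and the function names are my own.

```python
HOURS_PER_YEAR = 8760.0  # 365 days * 24 hours


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(mtbf_hours, mttr_hours):
    """Expected hours of downtime per year for the given MTBF and MTTR."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the shape of the tradeoff:
    # a web server that fails often but is swapped out in under a minute,
    # and a filer that fails rarely but takes hours to bring back.
    systems = {
        "web server (assumed)": (720.0, 0.01),
        "filer (assumed)": (8760.0, 8.0),
    }
    for name, (mtbf, mttr) in systems.items():
        print(name,
              "availability:", round(availability(mtbf, mttr), 6),
              "downtime h/yr:", round(annual_downtime_hours(mtbf, mttr), 3))
```

The point is only that a box which fails more often can still come out ahead on availability if recovery is nearly instant, which is cheap to arrange for stateless web servers and expensive for a filer.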