On 05/14/99 23:13:56 you wrote:
On Fri, May 14, 1999 at 10:28:43PM +0000, Richard L. Rhodes wrote:
Our EMC sales rep sent this to me. Any thoughts or comments, on what might have happened?
Sure. I know exactly what happened.
Your EMC rep peed his pants when he saw a competitor having trouble in public, and made sure you were aware of it. Pretty typical of EMC reps, AFAICT.
They also went and posted it on the NTAP Yahoo message board.
As to what happened, the most *likely* scenario for any significant downtime of the Netapp is a double disk failure. That is, one disk failed, and during reconstruction they lost another.
However, the description of events (moving mailboxes) may or may not support this. It's hard to rely on such technical details in a media report. It's not clear whether or not data was actually lost.
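To put the double-disk-failure scenario in rough numbers, here is a minimal sketch of the odds of losing a second disk while the first is still being reconstructed. It assumes independent disks with exponentially distributed lifetimes. The group size, rebuild time, and MTBF below are made-up illustrative values, not figures from the incident or from any vendor data sheet, and the function name is hypothetical.

```python
import math

def second_failure_probability(disks_in_group, rebuild_hours, mtbf_hours):
    """Chance that at least one surviving disk fails during the rebuild window.

    Assumes independent disks with exponential lifetimes (constant failure
    rate of 1/mtbf_hours per disk). This is a simplification.
    """
    survivors = disks_in_group - 1          # one disk has already failed
    combined_rate = survivors / mtbf_hours  # failures per hour across survivors
    return 1.0 - math.exp(-combined_rate * rebuild_hours)

# Illustrative (assumed) inputs: 14-disk RAID group, 6-hour rebuild,
# 500,000-hour per-disk MTBF.
p = second_failure_probability(disks_in_group=14, rebuild_hours=6, mtbf_hours=500_000)
print(f"P(second failure during rebuild) = {p:.6f}")
```

The per-rebuild probability is small under these assumptions, but it scales with both the number of disks in the group and the length of the rebuild, and real disks from the same batch or shelf may not fail independently.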
It's also possible that they had a crash due to a bug, and this left the filesystem in an inconsistent state and/or a state that caused repeated crashes. In both this and the double-disk failure case, they would have to run wack to make sure the filesystem was okay before returning it to service. This sounds like the "cleaning" they are talking about.
If they were running a cluster, most likely there would have been no interruption in service. You get what you pay for. I'm sure their Apache (or whatever) web servers have crashed more than their filers have.
There doesn't seem to be much of a story here.
Bruce
On Sat, 15 May 1999 sirbruce@ix.netcom.com wrote:
As to what happened, the most *likely* scenario for any significant downtime of the Netapp is double disk failure. That is, one disk failed, and during reconstruction they lost another.
Disk shelf failure (which is really like a multi-drive failure) or NVRAM failure are also possibilities. I've now had two instances of a filer's NVRAM board going south in two years (the latest happening last week), but no multi-drive failures yet. Clustering will protect against that, but the drives are still vulnerable. Any hope yet of having RAID 1 mirroring soon? Async mirroring is nice, but I'd rather have all writes going to pairs of FC-AL disk shelves (each connected to a pair of filers too, of course).
If they were running a cluster, most likely there would have been no interruption in service. You get what you pay for. I'm sure their Apache (or whatever) web servers have crashed more than their filers have.
To be fair, Apache is pretty damn solid, and it is much easier and cheaper to reduce MTTR to nearly zero for a web server than for a filer.
On Sat, 15 May 1999, Brian Tao wrote:
Async mirroring is nice, but I'd rather have all writes going to pairs of FC-AL disk shelves (each connected to a pair of filers too, of course).
Taking this idea a bit further, how about NVRAM mirroring? If you have a cluster, that is already done on the other filer, but if you don't and the NVRAM goes to Rome, there is a bit of exposure. Putting in a second NVRAM card should be considerably cheaper than adding another head, although not as resilient against DOS.
Tom