On 06/07/99 09:34:03 you wrote:
Could someone please explain what happens when you lose a single power supply in a shelf?
That depends on whether or not you have another power supply. If you do, there's no problem.
Is the filer smart (dumb) enough to halt? Do I get 'double disk error' and lose the filesystem?
Both?
Actually, I can't say for sure what recent versions of the OS do, but past versions did conclude that a bunch of disks had stopped responding, leading to a 'double disk error' scenario. However, this does not mean you have lost the filesystem, because you didn't actually *lose* any disks in this scenario, and the disks aren't marked bad because they aren't "there" to be marked. There was a bug/enhancement request filed which suggested that the filer recognize when a whole shelf fails and provide a more sensible error message.
There may have been a similar change made when multivolume and multiraidgroup support was added, since the filer will want to keep running even when it loses a particular volume and not halt. However, I am assuming from your question that you want to know what happens when there's only one filesystem and no failover filer.
There is a discussion going on here as to what happens in a hardware RAID configuration vs a NetApp filer when a power supply fails.
The thought is that if I have only one power supply in a shelf and it fails (failing 7 disks), it is like having multiple disks fail and therefore you lose the filesystem.
Like I said, in the past it certainly did "think" it had multiple disks fail, but this didn't make you lose the filesystem. If you try to bring the filer back up sans 6 or 7 disks (and with only one filesystem), it will fail to boot. Once you replace the power supply and turn it back on with all the disks attached, it will detect that they are all there and pick up just fine. (You also have the option of just moving the disks to another shelf if you have empty slots, but I'm not sure if this works with Fibre Channel and/or when you have multiple raid groups.)
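To make the difference between a disk that is missing and a disk that has failed a bit more concrete, here is a toy model in Python. Treat it as a sketch only: the class and method names are made up for illustration, the "at most one unavailable disk" rule assumes a single-parity raid group, and none of it reflects how the filer's OS is actually implemented.

```python
from enum import Enum


class DiskState(Enum):
    PRESENT = "present"   # disk is responding
    MISSING = "missing"   # disk is not responding (e.g. its shelf has no power)
    FAILED = "failed"     # disk has been marked bad


class RaidGroup:
    """Toy single-parity raid group (hypothetical, for illustration only)."""

    def __init__(self, disks_by_shelf):
        # disks_by_shelf maps a shelf name to the number of disks it holds.
        self.shelf_of = {}
        self.state = {}
        for shelf, count in disks_by_shelf.items():
            for i in range(count):
                name = f"{shelf}.{i}"
                self.shelf_of[name] = shelf
                self.state[name] = DiskState.PRESENT

    def shelf_power_lost(self, shelf):
        # The disks vanish, but none is marked FAILED: they are not
        # "there" to be marked.
        for name, s in self.shelf_of.items():
            if s == shelf and self.state[name] is DiskState.PRESENT:
                self.state[name] = DiskState.MISSING

    def shelf_power_restored(self, shelf):
        # Once power is back, the missing disks are simply seen again.
        for name, s in self.shelf_of.items():
            if s == shelf and self.state[name] is DiskState.MISSING:
                self.state[name] = DiskState.PRESENT

    def unavailable(self):
        return [n for n, st in self.state.items() if st is not DiskState.PRESENT]

    def can_serve(self):
        # Assumption: single parity tolerates at most one unavailable disk.
        return len(self.unavailable()) <= 1


group = RaidGroup({"shelf0": 7, "shelf1": 7})
group.shelf_power_lost("shelf1")
print(group.can_serve())   # False: 7 disks missing looks like a multi-disk failure
group.shelf_power_restored("shelf1")
print(group.can_serve())   # True: all disks are back and none was marked failed
```

The point of the model is that the group refuses to run while the shelf is dark, yet no disk ever enters the failed state, so nothing needs to be rebuilt from scratch once the power supply is replaced.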
There are a few possible problems that can still result. Data will not be lost, since it's all still in NVRAM, but this type of failure could result in the parity disk for the raid group being reconstructed, degrading performance for a while. This also sounds like a case where the disk labels could get confused, so that the disks on the failed shelf no longer match the other disks and the filer refuses to recognize them even after you replace the power supply. I am not saying this will happen (it shouldn't), but the exact circumstances under which the labels can get out of sync are unclear to me. Anyway, this is also easily solved via a floppy command and won't result in any data loss. Finally, the loss of a power supply could potentially damage a disk and/or its contents, giving you a failed disk when you finally get back up, or even data corruption that eventually necessitates doing a wack on the filesystem. However, these would be rare, extreme cases, generally the result of an external event killing the power supply (like a lightning strike or power surge) rather than the power supply failing on its own. But this type of problem isn't unique to NetApp; it could happen to any RAID system.
The bottom line is that the vast majority of the time, once you replace the power supply the system will come back up just fine, and in the vast majority of the cases where there is a problem, it can be easily corrected without data loss.
Bruce