On 06/07/99 09:34:03 you wrote:
>
>Could someone please explain what happens when you lose a single
>power supply in a shelf?
That depends on whether or not you have another power supply.
If you do, there's no problem.
>Is the filer smart (dumb) enough to halt?
>Do I get 'double disk error' and lose the filesystem?
Both?
Actually, I can't say for sure what recent versions of the OS
do, but in past versions it did think that a bunch of disks
had stopped responding, leading to a 'double disk error'
scenario. However, this does not mean you have lost the
filesystem, because you didn't actually *lose* any disk in
this scenario, and the disks aren't marked bad because they
aren't "there" to be marked. There was a bug/enhancement
request filed which suggested that the filer recognize when
a whole shelf fails and provide a more sensible error message.
There may have been a similar change made when multi-volume and
multi-RAID-group support was added, since the filer will want to
keep running even when it loses a particular volume and not
halt. However, I am assuming from your question that you want
to know what happens when there's only one filesystem and no
failover filer.
>There is a discussion going on here as to what happens in a hardware RAID
>configuration vs a NetApp filer when a power supply fails.
>
>The thought is that if I have only one power supply in a shelf and it
>fails (failing 7 disks) it is like having multiple disks fail and
>therefore you lose the filesystem.
Like I said, in the past it certainly did "think" it had multiple
disks fail, but this didn't make you lose the filesystem. If you
try to bring the filer back up sans 6 or 7 disks (and with only one
filesystem), it will fail to boot. Once you replace the power
supply and turn it back on with all the disks attached, it will
detect they are all there and pick up just fine. (You also have
the option of just moving the disks to another shelf if you have
empty slots, but I'm not sure if this works with Fibre Channel
and/or when you have multiple RAID groups.)
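To make the distinction between "not responding" and "failed" concrete,
here is a rough toy model in Python. It is only a sketch of the behavior
described above, not anything the filer actually runs: the Disk class,
the state names, and the two-shelf, single-parity layout are all
assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    shelf: int
    responding: bool = True      # False while the shelf has no power
    marked_failed: bool = False  # only set when a disk is really bad


def raid_group_state(disks, parity_disks=1):
    """Classify a RAID group by how many member disks are unavailable."""
    missing = sum(1 for d in disks if d.marked_failed or not d.responding)
    if missing == 0:
        return "normal"
    if missing <= parity_disks:
        return "degraded"           # one disk gone: parity covers it
    return "double disk error"      # more gone than parity can cover


def set_shelf_power(disks, shelf, powered):
    """Losing shelf power hides the disks; it does not mark them failed."""
    for d in disks:
        if d.shelf == shelf:
            d.responding = powered


# Hypothetical layout: one RAID group spread over two 7-disk shelves.
group = [Disk(f"{shelf}.{slot}", shelf) for shelf in (0, 1) for slot in range(7)]

before = raid_group_state(group)
set_shelf_power(group, shelf=1, powered=False)   # power supply dies
during = raid_group_state(group)
set_shelf_power(group, shelf=1, powered=True)    # power supply replaced
after = raid_group_state(group)

print(before, "->", during, "->", after)
```

In this model the group goes from normal to a double disk error and back
to normal, and no disk is ever marked failed along the way, which is why
the filesystem is still there once the shelf has power again.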
There are a few possible problems that can still result. Writes
that had not yet reached disk will not be lost, since they are
still in NVRAM, but this type of failure could result in the
parity disk for the RAID group being reconstructed, degrading
performance for a while. This
also sounds like a case where it's possible the disk labels can
get confused and you'll wind up with disks on the failed shelf
not matching the other disks, and thus the filer refusing to
recognize them even after you replace the power supply. I am
not saying this will happen (it shouldn't), but the exact
circumstances under which the labels can get out of sync are
unclear to me. Anyway, this is also easily solved via a floppy
command and won't result in any data loss. Finally, the loss of a
power supply could potentially damage a disk and/or its
contents, giving you a failed disk when you finally get back
up or even data corruption that eventually necessitates doing
a wack on the filesystem. However, these would be rare,
extreme cases, generally the result of an external event
killing the power supply (like a lightning strike or power
surge) rather than the power supply failing on its own. But
this type of problem isn't unique to NetApp; it could happen
to any RAID system.
The bottom line is that the vast majority of the time, once
you replace the power supply the system will come back up
just fine, and those times when there is a problem, the
vast majority of them will be easily corrected without data
loss.
Bruce