Could someone please explain what happens when you lose a single power supply in a shelf?
Is the filer smart (dumb) enough to halt? Do I get 'double disk error' and lose the filesystem?
There is a discussion going on here as to what happens in a hardware RAID configuration vs a NetApp filer when a power supply fails.
The thought is that if I have only one power supply in a shelf and it fails (failing 7 disks), it is like having multiple disks fail, and therefore you lose the filesystem.
Thanks, George
-------------------------------------------------------------------------------
George Kahler                          e-mail: george@yorku.ca
UNIX Systems Administrator             humans: (416) 736-5257 x.22699
Computing Services, York University    machines: (416) 736-5830
Ontario, Canada, M3J-1P3
In the immortal words of George Kahler (george@YorkU.CA):
> Could someone please explain what happens when you lose a single power supply in a shelf?
> Is the filer smart (dumb) enough to halt? Do I get 'double disk error' and lose the filesystem?
> There is a discussion going on here as to what happens in a hardware RAID configuration vs a NetApp filer when a power supply fails.
> The thought is that if I have only one power supply in a shelf and it fails (failing 7 disks), it is like having multiple disks fail, and therefore you lose the filesystem.
On a filer with dual shelf power supplies, it registers a shelf fault, screams loudly, and moves on.
I've never used a filer with single shelf power supplies, but I have, uh, accidentally powered off a shelf on a running F630 once. (D'oh!) The filer pretty much immediately halted, but came back up with not too much difficulty.
-n
------------------------------------------------------memory@blank.org
I've got more than one membership / to more than one club
and I owe my life / to the people that I love. (--Ani DiFranco)
http://www.blank.org/memory/------------------------------------------
> Could someone please explain what happens when you lose a single power supply in a shelf?
> Is the filer smart (dumb) enough to halt? Do I get 'double disk error' and lose the filesystem?
The filer does the "right thing":
If there's a double disk failure in a single RAID group, you have definitely lost the file system, at least for now, and the filer tells you so. However, it immediately stops writing to the failed file system, so that it doesn't screw things up in case you do find those failed disks.
So in the case you are talking about, the filesystem will appear to fail, but if you replace the failed power supply, everything should be okay again. (I don't know whether or not this sequence causes/requires a reboot. My guess would be that it does.)
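To make the "double disk failure in a single RAID group" condition concrete, here is a rough sketch of the arithmetic, not anything the filer actually runs. It assumes single-parity RAID groups (one lost disk per group is survivable, two or more is not); the function name, the group names, and both layouts are made up for illustration.

```python
from collections import Counter

def groups_offline(layout, failed_shelf, parity_disks=1):
    """Return the RAID groups that lose more disks than parity can cover.

    layout maps a (hypothetical) RAID group name to a list of shelf
    numbers, one entry per disk in that group.
    """
    offline = []
    for group, shelves in layout.items():
        lost = Counter(shelves)[failed_shelf]  # disks of this group in the dead shelf
        if lost > parity_disks:
            offline.append(group)
    return sorted(offline)

# Hypothetical layouts, 7 disks per shelf as in the question.
per_shelf = {"rg0": [0] * 7, "rg1": [1] * 7}            # each group lives in one shelf
striped = {"rg0": list(range(7)), "rg1": list(range(7))}  # one disk per shelf per group

print(groups_offline(per_shelf, failed_shelf=0))  # ['rg0']
print(groups_offline(striped, failed_shelf=0))    # []
```

With a group that sits entirely in the dead shelf, every disk in it disappears at once, which is the case described above; a group with at most one disk in that shelf would only see a single-disk failure.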
Dave
On Mon, 7 Jun 1999, George Kahler wrote:
> Could someone please explain what happens when you lose a single power supply in a shelf?
This depends on how many power supplies you have. If you have only one and it goes out, well, you know what happens: the whole shelf loses power. If you have two, the second one will continue to power the shelf with no problems.
> Do I get 'double disk error' and lose the filesystem?
I think this depends on how you set up your RAID sets. If you set them up per shelf, then if you lose a whole shelf you should be able to recover quite gracefully after restoring power to the shelf. Otherwise it may not be so nice.
> The thought is that if I have only one power supply in a shelf and it fails (failing 7 disks), it is like having multiple disks fail, and therefore you lose the filesystem.
That's possible, but since the filer would crap out immediately after losing the shelf, it may be very easy to recover from it. Remember that the data should not be corrupted by any hard drive failures, so in principle it should be pretty intact after you turn the power back on.
Tom
> Could someone please explain what happens when you lose a single power supply in a shelf?
By a strange coincidence, this is exactly what happened to one of our 330's running 5.1.2 a week ago. The filer went down, we powered it off, switched out the power supply, brought it back up, and all was well. The startup messages about replaying the NVRAM and the WAFL log were displayed (insert warm, fuzzy here). No data loss, just 30 minutes of downtime (it's in another building).
Below are the related entries in the etc/messages file (I've inserted a couple of line breaks for readability...): Hope this helps!