On Tue, 4 Nov 1997 sirbruce@ix.netcom.com wrote:
You are correct that it's probably a software bug, although running the NVRAM diagnostics (and perhaps re-seating the card) is certainly something you should try.
The diagnostics don't indicate a problem with the NVRAM subsystem or any other component (after running in a loop for a few hours). I don't think it is purely a software bug though, given that I have three other identical filers that have yet to crash during testing, while the problem filer crashed four times. I'll try running the tests again tonight, and if the panic still occurs, I'll swap the NVRAM (board and all) with one of our spares.
[... other boot messages deleted...]
Loading filesystem.
Recomputing parity in NVRAM
PANIC: ../driver/disk/disk.c:2633: Assertion failure.
version: NetApp Release 4.2a: Fri Sep 5 09:36:36 PDT 1997
cc flags: 3
dumping core: ..........
Old core present on disk --- not dumped.
Program terminated ok
At this point the filer is inaccessible, and I can't find a way to get it up and running.
Why was it "inaccessible"? Just reboot it again.
And again it panics. It never gets up to the point where I can login to it via console or network. Powering off doesn't solve the problem, so I suspect something bogus in the NVRAM is triggering the software fault.
Is there a way to flush the NVRAM or ignore an existing dump... some way to turn NFS back on so the data can be retrieved?
Yes... just keep rebooting and eventually it will throw away the NVRAM.
Hrm, that doesn't sound like a very reliable way of doing it. ;-) Apparently there is a hidden command from floppy boot that lets you zap the NVRAM (you then have to run wack, but at least you're back up and running).
When NVRAM is corrupt, you have to keep rebooting several times. The sequence is usually like this.
- Filer crashes while running - Reboot
- Filer crashes replaying NVRAM - Reboot
- Filer crashes again while replaying NVRAM - Reboot
- Filer realizes it has failed to replay NVRAM twice in a row, so it flags the NVRAM as bad, discards its contents, and - Reboot
- Filer comes back up, probably in degraded mode, and is thus reconstructing. If there has been filesystem damage it may crash here again, and reboot again. If you still can't get it up (it may say "Filesystem may be scrambled") or you can't keep it up for any length of time, you should call NetApp support and have them help you with the procedure for fixing the filesystem (wack) from floppy.
The real kicker in this is that you have to "know" that it'll go through steps 2 and 3 (the two failed NVRAM replays), and won't just keep rebooting forever.
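To make the sequence above concrete, here is a toy model of it in Python. This is only a sketch of the behaviour as described in this message, not NetApp's actual boot logic: the Filer class, its fields, the event strings, and the threshold constant are all made up for illustration, and the threshold of two failed replays is taken from the steps above.

```python
from dataclasses import dataclass

# Assumption from the sequence above: after two failed replays in a row,
# the filer gives up on the NVRAM contents.
MAX_REPLAY_FAILURES = 2


@dataclass
class Filer:
    """Hypothetical model of a filer's NVRAM replay handling at boot."""
    nvram_corrupt: bool
    replay_failures: int = 0
    nvram_discarded: bool = False

    def boot(self) -> str:
        """Simulate one boot attempt and report what happened."""
        if self.nvram_discarded or not self.nvram_corrupt:
            return "up"
        if self.replay_failures >= MAX_REPLAY_FAILURES:
            # Two replays in a row have failed: flag NVRAM as bad and drop it.
            self.nvram_discarded = True
            return "nvram discarded"
        self.replay_failures += 1
        return "panic during replay"


def boot_until_up(filer: Filer, max_boots: int = 10) -> list[str]:
    """Keep rebooting until the filer comes up or max_boots is reached."""
    events = []
    for _ in range(max_boots):
        event = filer.boot()
        events.append(event)
        if event == "up":
            break
    return events


if __name__ == "__main__":
    for step, event in enumerate(boot_until_up(Filer(nvram_corrupt=True)), 1):
        print(f"boot {step}: {event}")
```

The point of the model is the counter: the reboot loop is bounded because each failed replay is remembered across reboots, so the filer does not keep replaying the same bad NVRAM forever.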
I'll give this a try if the problem persists. I recall floppy booting the filer twice after the initial crash and failed auto-reboot, but not a third time.
The only way around I've found is to wipe out the filesystem and start over again (obviously not the optimal solution). Ideas?
The above should help.
Excellent, thanks.