On 11/07/97 18:43:45 you wrote:
On Tue, 4 Nov 1997 sirbruce@ix.netcom.com wrote:
- Filer crashes while running - Reboot
- Filer crashes replaying NVRAM - Reboot
- Filer crashes again while replaying NVRAM - Reboot
- Filer realizes it has failed to replay NVRAM twice in a row, so it flags the NVRAM as bad and dumps it - Reboot
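The recovery sequence above amounts to a boot-loop guard: count consecutive NVRAM replay attempts and, once a threshold is reached, throw the log away instead of retrying forever. Below is a minimal Python model of that idea. It is not NetApp's code. The names (`BootState`, `boot_once`, `MAX_REPLAY_FAILURES`) are hypothetical, and so is the assumption that the counter survives a reboot.

```python
from dataclasses import dataclass

# Hypothetical threshold matching "failed replaying NVRAM twice in a row".
MAX_REPLAY_FAILURES = 2


@dataclass
class BootState:
    """State assumed (hypothetically) to survive a reboot."""
    nvram_has_log: bool = True
    replay_attempts: int = 0


def boot_once(state: BootState, replay_succeeds: bool) -> str:
    """Simulate one boot and return a short description of the outcome."""
    if not state.nvram_has_log:
        return "up"
    if state.replay_attempts >= MAX_REPLAY_FAILURES:
        # Too many failed replays: flag the log as bad and discard it.
        state.nvram_has_log = False
        state.replay_attempts = 0
        return "nvram dumped, reboot"
    # Record the attempt *before* replaying, because a crash in the
    # middle of the replay leaves no chance to update the counter later.
    state.replay_attempts += 1
    if not replay_succeeds:
        return "crash during replay, reboot"
    # Replay finished: the log is consumed and the counter is cleared.
    state.nvram_has_log = False
    state.replay_attempts = 0
    return "replayed, up"


state = BootState()
events = [boot_once(state, replay_succeeds=False) for _ in range(4)]
print(events)
# ['crash during replay, reboot', 'crash during replay, reboot', 'nvram dumped, reboot', 'up']
```

The counter is bumped before the replay rather than after it because the failure mode being guarded against is a crash, which would skip any bookkeeping placed after the replay call.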
After replacing all the disks with new ones, the same crash appeared again on that filer after two more days of heavy reads and writes. I rebooted it ten times in a row and got a kernel panic every time, on "../driver/disk/disk.c:2633: Assertion failure", right after the "Recomputing parity in NVRAM" message (I assume that means it never gets around to replaying the WAFL logs?).
Did it ever mention dumping the NVRAM? Or is it always crashing before it ever replays NVRAM? I think I understand what you're saying above but you didn't mention any other error messages from the other reboots so I wanted to be sure.
Called Netapp tech support and was told an on-call engineer would call back... haven't heard anything yet. *sigh*
It sounds like the disks themselves have become corrupted in such a way that it's dying on initialization... they will probably have you boot off a floppy to whack the filesystem, and/or dump the NVRAM manually.
The fact that it continues to be the same machine leads me to think that swapping out the NVRAM card and/or the SCSI cards may be in order.
But I'm no expert support person and I don't have access to the code, so Netapp may have already figured out what this bug is, and the solution may be totally different. :)
Bruce
On Fri, 7 Nov 1997 sirbruce@ix.netcom.com wrote:
Did it ever mention dumping the NVRAM? Or is it always crashing before it ever replays NVRAM? I think I understand what you're saying above but you didn't mention any other error messages from the other reboots so I wanted to be sure.
No, the last normal boot message I see is "Recomputing parity in NVRAM", but the failure is in disk.c (perhaps right when it begins writing data to disk, but before the next boot message is printed?):
[... other boot messages deleted...]
Loading filesystem.
Recomputing parity in NVRAM
PANIC: ../driver/disk/disk.c:2633: Assertion failure.
version: NetApp Release 4.2a: Fri Sep 5 09:36:36 PDT 1997
cc flags: 3
dumping core: ..........
Old core present on disk --- not dumped.
Program terminated ok
The exact same sequence is replayed on every reboot.
The fact that it continues to be the same machine leads me to think that swapping out the NVRAM card and/or the SCSI cards may be in order.
I managed to capture the original kernel panic only once: "PANIC: ../common/wafl/nvlog.c: 1088: Assertion failure", although your guess is as good as mine as to why all subsequent reboots panic in disk.c. The suggestions I've received from Netapp support all involve getting a core dump over to them for analysis. Unfortunately, the filer never gets far enough to do a savecore. :(