On 11/08/97 01:39:24 you wrote:
>> The fact that it continues to be the same machine leads me to think
>> that swapping out the NVRAM card and/or the SCSI cards may be in
>> order.
>
> I managed to capture the original kernel panic only once: "PANIC:
>../common/wafl/nvlog.c: 1088: Assertion failure", although your guess
>is as good as mine why all subsequent reboots panic in disk.c. The
>suggestions I've received from Netapp support all involve getting a
>core dump over to them for analysis. Unfortunately, the filer never
>gets up to the point where it can do a savecore. :(
Well, if the NVRAM error caused something bogus to be written to disk,
creating bad metadata in the unclean shutdown, then this could trigger
another bug in the disk code where it reads the disks and recomputes
parity. Or the NVRAM itself may actually have caused the error. It
could even be that the disk.c message is misleading and the code is
really talking to the NVRAM at that point. Alternatively, the bad data
on the disk could have been caused by bad SCSI, but I admit this is
less likely. Frankly, there's no way to know without seeing the code
(even the best modular code can have subtle dependencies and side
effects that make an error appear to be somewhere other than where it
really is).

I'm surprised support did not understand that your filer was actually
DOWN, and thus you couldn't get the core. There are ways to boot from
floppy to dump the NVRAM and fix the filesystem; I would press them for
advice in that direction if they didn't understand your problem fully.

In any case, like I said, if it is consistently happening on this
machine, you could replace the NVRAM card, reinstall everything from
scratch, and see if it happens again.

Bruce