During pre-production testing of our F230 filers, we discovered a problem with one of them that we were only able to fix by rebuilding the RAID (and thus losing whatever OS and data was on the filer).
The hardware configuration is as follows:
    NetApp Release 4.2a: Fri Sep 5 09:36:36 PDT 1997
    System ID: 0016783805
    slot 0: System Board 90 MHz (NetApp System Board Ia rev-b)
            Firmware release: 1.6_i
            Memory Size: 192 MB
    slot 0: SCSI Host Adapter 0 (QLogic ISP 1040B)
            Firmware Version 2.26 Clock Rate 40MHz.
    slot 0: Ethernet Controller e0
            MAC Address: 00:a0:98:00:0f:1b (100tx)
    slot 6: NVRAM (NetApp NVRAM I)
            Revision: D2
            Serial Number: 6589
            Memory Size: 4 MB
            Battery 1 Status: 100% (3.07v)
            Battery 2 Status: 100% (3.07v)
    slot 7: Ethernet Controller e7 (Znyx)
            MAC Address: 00:c0:95:f8:49:c9 (Twisted pair)
    slot 9: Dual SCSI Host Adapter (NetApp SCSI Adapter II)
            SCSI Host Adapter 9a (Qlogic ISP 1040B)
            Firmware Version 2.26 Clock Rate 40MHz.
            0: SEAGATE ST15150W 9107 Size=3.9GB (8388315 blocks)
            1: SEAGATE ST15150W 9107 Size=3.9GB (8388315 blocks)
            2: SEAGATE ST15150W 9107 Size=3.9GB (8388315 blocks)
            3: SEAGATE ST15150W 9107 Size=3.9GB (8388315 blocks)
            5: SEAGATE ST15150W 9107 Size=3.9GB (8388315 blocks)
            4: SEAGATE ST15150W 9107 Size=3.9GB (8388315 blocks)
            SCSI Host Adapter 9b (Qlogic ISP 1040B)
            Firmware Version 2.26 Clock Rate 40MHz.
Part of the testing consisted of filling up the filesystem via NFS and NDMP copies from an UltraSPARC host. Three other F230s of identical configuration survived the tests, but the remaining F230 panicked four times with the following message:
PANIC: ../common/wafl/nvlog.c: 1088: Assertion failure
I will be running the NVRAM diagnostics later today to see if they turn up anything. More distressing, however, is the behaviour of the filer upon reboot:
[... other boot messages deleted ...]
Loading filesystem.
Recomputing parity in NVRAM
PANIC: ../driver/disk/disk.c:2633: Assertion failure.
version: NetApp Release 4.2a: Fri Sep 5 09:36:36 PDT 1997
cc flags: 3
dumping core: ..........
Old core present on disk --- not dumped.
Program terminated ok
At this point the filer is inaccessible, and I can't find a way to get it up and running. Is there a way to flush the NVRAM or ignore an existing dump, or some other way to turn NFS back on so the data can be retrieved? Booting the kernel off floppy doesn't help, because it also tries to replay the WAFL logs and panics again. The only workaround I've found is to wipe out the filesystem and start over (obviously not the optimal solution). Ideas?