Had another (different) NetApp crash last night. One thing I don't understand is why it decided to rewrite/recompute parity. Is that normal? There's nothing abnormal in the logs before the point where it starts to rebuild parity. It seems it crashed because it found two shelves with ID 2, but we haven't touched this in forever and the shelf IDs shouldn't have changed. Would it be possible for this NetApp to have been functioning all this time (6+ months) with two shelves with the same ID?
Here's the excerpt from the logs:
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884782.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884788.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884793.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884797.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884790.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884803.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884804.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884804.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884789.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884792.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884796.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884779.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884789.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884796.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884800.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884801.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884802.
Wed Feb 12 19:22:50 EST [kern.syslog.msg:error]: Multiple shelves with ID 2 found on channel 4.
Wed Feb 12 19:22:50 EST [sk.panic:ALERT]: reason="Multiple shelves with ID 2 found on channel 4. in process ses_admin on release NetApp Release 6.1.3"
Wed Feb 12 19:54:09 EST [kern.syslog.msg:info]: Ethernet e0: Link up.
Wed Feb 12 19:54:20 EST [kern.syslog.msg:error]: Enclosure Services unavailable for one or more shelves on channel 4.
Wed Feb 12 19:54:34 EST [kern.syslog.msg:info]: Reinitializing checksum blocks on volume vol1.
Wed Feb 12 19:54:35 EST [kern.syslog.msg:info]: Reinitializing checksum blocks on volume vol0.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:info]: Starting RAID checksum upgrade phase 1 (of 2) on volume vol1.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Beginning parity recomputation on volume vol1, RAID group 0.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Beginning parity recomputation on volume vol1, RAID group 1.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Beginning parity recomputation on volume vol1, RAID group 2.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Beginning parity recomputation on volume vol1, RAID group 3.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Skipping parity recomputation on volume vol0, RAID group 0 (no dirty ranges).
Wed Feb 12 19:54:42 EST [kern.syslog.msg:notice]: The system was down for 1879 seconds
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884782.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884788.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884793.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884797.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884790.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884803.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884804.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884804.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884789.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884792.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884796.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884779.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884789.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884796.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884800.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884801.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884802.
Wed Feb 12 19:54:50 EST [dyn_dev_qual_admin:info]: Firmware is up-to-date on all disk drives
Wed Feb 12 19:54:50 EST [pvif.switchLink:warning]: trunk: switching to e0
Wed Feb 12 19:54:50 EST [ltm services:info]: Ethernet e1a: Link up.
Wed Feb 12 19:54:50 EST [net_e0:info]: arp info overwritten for 10.100.26.11 by 00:00:0c:07:ac:3d
Wed Feb 12 19:54:51 EST [ltm services:info]: Ethernet e1d: Link up.
Wed Feb 12 19:54:54 EST [rc:ALERT]: timed: time daemon started
Wed Feb 12 19:54:54 EST [CIFSAdmin:info]: Connection with DC \N2M-BE established
Wed Feb 12 19:54:54 EST [mgr.boot.disk_done:info]: NetApp Release 6.1.3 boot complete. Last disk update written at Wed Feb 12 19:22:31 EST 2003
Wed Feb 12 19:54:54 EST [mgr.boot.reason_abnormal:ALERT]: System rebooted after a panic.
Wed Feb 12 19:54:54 EST [mgr.stack.saved:notice]: Reboot with saved panic information in log file
Wed Feb 12 19:54:54 EST [mgr.stack.string:notice]: Panic string: Multiple shelves with ID 2 found on channel 4. in process ses_admin on release NetApp Release 6.1.3
Wed Feb 12 19:54:54 EST [mgr.stack.at:notice]: Panic occurred at: Thu Feb 13 00:22:49 2003
Wed Feb 12 19:54:54 EST [mgr.stack.proc:notice]: Panic in process: ses_admin
Wed Feb 12 19:54:55 EST [mgr.stack.framename:notice]: Stack frame 0: sk_panic(0xfffffc00005e64f0) + 0x394
Wed Feb 12 19:54:56 EST [mgr.stack.framename:notice]: Stack frame 1: ses_scan(0xfffffc00006fcf10) + 0x638
Wed Feb 12 19:54:56 EST [mgr.stack.framename:notice]: Stack frame 2: ses_handle_signal(0xfffffc0000702090) + 0x39c
Wed Feb 12 19:54:56 EST [mgr.stack.framename:notice]: Stack frame 3: SesAdmin(0xfffffc0000702790) + 0x1dc
Wed Feb 12 19:54:56 EST [mgr.stack.framename:notice]: Stack frame 4: sk_hw_save_state_and_loop(0xfffffc00003f92e0) + 0x70
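
In case it helps anyone reading the excerpt, here is a rough Python sketch for tallying the "Rewriting parity" messages per RAID group. It is not a NetApp tool: the function name `tally_rewrites` and the regular expression are my own, written only against the message wording shown in the logs above, and the sample text is three lines copied from that excerpt.

```python
import re
from collections import Counter

# Matches messages worded like the ones in the excerpt, e.g.
# "Rewriting parity on volume vol1, RAID group 0, stripe #2884782."
# (pattern is an assumption based only on the log lines quoted in this post)
REWRITE_RE = re.compile(
    r"Rewriting parity on volume (\w+), RAID group (\d+), stripe #(\d+)\."
)

def tally_rewrites(log_text):
    """Count rewritten stripes per (volume, RAID group) in a block of log text."""
    counts = Counter()
    for volume, group, _stripe in REWRITE_RE.findall(log_text):
        counts[(volume, int(group))] += 1
    return counts

# Three entries copied from the excerpt; works whether or not the
# entries are separated by newlines.
sample = (
    "Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884782. "
    "Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884788. "
    "Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884793."
)

for (volume, group), n in sorted(tally_rewrites(sample).items()):
    print(f"{volume} RAID group {group}: {n} stripe(s) rewritten")
```

Pasting the full messages file in place of `sample` would show how many stripes were touched in each group.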