Had another (different) NetApp crash last night. One thing I don't
understand is why it decided to rewrite/recompute parity. Is that normal?
There's nothing abnormal in the logs above the point where it starts to
rebuild parity. It seems it crashed because it found two shelves with ID 2,
but we haven't touched this in forever, so the shelf IDs shouldn't have
changed. Would it be possible for this NetApp to have been functioning all
this time (6+ months) with two shelves with the same ID?
Here's the excerpt from the logs:
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884782.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884788.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884793.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884797.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884790.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884803.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884804.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884804.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884789.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884792.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884796.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884779.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884789.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884796.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884800.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884801.
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884802.
Wed Feb 12 19:22:50 EST [kern.syslog.msg:error]: Multiple shelves with ID 2 found on channel 4.
Wed Feb 12 19:22:50 EST [sk.panic:ALERT]: reason="Multiple shelves with ID 2 found on channel 4. in process ses_admin on release NetApp Release 6.1.3"
Wed Feb 12 19:54:09 EST [kern.syslog.msg:info]: Ethernet e0: Link up.
Wed Feb 12 19:54:20 EST [kern.syslog.msg:error]: Enclosure Services unavailable for one or more shelves on channel 4.
Wed Feb 12 19:54:34 EST [kern.syslog.msg:info]: Reinitializing checksum blocks on volume vol1.
Wed Feb 12 19:54:35 EST [kern.syslog.msg:info]: Reinitializing checksum blocks on volume vol0.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:info]: Starting RAID checksum upgrade phase 1 (of 2) on volume vol1.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Beginning parity recomputation on volume vol1, RAID group 0.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Beginning parity recomputation on volume vol1, RAID group 1.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Beginning parity recomputation on volume vol1, RAID group 2.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Beginning parity recomputation on volume vol1, RAID group 3.
Wed Feb 12 19:54:41 EST [kern.syslog.msg:notice]: Skipping parity recomputation on volume vol0, RAID group 0 (no dirty ranges).
Wed Feb 12 19:54:42 EST [kern.syslog.msg:notice]: The system was down for 1879 seconds
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884782.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884788.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884793.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884797.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884790.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884803.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884804.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884804.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884789.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884792.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884796.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884779.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884789.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884796.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884800.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884801.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884802.
Wed Feb 12 19:54:50 EST [dyn_dev_qual_admin:info]: Firmware is up-to-date on all disk drives
Wed Feb 12 19:54:50 EST [pvif.switchLink:warning]: trunk: switching to e0
Wed Feb 12 19:54:50 EST [ltm services:info]: Ethernet e1a: Link up.
Wed Feb 12 19:54:50 EST [net_e0:info]: arp info overwritten for 10.100.26.11 by 00:00:0c:07:ac:3d
Wed Feb 12 19:54:51 EST [ltm services:info]: Ethernet e1d: Link up.
Wed Feb 12 19:54:54 EST [rc:ALERT]: timed: time daemon started
Wed Feb 12 19:54:54 EST [CIFSAdmin:info]: Connection with DC \\N2M-BE established
Wed Feb 12 19:54:54 EST [mgr.boot.disk_done:info]: NetApp Release 6.1.3 boot complete. Last disk update written at Wed Feb 12 19:22:31 EST 2003
Wed Feb 12 19:54:54 EST [mgr.boot.reason_abnormal:ALERT]: System rebooted after a panic.
Wed Feb 12 19:54:54 EST [mgr.stack.saved:notice]: Reboot with saved panic information in log file
Wed Feb 12 19:54:54 EST [mgr.stack.string:notice]: Panic string: Multiple shelves with ID 2 found on channel 4. in process ses_admin on release NetApp Release 6.1.3
Wed Feb 12 19:54:54 EST [mgr.stack.at:notice]: Panic occurred at: Thu Feb 13 00:22:49 2003
Wed Feb 12 19:54:54 EST [mgr.stack.proc:notice]: Panic in process: ses_admin
Wed Feb 12 19:54:55 EST [mgr.stack.framename:notice]: Stack frame 0: sk_panic(0xfffffc00005e64f0) + 0x394
Wed Feb 12 19:54:56 EST [mgr.stack.framename:notice]: Stack frame 1: ses_scan(0xfffffc00006fcf10) + 0x638
Wed Feb 12 19:54:56 EST [mgr.stack.framename:notice]: Stack frame 2: ses_handle_signal(0xfffffc0000702090) + 0x39c
Wed Feb 12 19:54:56 EST [mgr.stack.framename:notice]: Stack frame 3: SesAdmin(0xfffffc0000702790) + 0x1dc
Wed Feb 12 19:54:56 EST [mgr.stack.framename:notice]: Stack frame 4: sk_hw_save_state_and_loop(0xfffffc00003f92e0) + 0x70
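To gauge how much parity was actually rewritten after the reboot, the "Rewriting parity" notices can be tallied per RAID group. Below is a minimal Python sketch, assuming the log has been saved with one entry per line. `tally_parity_rewrites` is a hypothetical helper, not a NetApp tool. The sample is the EST block from the excerpt plus one GMT copy, included to show that a notice logged twice is counted only once.

```python
import re
from collections import defaultdict

# Matches a "Rewriting parity" notice regardless of its timestamp/tag prefix.
# The volume name stops at the comma that follows it.
PATTERN = re.compile(r"Rewriting parity on volume ([^,\s]+), RAID group (\d+), stripe #(\d+)")

SAMPLE = """\
Thu Feb 13 00:54:45 GMT [raid_stripe_owner:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884782.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884782.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884788.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884793.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884797.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884790.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884803.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 1, stripe #2884804.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 0, stripe #2884804.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884789.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884792.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 3, stripe #2884796.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884779.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884789.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884796.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884800.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884801.
Wed Feb 12 19:54:45 EST [kern.syslog.msg:notice]: Rewriting parity on volume vol1, RAID group 2, stripe #2884802.
"""


def tally_parity_rewrites(lines):
    """Hypothetical helper: map (volume, raid_group) -> sorted stripe numbers."""
    # A set per group, so a notice that appears twice (e.g. GMT and EST copies)
    # is counted once.
    stripes = defaultdict(set)
    for line in lines:
        m = PATTERN.search(line)
        if m:
            stripes[(m.group(1), int(m.group(2)))].add(int(m.group(3)))
    return {key: sorted(nums) for key, nums in sorted(stripes.items())}


if __name__ == "__main__":
    for (vol, group), nums in tally_parity_rewrites(SAMPLE.splitlines()).items():
        print(f"{vol} RAID group {group}: {len(nums)} stripes, #{nums[0]}-#{nums[-1]}")
```

Against that sample it reports 3, 5, 6 and 3 stripes for RAID groups 0 through 3 (17 in total), all between stripes #2884779 and #2884804.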