Today a disk failed, but differently from what I'm used to seeing. Looking at it now, it appears that disk 4.19 failed and the filer took spare disk 4.38 and began rebuilding onto it. What seems strange, though, is that the filer kept trying to read from or write to disk 4.19 until I physically pulled the disk out. Why would this have happened?
na3m-be> sysconfig -v
NetApp Release 6.1.2R2: Thu Mar 21 02:30:01 PST 2002
System ID: 0016794273 (na3m-be)
slot 0: System Board (NetApp System Board V N4)
Tue Apr 29 08:00:00 EDT [kern.uptime.filer:info]: 8:00am up 7 days, 19:27 230387 NFS ops, 537297 CIFS ops, 0 HTTP ops
Tue Apr 29 08:56:39 EDT [ispfc_timeout:warning]: 4.19 (0x02000013) (0xfffffc0001778348,0x1c,0/0,4947/0/0,70908/0): command timeout, quiescing drive to allow outstanding I/O to complete.
Tue Apr 29 08:56:42 EDT [ispfc_timeout:error]: 4.19 (0x02000013): global device timer timeout, initiating device recovery.
Tue Apr 29 08:56:42 EDT [ispfc_timeout:warning]: Resetting device 4.19 (0x02000013) to clear outstanding I/O.
Tue Apr 29 08:57:18 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 08:57:18 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 08:57:23 EDT [asup.sent.mail:notice]: System Notification mail sent
Tue Apr 29 08:58:00 EDT [monitor.globalStatus.critical:CRITICAL]: Disk shelf configuration error.
Tue Apr 29 08:59:38 EDT [scsi.cmd.transportError:error]: Device 4.19: Transport error during execution of command: HA status 0x9: cdb 0x4d.
Tue Apr 29 08:59:38 EDT last message repeated 7 times
Tue Apr 29 09:00:00 EDT [kern.uptime.filer:info]: 9:00am up 7 days, 20:27 230776 NFS ops, 537297 CIFS ops, 0 HTTP ops
Tue Apr 29 09:00:00 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:00:20 EDT [scsi.cmd.underrun:error]: Device 4.19: Received a data underrun: cdb 0x12. Not all the data was received. Possible transmission error. I/O will be retried.
Tue Apr 29 09:00:20 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:00:21 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 09:00:21 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:00:26 EDT [asup.sent.mail:notice]: System Notification mail sent
Tue Apr 29 09:01:37 EDT [ispfc_timeout:warning]: 4.19 (0x02000013) (0xfffffc0001773f08,0x1c,0/0,4958/0/4947,72354/5372): command timeout, quiescing drive to allow outstanding I/O to complete.
Tue Apr 29 09:01:40 EDT [ispfc_timeout:error]: 4.19 (0x02000013): global device timer timeout, initiating device recovery.
Tue Apr 29 09:01:40 EDT [ispfc_timeout:warning]: Resetting device 4.19 (0x02000013) to clear outstanding I/O.
Tue Apr 29 09:02:24 EDT [scsi.cmd.underrun:error]: Device 4.19: Received a data underrun: cdb 0x12. Not all the data was received. Possible transmission error. I/O will be retried.
Tue Apr 29 09:02:25 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:02:25 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x1d: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:02:25 EDT last message repeated 3 times
Tue Apr 29 09:02:25 EDT [ses_admin:error]: Enclosure control failed via 4.19 (shelf 2), searching for alternate path
Tue Apr 29 09:02:25 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 09:02:25 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:02:26 EDT [asup.sent.mail:notice]: System Notification mail sent
Tue Apr 29 09:02:55 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:02:56 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x1d: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:02:56 EDT last message repeated 3 times
Tue Apr 29 09:02:56 EDT [ses_admin:error]: Enclosure control failed via 4.19 (shelf 2), searching for alternate path
Tue Apr 29 09:02:56 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 09:02:56 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:02:56 EDT [asup.sent.mail:notice]: System Notification mail sent
Tue Apr 29 09:03:12 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:03:12 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x1d: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:03:12 EDT last message repeated 3 times
Tue Apr 29 09:03:12 EDT [ses_admin:error]: Enclosure control failed via 4.19 (shelf 2), searching for alternate path
Tue Apr 29 09:03:12 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 09:03:12 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:03:13 EDT [asup.sent.mail:notice]: System Notification mail sent
Tue Apr 29 09:03:29 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:03:29 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x1d: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:03:29 EDT last message repeated 3 times
Tue Apr 29 09:03:29 EDT [ses_admin:error]: Enclosure control failed via 4.19 (shelf 2), searching for alternate path
Tue Apr 29 09:03:29 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 09:03:29 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:03:29 EDT [asup.sendError.throttled:info]: Too many autosupport messages in too short a time: throttling CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:03:50 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:03:50 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x1d: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:03:50 EDT last message repeated 3 times
Tue Apr 29 09:03:50 EDT [ses_admin:error]: Enclosure control failed via 4.19 (shelf 2), searching for alternate path
Tue Apr 29 09:03:50 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 09:03:50 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:03:50 EDT [asup.sendError.throttled:info]: Too many autosupport messages in too short a time: throttling CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:04:06 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:04:07 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x1d: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:04:07 EDT last message repeated 3 times
Tue Apr 29 09:04:07 EDT [ses_admin:error]: Enclosure control failed via 4.19 (shelf 2), searching for alternate path
Tue Apr 29 09:04:07 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 09:04:07 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:04:07 EDT [asup.sendError.throttled:info]: Too many autosupport messages in too short a time: throttling CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:04:25 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:04:25 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x1d: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:04:25 EDT last message repeated 3 times
Tue Apr 29 09:04:25 EDT [ses_admin:error]: Enclosure control failed via 4.19 (shelf 2), searching for alternate path
Tue Apr 29 09:04:25 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 09:04:25 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:04:25 EDT [asup.sendError.throttled:info]: Too many autosupport messages in too short a time: throttling CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:04:44 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:04:44 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x1d: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:04:44 EDT last message repeated 3 times
Tue Apr 29 09:04:44 EDT [ses_admin:error]: Enclosure control failed via 4.19 (shelf 2), searching for alternate path
Tue Apr 29 09:04:44 EDT [ses_admin:error]: No more valid paths to Enclosure Services in shelf 2 on channel 4.
Tue Apr 29 09:04:44 EDT [monitor.shelf.configError:CRITICAL]: Configuration error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:04:44 EDT [asup.sendError.throttled:info]: Too many autosupport messages in too short a time: throttling CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:05:00 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x28:00cc2f18:0008: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:05:00 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x28:00cc2ff8:0008: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:05:00 EDT [raid.diskFailed:CRITICAL]: Read on data disk 4.19 in volume vol1, RAID group 1, failed; reverting to degraded mode.
Tue Apr 29 09:05:00 EDT [raid_admin:CRITICAL]: Label write on 4.19 (S/N LKJ781640000101937T4) failed.
Tue Apr 29 09:05:00 EDT [raid_admin:CRITICAL]: Label write on 4.19 (S/N LKJ781640000101937T4) failed.
Tue Apr 29 09:05:02 EDT [raid_admin:CRITICAL]: Read on data disk 4.19 in volume vol1, RAID group 1, failed; entered degraded mode.
Tue Apr 29 09:05:02 EDT [raid_admin:notice]: One disk is missing from volume vol1, RAID group 1. A "hot spare" disk (4.38) is available and the missing disk will be reconstructed on the spare disk.
Tue Apr 29 09:05:02 EDT [dyn_dev_qual_admin:info]: Successfully updated /etc/.broken_disks file on local FS for disk 4.19 SEAGATE ST118202FC LKJ781640000101937T4
Tue Apr 29 09:05:02 EDT [asup.sent.mail:notice]: System Notification mail sent
Tue Apr 29 09:05:03 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 09:05:03 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check Condition: CDB 0x1d: Sense Data not ready - (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:05:03 EDT last message repeated 3 times
^ The messages above continued to be reported to the log file until I physically pulled the drive and replaced it with a spare. The log file continues below:
Tue Apr 29 10:12:54 EDT [ispfc_timeout:warning]: Resetting Fibre Channel adapter 4.
Tue Apr 29 10:13:00 EDT [scsi.cmd.selectionTimeout:error]: Device 4.19: Adapter/target error: HA status 0x7: cdb 0x12. Targeted device did not respond to requested I/O. I/O will be retried.
Tue Apr 29 10:13:00 EDT [scsi.cmd.noMorePaths:error]: Device 4.19: No more paths to device: cdb 0x12. All retries have failed.
Tue Apr 29 10:13:17 EDT [ispfc_timeout:warning]: 4.19 (0x02000013) (0xfffffc0001779998,0x12,0/0,1/0/0,98110/0): command timeout, quiescing drive to allow outstanding I/O to complete.
Tue Apr 29 10:13:27 EDT [monitor.shelf.configError.ok:CRITICAL]: Configuration error previously reported on disk storage shelf attached to slot 4 has been corrected.
Tue Apr 29 10:13:30 EDT [dyn_dev_qual_admin:info]: Now downloading firmware file /etc/disk_fw/ST118202FC.NA27.LOD on 1 disk ...
Tue Apr 29 10:14:00 EDT [monitor.globalStatus.nonCritical:warning]: Disk on adapter 4, shelf 2, bay 3, failed.
Tue Apr 29 10:15:00 EDT [dyn_dev_qual_admin:info]: Firmware downloaded on disk 4.19
Tue Apr 29 10:15:00 EDT [dyn_dev_qual_admin:info]: Media Access test successful on disk 4.19
Tue Apr 29 10:15:00 EDT [dyn_dev_qual_admin:info]: Firmware is up-to-date on all disk drives
Tue Apr 29 10:15:00 EDT [raid_disk_admin:notice]: Disk 4.19 (S/N LKH21431000020030EDP) has been added to incomplete volume vol1(1).
Tue Apr 29 10:15:00 EDT [monitor.globalStatus.ok:info]: The system's global status is normal.
Tue Apr 29 11:00:00 EDT [kern.uptime.filer:info]: 11:00am up 7 days, 22:27 233428 NFS ops, 537525 CIFS ops, 0 HTTP ops
Tue Apr 29 11:23:21 EDT [consumer:notice]: Reconstruction of data disk in volume vol1, RAID group 1, complete.
Tue Apr 29 11:23:21 EDT [download.update:info]: Begin bootblock update, prototype is 4.8
Tue Apr 29 11:23:29 EDT [download.updateDone:info]: Bootblock update completed
Thanks Dan
Daniel Finn wrote:
Today a disk failed, but differently from what I'm used to seeing. Looking at it now, it appears that disk 4.19 failed and the filer took spare disk 4.38 and began rebuilding onto it. What seems strange, though, is that the filer kept trying to read from or write to disk 4.19 until I physically pulled the disk out. Why would this have happened?
Hello Daniel
It appears that the electronic controller of your disk failed. The filer tried to tell the disk "you are broken" by writing to the disk's label, to make sure that wherever this disk ends up, its RAID label will tell any technician: I am broken.

In your case I also see SES error messages for this shelf. Let's calculate: disk 19 = 16 + 3, which matches the "shelf 2, bay 3" in your log. Let me guess: this is an FC-7/8/9 shelf and slot 2 is/was empty? In those old shelves, slots 2 AND 3 should both be filled, just in case one of those disks breaks. Your log shows enclosure control going via 4.19 and no alternate path being found.
I think you can ignore the warnings you have seen. They were produced by the disk's electronics.
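
The "19 = 16 + 3" calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `split_device` is made up, and the default of 8 device IDs per shelf is an assumption that the thread does not state. It was chosen because it reproduces both the 16 + 3 split and the "shelf 2, bay 3" reported in the log; other shelf types may number their bays differently.

```python
def split_device(name, ids_per_shelf=8):
    """Split a device name such as "4.19" into (adapter, shelf, bay, base).

    ids_per_shelf is an ASSUMPTION (hypothetical default), not something
    the thread confirms; pass a different value for other shelf types.
    """
    adapter_text, id_text = name.split(".")
    device_id = int(id_text)
    # The quotient is the shelf number, the remainder is the bay within it.
    shelf, bay = divmod(device_id, ids_per_shelf)
    # base is the first device ID of that shelf, so device_id == base + bay.
    base = shelf * ids_per_shelf
    return int(adapter_text), shelf, bay, base


adapter, shelf, bay, base = split_device("4.19")
print(f"adapter {adapter}: ID {base + bay} = {base} + {bay} (shelf {shelf}, bay {bay})")
# prints: adapter 4: ID 19 = 16 + 3 (shelf 2, bay 3)
```

If the result disagrees with the shelf and bay that the filer itself reports (as in the "Disk on adapter 4, shelf 2, bay 3, failed" message), trust the filer and adjust `ids_per_shelf`.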
Smile & regards Dirk