i need help making sense of what happened here - toasters

29 Apr 2003


      Today a disk failed, but not in the way I'm used to seeing.  Looking at
the logs now, it appears that disk 4.19 failed and the filer took spare disk
4.38 and began rebuilding onto it.  What seems strange is that the filer kept
trying to read from or write to disk 4.19 until I physically pulled the disk
out.  Why would this have happened?
na3m-be> sysconfig -v
        NetApp Release 6.1.2R2: Thu Mar 21 02:30:01 PST 2002
        System ID: 0016794273 (na3m-be)
        slot 0: System Board (NetApp System Board V N4)
Tue Apr 29 08:00:00 EDT [kern.uptime.filer:info]:   8:00am up  7 days, 19:27
230387 NFS ops, 537297 CIFS ops, 0 HTTP ops
Tue Apr 29 08:56:39 EDT [ispfc_timeout:warning]: 4.19 (0x02000013)
(0xfffffc0001778348,0x1c,0/0,4947/0/0,70908/0): command timeout,
quiescing drive to allow outstanding I/O to complete.
Tue Apr 29 08:56:42 EDT [ispfc_timeout:error]: 4.19 (0x02000013): global
device timer timeout, initiating device recovery.
Tue Apr 29 08:56:42 EDT [ispfc_timeout:warning]: Resetting device 4.19
(0x02000013) to clear outstanding I/O.
Tue Apr 29 08:57:18 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 08:57:18 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 08:57:23 EDT [asup.sent.mail:notice]: System Notification mail
sent
Tue Apr 29 08:58:00 EDT [monitor.globalStatus.critical:CRITICAL]: Disk shelf
configuration error.
Tue Apr 29 08:59:38 EDT [scsi.cmd.transportError:error]: Device 4.19:
Transport error during execution of command: HA status 0x9: cdb 0x4d.
Tue Apr 29 08:59:38 EDT last message repeated 7 times
Tue Apr 29 09:00:00 EDT [kern.uptime.filer:info]:   9:00am up  7 days, 20:27
230776 NFS ops, 537297 CIFS ops, 0 HTTP ops
Tue Apr 29 09:00:00 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:00:20 EDT [scsi.cmd.underrun:error]: Device 4.19: Received a
data underrun: cdb 0x12.  Not all the data was received.
 Possible transmission error.  I/O will be retried.
Tue Apr 29 09:00:20 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:00:21 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 09:00:21 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:00:26 EDT [asup.sent.mail:notice]: System Notification mail
sent
Tue Apr 29 09:01:37 EDT [ispfc_timeout:warning]: 4.19 (0x02000013)
(0xfffffc0001773f08,0x1c,0/0,4958/0/4947,72354/5372): command timeout,
quiescing drive to allow outstanding I/O to complete.
Tue Apr 29 09:01:40 EDT [ispfc_timeout:error]: 4.19 (0x02000013): global
device timer timeout, initiating device recovery.
Tue Apr 29 09:01:40 EDT [ispfc_timeout:warning]: Resetting device 4.19
(0x02000013) to clear outstanding I/O.
Tue Apr 29 09:02:24 EDT [scsi.cmd.underrun:error]: Device 4.19: Received a
data underrun: cdb 0x12.  Not all the data was received.
 Possible transmission error.  I/O will be retried.
Tue Apr 29 09:02:25 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:02:25 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x1d: Sense Data not ready -  (0x2 - 0x4
0x0 0x2).
Tue Apr 29 09:02:25 EDT last message repeated 3 times
Tue Apr 29 09:02:25 EDT [ses_admin:error]: Enclosure control failed via 4.19
(shelf 2), searching for alternate path
Tue Apr 29 09:02:25 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 09:02:25 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:02:26 EDT [asup.sent.mail:notice]: System Notification mail
sent
Tue Apr 29 09:02:55 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:02:56 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x1d: Sense Data not ready -  (0x2 - 0x4
0x0 0x2).
Tue Apr 29 09:02:56 EDT last message repeated 3 times
Tue Apr 29 09:02:56 EDT [ses_admin:error]: Enclosure control failed via 4.19
(shelf 2), searching for alternate path
Tue Apr 29 09:02:56 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 09:02:56 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:02:56 EDT [asup.sent.mail:notice]: System Notification mail
sent
Tue Apr 29 09:03:12 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:03:12 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x1d: Sense Data not ready -  (0x2 - 0x4
0x0 0x2).
Tue Apr 29 09:03:12 EDT last message repeated 3 times
Tue Apr 29 09:03:12 EDT [ses_admin:error]: Enclosure control failed via 4.19
(shelf 2), searching for alternate path
Tue Apr 29 09:03:12 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 09:03:12 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:03:13 EDT [asup.sent.mail:notice]: System Notification mail
sent
Tue Apr 29 09:03:29 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:03:29 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x1d: Sense Data not ready -  (0x2 - 0x4
0x0 0x2).
Tue Apr 29 09:03:29 EDT last message repeated 3 times
Tue Apr 29 09:03:29 EDT [ses_admin:error]: Enclosure control failed via 4.19
(shelf 2), searching for alternate path
Tue Apr 29 09:03:29 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 09:03:29 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:03:29 EDT [asup.sendError.throttled:info]: Too many
autosupport messages in too short a time: throttling
CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:03:50 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:03:50 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x1d: Sense Data not ready -  (0x2 - 0x4
0x0 0x2).
Tue Apr 29 09:03:50 EDT last message repeated 3 times
Tue Apr 29 09:03:50 EDT [ses_admin:error]: Enclosure control failed via 4.19
(shelf 2), searching for alternate path
Tue Apr 29 09:03:50 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 09:03:50 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:03:50 EDT [asup.sendError.throttled:info]: Too many
autosupport messages in too short a time: throttling
CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:04:06 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:04:07 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x1d: Sense Data not ready -  (0x2 - 0x4
0x0 0x2).
Tue Apr 29 09:04:07 EDT last message repeated 3 times
Tue Apr 29 09:04:07 EDT [ses_admin:error]: Enclosure control failed via 4.19
(shelf 2), searching for alternate path
Tue Apr 29 09:04:07 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 09:04:07 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:04:07 EDT [asup.sendError.throttled:info]: Too many
autosupport messages in too short a time: throttling
CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:04:25 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:04:25 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x1d: Sense Data not ready -  (0x2 - 0x4
0x0 0x2).
Tue Apr 29 09:04:25 EDT last message repeated 3 times
Tue Apr 29 09:04:25 EDT [ses_admin:error]: Enclosure control failed via 4.19
(shelf 2), searching for alternate path
Tue Apr 29 09:04:25 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 09:04:25 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:04:25 EDT [asup.sendError.throttled:info]: Too many
autosupport messages in too short a time: throttling
CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:04:44 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:04:44 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x1d: Sense Data not ready -  (0x2 - 0x4
0x0 0x2).
Tue Apr 29 09:04:44 EDT last message repeated 3 times
Tue Apr 29 09:04:44 EDT [ses_admin:error]: Enclosure control failed via 4.19
(shelf 2), searching for alternate path
Tue Apr 29 09:04:44 EDT [ses_admin:error]: No more valid paths to Enclosure
Services in shelf 2 on channel 4.
Tue Apr 29 09:04:44 EDT [monitor.shelf.configError:CRITICAL]: Configuration
error on disk storage shelf attached to slot 4. Please check drive placement.
Tue Apr 29 09:04:44 EDT [asup.sendError.throttled:info]: Too many
autosupport messages in too short a time: throttling
CONFIGURATION_ERROR!!! mail.
Tue Apr 29 09:05:00 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x28:00cc2f18:0008: Sense Data not ready
-  (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:05:00 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x28:00cc2ff8:0008: Sense Data not ready
-  (0x2 - 0x4 0x0 0x2).
Tue Apr 29 09:05:00 EDT [raid.diskFailed:CRITICAL]: Read on data disk 4.19
in volume vol1, RAID group 1, failed; reverting to degraded mode.
Tue Apr 29 09:05:00 EDT [raid_admin:CRITICAL]: Label write on 4.19 (S/N
LKJ781640000101937T4) failed.
Tue Apr 29 09:05:00 EDT [raid_admin:CRITICAL]: Label write on 4.19 (S/N
LKJ781640000101937T4) failed.
Tue Apr 29 09:05:02 EDT [raid_admin:CRITICAL]: Read on data disk 4.19 in
volume vol1, RAID group 1, failed; entered degraded mode.
Tue Apr 29 09:05:02 EDT [raid_admin:notice]: One disk is missing from volume
vol1, RAID group 1.
        A "hot spare" disk (4.38) is available and the missing disk
        will be reconstructed on the spare disk.
Tue Apr 29 09:05:02 EDT [dyn_dev_qual_admin:info]: Successfully updated
/etc/.broken_disks file on local FS for disk 4.19
SEAGATE  ST118202FC       LKJ781640000101937T4
Tue Apr 29 09:05:02 EDT [asup.sent.mail:notice]: System Notification mail
sent
Tue Apr 29 09:05:03 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 09:05:03 EDT [scsi.cmd.checkCondition:error]: Device 4.19: Check
Condition: CDB 0x1d: Sense Data not ready -  (0x2 - 0x4
0x0 0x2).
Tue Apr 29 09:05:03 EDT last message repeated 3 times
^ The messages above continued to be reported to the log file until I
physically pulled the drive and replaced it with a spare.  The log file
continues below:
Tue Apr 29 10:12:54 EDT [ispfc_timeout:warning]: Resetting Fibre Channel
adapter 4.
Tue Apr 29 10:13:00 EDT [scsi.cmd.selectionTimeout:error]: Device 4.19:
Adapter/target error: HA status 0x7: cdb 0x12.  Targeted device did not
respond to requested I/O.  I/O will be retried.
Tue Apr 29 10:13:00 EDT [scsi.cmd.noMorePaths:error]: Device 4.19: No more
paths to device: cdb 0x12.  All retries have failed.
Tue Apr 29 10:13:17 EDT [ispfc_timeout:warning]: 4.19 (0x02000013)
(0xfffffc0001779998,0x12,0/0,1/0/0,98110/0): command timeout,
quiescing drive to allow outstanding I/O to complete.
Tue Apr 29 10:13:27 EDT [monitor.shelf.configError.ok:CRITICAL]:
Configuration error previously reported on disk storage shelf
attached to slot 4 has been corrected.
Tue Apr 29 10:13:30 EDT [dyn_dev_qual_admin:info]: Now downloading firmware
file /etc/disk_fw/ST118202FC.NA27.LOD on 1 disk ...
Tue Apr 29 10:14:00 EDT [monitor.globalStatus.nonCritical:warning]: Disk on
adapter 4, shelf 2, bay 3, failed.
Tue Apr 29 10:15:00 EDT [dyn_dev_qual_admin:info]: Firmware downloaded on
disk 4.19
Tue Apr 29 10:15:00 EDT [dyn_dev_qual_admin:info]: Media Access test
successful on disk 4.19
Tue Apr 29 10:15:00 EDT [dyn_dev_qual_admin:info]: Firmware is up-to-date on
all disk drives
Tue Apr 29 10:15:00 EDT [raid_disk_admin:notice]: Disk 4.19 (S/N
LKH21431000020030EDP) has been added to incomplete volume vol1(1).
Tue Apr 29 10:15:00 EDT [monitor.globalStatus.ok:info]: The system's global
status is normal.
Tue Apr 29 11:00:00 EDT [kern.uptime.filer:info]:  11:00am up  7 days, 22:27
233428 NFS ops, 537525 CIFS ops, 0 HTTP ops
Tue Apr 29 11:23:21 EDT [consumer:notice]: Reconstruction of data disk in
volume vol1, RAID group 1, complete.
Tue Apr 29 11:23:21 EDT [download.update:info]: Begin bootblock update,
prototype is 4.8
Tue Apr 29 11:23:29 EDT [download.updateDone:info]: Bootblock update
completed
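
In case it helps anyone reading a log like the one above, here is a rough Python sketch for condensing it into a count per event tag, which makes the repeating configError / ses_admin / checkCondition cycle easier to see. It is not a NetApp tool: the function name tally_events and the sample lines are my own illustration, and it assumes each entry carries an "[event:severity]:" tag after the timestamp, as the entries above do.

```python
import re
from collections import Counter

# Matches the "[event.name:severity]:" tag that follows the timestamp.
TAG = re.compile(r"\[([\w.]+):(\w+)\]:")

def tally_events(lines):
    """Count (event, severity) pairs; wrapped continuation lines have no tag and are skipped."""
    counts = Counter()
    for line in lines:
        match = TAG.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# A few entries copied from the log above, used only as sample input.
sample = [
    "Tue Apr 29 08:56:39 EDT [ispfc_timeout:warning]: 4.19 (0x02000013)",
    "quiescing drive to allow outstanding I/O to complete.",
    "Tue Apr 29 08:57:18 EDT [ses_admin:error]: No more valid paths to Enclosure",
    "Tue Apr 29 09:00:21 EDT [ses_admin:error]: No more valid paths to Enclosure",
]

# Print the most frequent tags first.
for (event, severity), n in tally_events(sample).most_common():
    print(f"{n:5d}  {severity:8s}  {event}")
```

To run it against a whole messages file, pass the open file object to tally_events in place of the sample list.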
Thanks
Dan