This is the second time this has happened in the past week. I'm thinking there might actually be a hardware issue somewhere along the FC-AL loop, but does anyone else have any theories?
from the message log:

. . .
Mon May 19 00:00:00 EDT [statd]: 12:00am up 2 days, 13:25 18869073 NFS ops, 4573028 CIFS ops, 0 HTTP ops
Mon May 19 00:24:25 EDT [isp2100_timeout]: 1.1 (0xfffffc0000d83560,0x2a:00289370:0008,0/0,23228/0/0,58593/0): command timeout, quiescing drive.
Mon May 19 00:24:29 EDT [isp2100_timeout]: 1.1: global device timer timeout, initiating device recovery.
Mon May 19 00:24:29 EDT [isp2100_timeout]: Resetting device 1.1
. . .
^ what happened here?
. . .
Tue May 20 02:46:55 EDT [isp2100_timeout]: 1.42 (0xfffffc0000d6ed20,0x28:00598770:0010,0/0,8363/0/0,56085/0): command timeout, quiescing drive.
Tue May 20 02:46:58 EDT [isp2100_timeout]: 1.42: global device timer timeout, initiating device recovery.
Tue May 20 02:46:58 EDT [isp2100_timeout]: Resetting device 1.42
Tue May 20 02:47:02 EDT [isp2100_timeout]: Resetting ISP2100 in slot 1
Tue May 20 02:47:22 EDT last message repeated 2 times
Tue May 20 02:47:28 EDT [isp2100_timeout]: Loop recovery event generated by device 1.0.
Tue May 20 02:47:33 EDT [isp2100_timeout]: Resetting ISP2100 in slot 1
Tue May 20 14:50:21 GMT [rc]: NIS: Group Caching has been enabled
Tue May 20 10:50:22 EDT [rc]: e2a: Link up.
Tue May 20 10:50:23 EDT [ses_admin]: No SCSI-3 Enclosure Services on host adapter 1 shelf 0.
Tue May 20 10:50:23 EDT [ses_admin]: Check drive placement.
. . .
Tue May 20 10:50:28 EDT [rc]: relog syslog Tue May 20 02:47:53 EDT [isp2100_timeout]: Offlining loop attached to HBA in slot 1. Will try to reco
Tue May 20 10:50:28 EDT [rc]: relog syslog Tue May 20 02:47:53 EDT [isp2100_timeout]: The fibre channel loop attached to adapter 1 has gone down
So I power off the netapp and power off all the disk shelves, turn them back on, and boot the netapp up, and it comes up fine; both times, same scenario. The previous time, a disk had its red LED lit, so it was easy to tell it needed replacing. That did not happen this time. I pulled disk 1.42 because I thought it might have been the one causing problems, and it's been replaced with a spare. Should I also pull disk 1.1?
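Before pulling 1.1 too, it might be worth seeing what the filer itself reports about the loop and that drive from the console. A minimal sketch, assuming a Data ONTAP 7-mode console (command availability and output format vary by ONTAP release; this is not a transcript from your filer):

```
sysconfig -r          # RAID layout: shows failed disks, spares, and reconstruction state
storage show disk -a  # per-disk state, so you can see how 1.1 and the new spare look now
fcstat link_stats     # per-device FC-AL link error counters on the loop; a marginal
                      # disk, cable, or shelf module often shows climbing errors at
                      # one loop position rather than spread across the whole loop
environment status    # shelf/enclosure status (relates to the ses_admin messages above)
```

If the link-error counters point at one spot on the loop rather than at disk 1.1 itself, that would support the cable/shelf-hardware theory over a second bad disk.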
Thanks Dan
One other strange thing: I can no longer log into FilerView. It prompts me for the password to the netapp, then once I enter the password it comes back with the following error: "Error communicating with host na4m-be msg=Connection refused: connect". Is the password for FilerView the same as the root password for the netapp? I am still able to telnet in. I tried disabling and re-enabling the httpd server, and that didn't seem to help.
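For what it's worth, FilerView logs in with the filer's admin credentials (root works), so since the prompt accepted your password, the "Connection refused" likely means the admin HTTP server itself isn't listening rather than that the password is wrong. In 7-mode, FilerView is served by the admin HTTP server, which is controlled separately from the plain `httpd` option, so toggling `httpd` alone may not touch it. A minimal sketch, assuming 7-mode option names (check `options httpd` on your release to confirm):

```
options httpd.admin.enable        # show whether the admin (FilerView) server is on
options httpd.admin.enable off
options httpd.admin.enable on     # toggle it off and on to restart the admin server
```

If it still refuses connections after that, a reboot (which you've already had) normally restarts it as well, which would point toward something else blocking the port.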