Re: Fibre Channel Woes - toasters

4 Feb 2005


There don't appear to be any invalid CRCs at this point.  The unit
is booted in read-only mode.  The only nonzero link failure count is on
drive 8b.93.  Does that indicate a bad drive?
-Thanks so much.
Loop             Link  Underrun   Loss of   Invalid    Frame In   Frame Out
 ID           Failure     count      sync       CRC       count       count
                count               count     count
8b.93               4         0         9         0       34700      264468
8b.92               0         0         0         0       62193      618089
8b.91               0         0         1         0       99539      925291
8b.90               0         0         0         0       93475      868841
8b.89               0         0         0         0       93726      868906
8b.88               0         0         3         0       92456      863458
8b.87               0         0         0         0        2162        1020
8b.86               0         0         2         0       93614      867250
8b.85               0         0         0         0         131         167
8b.84               0         0         0         0       62299      616393
8b.83               0         0         0         0       61890      617720
8b.82               0         0         0         0       61987      616789
8b.81               0         0         6         0       61901      617655
8b.80               0         0         3         0       25407      338325
8b.ha               0         0         1         0     8084399      845489
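
For anyone who wants to screen this kind of output automatically, here
is a minimal Python sketch.  It assumes each row has been unwrapped onto
a single line, with the six counters in the column order of the header
above.  The names LinkStats, parse_link_stats and flagged are made up
for illustration and are not part of any NetApp tool.  It only flags
devices with a nonzero link failure or invalid CRC count; whether that
means a bad drive is a separate question.

```python
from typing import List, NamedTuple


class LinkStats(NamedTuple):
    device: str
    link_failure: int
    underrun: int
    loss_of_sync: int
    invalid_crc: int
    frames_in: int
    frames_out: int


def parse_link_stats(text: str) -> List[LinkStats]:
    """Parse unwrapped link_stats rows; header and malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row is a device ID such as 8b.93 followed by six counters.
        if len(fields) != 7 or "." not in fields[0]:
            continue
        try:
            counters = [int(f) for f in fields[1:]]
        except ValueError:
            continue
        rows.append(LinkStats(fields[0], *counters))
    return rows


def flagged(rows: List[LinkStats]) -> List[str]:
    """Return device IDs with a nonzero link failure or invalid CRC count."""
    return [r.device for r in rows if r.link_failure or r.invalid_crc]


# A few rows copied from the output above.
SAMPLE = """\
8b.93 4 0 9 0 34700 264468
8b.92 0 0 0 0 62193 618089
8b.81 0 0 6 0 61901 617655
8b.ha 0 0 1 0 8084399 845489
"""

rows = parse_link_stats(SAMPLE)
print(flagged(rows))
```

Loss-of-sync counts are left out of the check on purpose, since several
drives here show small nonzero values; add them if you want a stricter
screen.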
On Fri, Feb 04, 2005 at 05:55:25AM -0800, McCarthy, Tim wrote:
...
While the system is up and running, try a "fcadmin link_stats 8b".
Look at the CRC column.
It could be a bad controller on a disk or an ESH/LRC.
Look for the first device to show frame errors.
If it is the beginning of the shelf, it may be the ESH/LRC.
Swap it out.
If it is all of them, it could be a bad FC card, in which case, you need
to swap it out.
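
The rule of thumb above can be written down as a rough sketch in Python.
This is only an illustration of the reasoning, not a diagnostic tool:
the function name suspect_component is made up, and the caller has to
supply both the devices in loop order and the devices in the order they
first showed frame errors.  What counts as "the beginning of the shelf"
is an assumption here (the first entry of the loop-order list).

```python
from typing import List


def suspect_component(loop_devices: List[str], error_devices: List[str]) -> str:
    """Apply the rule of thumb: all devices -> FC card, first in shelf -> ESH/LRC.

    loop_devices:  every device ID on the loop, in loop order (assumed known).
    error_devices: device IDs in the order they first showed frame errors.
    """
    if not error_devices:
        return "no frame errors seen"
    # Errors on every device point at something common to all of them.
    if set(error_devices) >= set(loop_devices):
        return "FC card"
    # Otherwise look at the first device to show errors.
    if error_devices[0] == loop_devices[0]:
        return "ESH/LRC"
    return "disk controller on " + error_devices[0]


shelf = ["8b.80", "8b.81", "8b.82", "8b.83"]
print(suspect_component(shelf, ["8b.82", "8b.83"]))
```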
As always, please open a case.
--tmac
-----Original Message-----
From: Tavis Gustafson [mailto:tavis@hq.newdream.net] 
Sent: Friday, February 04, 2005 8:30 AM
To: toasters@mathworks.com
Subject: Fibre Channel Woes
I'm running an 840 with a DS14 (144GB disks) connected to each other
with a dual-port optical fibre channel card.  Early this morning the
filer went down, and upon reboot I started seeing lots of fibre channel
frame errors.  I swapped optical cables and tried the second port on the
card, but the errors came back when rebuilding.
I am trying to determine whether the problem is with the fibre channel
card or with the LRC in the disk shelf.  Does anybody know if this is
indicated in these error messages?
Thanks for any help
-Tavis
Volume State      Status            Options
           boot online     reconstruct       root, raidsize=7
amplifier> Fri Feb  4 11:55:07 GMT [FastEnet-10/100/e3c:notice]: uid
30358 tid 1: disk quota exceeded on volume boot.
onal warnings will be suppressed for approximately 60 minutes or until
either a 'quota resize' is performed.
Fri Feb  4 11:55:34 GMT [download.updateDone:info]: Bootblock update
completed
Fri Feb  4 11:56:35 GMT [FastEnet-10/100/e3c:notice]: uid 51154 tid 1:
disk quota exceeded on volume boot.  Additiona
gs will be suppressed for approximately 60 minutes or until either a
'quota resize' is performed.
amplifier> Fri Feb  4 12:00:00 GMT [kern.uptime.filer:info]:  12:00pm up
7 mins, 935006 NFS ops, 0 CIFS ops, 0 HTTP o
FS ops, 0 FCP ops, 0 iSCSI ops
amplifier> Fri Feb  4 12:08:09 GMT [wafl_hipri:notice]: uid 24271 tid 1:
disk quota exceeded on volume boot.  Additio
ings will be suppressed for approximately 60 minutes or until either a
'quota resize' is performed.
Fri Feb  4 12:13:59 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.80:
request failed after try #1: cdb 0x1c.
Fri Feb  4 12:14:23 GMT [scsi.cmd.checkCondition:error]: Device 8b.82:
Check Condition: CDB 0x2a:09b24e98:0080: Sense
SI:aborted command - Fibre Channel frame CRC error (0xb - 0x47 0x0
0x3)(57649).
Fri Feb  4 12:14:23 GMT [scsi.cmd.checkCondition:error]: Device 8b.82:
Check Condition: CDB 0x2a:09b24e18:0080: Sense
SI:aborted command - Fibre Channel frame CRC error (0xb - 0x47 0x0
0x3)(57657).
Fri Feb  4 12:14:31 GMT [scsi.cmd.retrySuccess:info]: Device 8b.82:
request successful after retry #1: cdb 0x2a:09b24
.
Fri Feb  4 12:14:31 GMT [scsi.cmd.retrySuccess:info]: Device 8b.82:
request successful after retry #1: cdb 0x2a:09b24
.
Fri Feb  4 12:14:31 GMT [wafl_lopri:warning]: NFS response to client
10.3.38.28 was slow, op was v3 read, 63 > 60 (in
)
Fri Feb  4 12:14:48 GMT [scsi.cmd.checkCondition:error]: Device 8b.84:
Check Condition: CDB 0x2a:09b25418:0080: Sense
SI:aborted command - Fibre Channel frame CRC error (0xb - 0x47 0x0
0x3)(16996).
Fri Feb  4 12:14:51 GMT [scsi.cmd.checkCondition:error]: Device 8b.84:
Check Condition: CDB 0x2a:09b25418:0080: Sense
SI:aborted command - Fibre Channel frame CRC error (0xb - 0x47 0x0
0x3)(20148).
Fri Feb  4 12:14:53 GMT [ispfc_timeout_1:warning]: 8b.81 (0x01000051)
(0x034f47b0,0x2a:09b25298:0080,0/0,20150/0/0,80
ommand timeout, quiescing drive to allow outstanding I/O to complete.
Fri Feb  4 12:15:33 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.80:
request failed after try #1: cdb 0x1c.
Fri Feb  4 12:15:40 GMT [telnet_0:info]: root logged in from host:
10.3.67.21
vol status
         Volume State      Status            Options
Fri Feb  4 12:16:19 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.80:
request failed after try #1: cdb 0x1c.
Fri Feb  4 12:16:37 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.84:
request failed after try #3: cdb 0x2a:09b25418
Fri Feb  4 12:16:37 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.81:
request failed after try #1: cdb 0x2a:09b25298
Fri Feb  4 12:16:49 GMT [ispfc_timeout_1:warning]: 8b.92 (0x0100005c)
(0x034f4d00,0x2a:09b25698:0080,0/0,19912/0/0,87
ommand timeout, quiescing drive to allow outstanding I/O to complete.
Fri Feb  4 12:16:49 GMT [ispfc_timeout_1:warning]: 8b.85 (0x01000055)
(0x034f5690,0x2a:09b25218:0080,0/0,11587/0/0,87
ommand timeout, quiescing drive to allow outstanding I/O to complete.
Fri Feb  4 12:16:49 GMT [ispfc_timeout_1:warning]: 8b.84 (0x01000054)
(0x034f6020,0x2f:0025a800:0400,0/0,20143/0/0,87
ommand timeout, quiescing drive to allow outstanding I/O to complete.
Fri Feb  4 12:16:49 GMT [ispfc_timeout_1:warning]: 8b.83 (0x01000053)
(0x034f5be0,0x2a:09b25298:0080,0/0,20153/0/0,87
ommand timeout, quiescing drive to allow outstanding I/O to complete.
Fri Feb  4 12:16:49 GMT [ispfc_timeout_1:warning]: 8b.82 (0x01000052)
(0x034f6350,0x2a:09b25418:0080,0/0,20104/0/0,87
ommand timeout, quiescing drive to allow outstanding I/O to complete.
Fri Feb  4 12:16:49 GMT [ispfc_timeout_1:warning]: 8b.81 (0x01000051)
(0x034f6ac0,0x2f:0025ac00:0400,0/0,20153/0/0,87
ommand timeout, quiescing drive to allow outstanding I/O to complete.
Fri Feb  4 12:16:52 GMT [ispfc_timeout_1:error]: 8b.85 (0x01000055):
global device timer timeout, initiating device r
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.92:
Command aborted by host adapter: HA status 0x4:
a:09b25698:0080.
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.92:
Command aborted by host adapter: HA status 0x4:
f:0025ac00:0400.
Fri Feb  4 12:16:52 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.92:
request failed after try #1: cdb 0x2a:09b25618
Fri Feb  4 12:16:52 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.92:
request failed after try #1: cdb 0x2a:09b25518
Fri Feb  4 12:16:52 GMT [ispfc_timeout_1:error]: 8b.84 (0x01000054):
global device timer timeout, initiating device r
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.85:
Command aborted by host adapter: HA status 0x4:
f:0025a800:0400.
Fri Feb  4 12:16:52 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.85:
request failed after try #1: cdb 0x2a:09b25098
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.85:
Command aborted by host adapter: HA status 0x4:
8:109ed500:0040.
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.85:
Command aborted by host adapter: HA status 0x4:
a:09b25218:0080.
Fri Feb  4 12:16:52 GMT [ispfc_timeout_1:error]: 8b.83 (0x01000053):
global device timer timeout, initiating device r
Fri Feb  4 12:16:52 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.84:
request failed after try #1: cdb 0x2f:0025a800
Fri Feb  4 12:16:52 GMT [ispfc_timeout_1:error]: 8b.82 (0x01000052):
global device timer timeout, initiating device r
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.83:
Command aborted by host adapter: HA status 0x4:
a:09b25298:0080.
Fri Feb  4 12:16:52 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.83:
request failed after try #1: cdb 0x2a:09b25198
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.83:
Command aborted by host adapter: HA status 0x4:
f:0025ac00:0400.
Fri Feb  4 12:16:52 GMT [ispfc_timeout_1:error]: 8b.81 (0x01000051):
global device timer timeout, initiating device r
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.82:
Command aborted by host adapter: HA status 0x4:
f:0025ac00:0400.
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.82:
Command aborted by host adapter: HA status 0x4:
a:09b25398:0080.
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.82:
Command aborted by host adapter: HA status 0x4:
a:09b25418:0080.
Fri Feb  4 12:16:52 GMT [scsi.cmd.abortedByHost:error]: Device 8b.81:
Command aborted by host adapter: HA status 0x4:
f:0025ac00:0400.
Fri Feb  4 12:16:54 GMT [ispfc_timeout_1:warning]: 8b.80 (0x01000050)
(0x034f58b0,0x1c,0/0,56193/0/0,8745/0): command
, quiescing drive to allow outstanding I/O to complete.
Fri Feb  4 12:17:00 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.80:
request failed after try #1: cdb 0x1c.
Fri Feb  4 12:17:08 GMT [scsi.cmd.retrySuccess:info]: Device 8b.85:
request successful after retry #1: cdb 0x28:109ed
.
Fri Feb  4 12:17:25 GMT [scsi.cmd.retrySuccess:info]: Device 8b.82:
request successful after retry #1: cdb 0x2a:09b25
.
Fri Feb  4 12:17:25 GMT [scsi.cmd.pastTimeToLive:error]: Device 8b.89:
request failed after try #1: cdb 0x28:10bc5c38
Fri Feb  4 12:17:25 GMT [scsi.cmd.checkCondition:error]: Device 8b.83:
Check Condition: CDB 0x2a:09b25298:0080: Sense
SI:aborted command - Fibre Channel frame CRC error (0xb - 0x47 0x0
0x3)(42100).
F
PANIC: raid volfsm: vol boot: fatal multi-disk error. in process
config_thread on release NetApp Release 6.4.5 on Fri
12:17:25 GMT 2005
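
To see which devices report frame CRC errors, and which one reported
them first, a console log like the one above can be tallied with a short
Python sketch.  It assumes the wrapped layout shown here, where the
device ID is on the timestamped line and the "Fibre Channel frame CRC
error" text follows a line or two later.  The function name
crc_errors_by_device and both regular expressions are illustrative
guesses at the message format, not an official parser.

```python
import re
from collections import Counter

# Any bracketed event tag such as [scsi.cmd.checkCondition:error]:
EVENT_RE = re.compile(r"\[\S+:\w+\]:")
# Check-condition events carry the device ID on the same line.
DEVICE_RE = re.compile(r"\[scsi\.cmd\.checkCondition:error\]: Device (\S+?):")
CRC_TEXT = "Fibre Channel frame CRC error"


def crc_errors_by_device(log_text: str) -> Counter:
    """Count frame CRC errors per device, keeping first-seen order."""
    counts = Counter()
    current = None
    for line in log_text.splitlines():
        if EVENT_RE.search(line):
            # A new event starts; remember its device only for check conditions.
            m = DEVICE_RE.search(line)
            current = m.group(1) if m else None
            continue
        if current and CRC_TEXT in line:
            counts[current] += 1
    return counts


# Excerpt copied from the log above.
SAMPLE_LOG = """\
Fri Feb  4 12:14:23 GMT [scsi.cmd.checkCondition:error]: Device 8b.82:
Check Condition: CDB 0x2a:09b24e98:0080: Sense
SI:aborted command - Fibre Channel frame CRC error (0xb - 0x47 0x0
0x3)(57649).
Fri Feb  4 12:14:31 GMT [scsi.cmd.retrySuccess:info]: Device 8b.82:
request successful after retry #1: cdb 0x2a:09b24
.
Fri Feb  4 12:14:48 GMT [scsi.cmd.checkCondition:error]: Device 8b.84:
Check Condition: CDB 0x2a:09b25418:0080: Sense
SI:aborted command - Fibre Channel frame CRC error (0xb - 0x47 0x0
0x3)(16996).
"""

counts = crc_errors_by_device(SAMPLE_LOG)
for device, n in counts.items():
    print(device, n)
```

Because the counter keeps insertion order, the first key is the first
device that reported a CRC error in the text it was given.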