I agree with Stephane. We have seen this problem hundreds of times with
FC7s and FC8s. The best answer would be to replace the shelves with FC9s
if your disks were all X221C and X221D, or to upgrade to DS14s and new
disks. NetApp (Eurologic) made great leaps in reliability between the
FC7/FC8 (XL400) and FC9 (XL500) series.

Since you have some X221As or X221Bs, you would want to replace the
shelves with different FC8s or fix the problem with the existing shelves.
The LRC and backplane are the most likely culprits. There is also a chance
that your fibre channel card is faulty; I would assume you are using
X2030Bs in those. Replacing the fibre channel card, LRCs and VEMs is easy
and minimally intrusive to your system. If that doesn't work, you are
looking at backplane issues, which are much more difficult to remedy.

You can test the shelf/components in question using a hardware fibre
channel analyzer or a software fibre channel capture utility. Running
'disktest' in maintenance mode is not very useful for detecting these
sorts of problems.

The likely reason you notice this in spare disks more than in RAID group
disks is that you have been replacing disks over time in the bad slots.
The disks you insert become spares, but they still sit in a slot that has
already failed a disk, so those bad slots may simply be failing
replacement spares over and over.
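A rough way to check whether the failures follow the slot or the disk is
to tally "missing disk" events by shelf/bay and by disk serial. This is a
hypothetical Python sketch: the event records are made-up examples you
would transcribe from your own logs (no particular log format is assumed),
and the threshold of two distinct disks per slot is an arbitrary choice.

```python
from collections import defaultdict

# Hypothetical records: (shelf, bay, disk_serial) for each "missing disk"
# event. These are made-up examples, not real filer log output.
EVENTS = [
    (3, 5, "SN-A"),
    (3, 5, "SN-B"),
    (3, 5, "SN-C"),
    (6, 2, "SN-D"),
    (1, 9, "SN-E"),
]


def suspect_slots(events, min_distinct_disks=2):
    """Slots where at least min_distinct_disks different disks went missing."""
    disks_per_slot = defaultdict(set)
    for shelf, bay, serial in events:
        disks_per_slot[(shelf, bay)].add(serial)
    return sorted(slot for slot, serials in disks_per_slot.items()
                  if len(serials) >= min_distinct_disks)


def suspect_disks(events, min_distinct_slots=2):
    """Disks that went missing in at least min_distinct_slots different slots."""
    slots_per_disk = defaultdict(set)
    for shelf, bay, serial in events:
        slots_per_disk[serial].add((shelf, bay))
    return sorted(serial for serial, slots in slots_per_disk.items()
                  if len(slots) >= min_distinct_slots)


if __name__ == "__main__":
    print("Suspect slots:", suspect_slots(EVENTS))
    print("Suspect disks:", suspect_disks(EVENTS))
```

If several different serials keep dropping out of the same bay while no
single disk fails in more than one bay, that points at the slot (LRC or
backplane) rather than the disks.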
Hope this helps.
Tim
-----Original Message-----
From: Stephane [mailto:stephane.bentebba@fps.fr]
Sent: Monday, November 15, 2004 2:34 AM
Cc: toasters(a)mathworks.com
Subject: Re: Problems with old 740 cluster...
I have already had serious trouble with FC7/FC8 shelves:
numerous errors on disks (disappearing from the loop) over a long period.
To fix that, the best thing is to replace the shelf with a good FC9.
FC9s seem to be more stable than FC7/FC8s.
I can't tell you why, but it worked great for me.
You could also try exchanging only the Environmental Module at the
rear, which also holds the FC connection;
it could be contributing to these errors.
Try to move as many disks as you can to a shelf that has no (or fewer) problems.
Bye
Kelsey Cummings wrote:
>I consult for a company that has an old 740 cluster that is built out
>with FC8 shelves and primarily ST118202FC and ST318203FC disks, with the
>occasional X221 18GB disk.
>
>One of the filers has 2 shelves, the other 8.
>
>The one with 8 shelves is having repeated problems with 'missing disks',
>i.e., a perfectly good disk just disappears from the loop. I think the
>missing disk problem may be localized to specific bays on specific shelves,
>but in any event it doesn't seem to follow the disk when it's moved. It also
>seems to affect spare disks more than active disks (I've kept as many
>spares hot as possible in the boxes due to the age of the disks), but this
>could just be an artifact of the behavior.
>
>It's a little before my time with NetApps, but I seem to recall hearing that
>these kinds of problems were not uncommon on the FC8/FC9 shelves.
>
>Anyone have any suggestions as to how I can rectify the problems besides
>replacing the disks and shelves?
>
>-Kelsey