Chris,
I've gone back and researched this some more, and you are right on the money - all these disappearing disks have been 18 GB ones. Contrary to my prior message, all our 36 GB disks have failed in the expected manner: an autosupport was sent out and the disk was marked as failed on the filer.
Since the 18 GB drives are EOL, I don't expect much effort to be made to fix the firmware. The concern is that NetApp no longer sells 18 GB drives; you can only get them through support. I can only hope their supply does not run out before we upgrade/swap out the 18s.
--sam
-----Original Message-----
From: Chris Blackmor [mailto:chris.blackmor@amd.com]
Sent: Tuesday, April 23, 2002 9:47 AM
To: Sam Schorr
Cc: Server Team; toasters
Subject: Re: disappearing disks
Sam,
If these are 18G drives they have a known failure problem where they just
spin down. I have lost 6 in the past month (but with 22 filers I am not
concerned - eat all you want... they'll make more). Anyway, with FC disks
you can run (as root) the following command and if you see an XXX in any of
the disk spots the drive is screwed and needs to be replaced.
> remsh m5 fcadmin device_map
Loop Map for channel 0a:
Translated Map: Port Count 1
7
Shelf mapping:
Loop Map for channel 1:
Translated Map: Port Count 57
7 0 1 2 3 8 9 10 11 16 17 18 19 24 25 26
27 32 33 34 35 40 41 42 43 48 49 50 51 56 57 58
59 60 61 62 52 53 54 44 45 46 36 37 38 28 29 30
20 21 22 12 13 14 4 5 6
Shelf mapping:
Shelf 0: 6 5 4 3 2 1 0
Shelf 1: 14 13 12 11 10 9 8
Shelf 2: 22 21 20 19 18 17 16
Shelf 3: 30 XXX 28 27 26 25 24
Shelf 4: 38 37 36 35 34 33 32
Shelf 5: 46 45 44 43 42 41 40
Shelf 6: 54 53 52 51 50 49 48
Shelf 7: 62 61 60 59 58 57 56
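
For anyone who wants to script this check rather than eyeball it, here is a minimal Python sketch. It assumes the output keeps the layout shown above ("Loop Map for channel ...:" headers followed by "Shelf N: ..." rows); the function name find_missing_slots is made up for illustration, and the reported position is only the entry's index within the row as printed, not a disk ID.

```python
import re


def find_missing_slots(device_map_text):
    """Return (channel, shelf, position) for every XXX entry in the shelf rows.

    Assumes the text layout of the sample output in this thread; the
    position is the zero-based index of the entry within its row.
    """
    missing = []
    channel = None
    for raw_line in device_map_text.splitlines():
        line = raw_line.strip()
        header = re.match(r"Loop Map for channel (\S+):", line)
        if header:
            # Remember which loop the following shelf rows belong to.
            channel = header.group(1)
            continue
        shelf_row = re.match(r"Shelf (\d+):\s*(.*)", line)
        if shelf_row:
            shelf = int(shelf_row.group(1))
            for position, entry in enumerate(shelf_row.group(2).split()):
                if entry == "XXX":
                    missing.append((channel, shelf, position))
    return missing


# A short excerpt of the output above, used as sample input.
SAMPLE = """Loop Map for channel 1:
Shelf mapping:
Shelf 2: 22 21 20 19 18 17 16
Shelf 3: 30 XXX 28 27 26 25 24
"""

print(find_missing_slots(SAMPLE))
```

On the excerpt this reports channel "1", shelf 3, position 1. How you capture the command output (for example, from remsh in a cron job) is left out, since that depends on the site.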
On Mon, Apr 22, 2002 at 08:24:49AM -0700, Sam Schorr wrote:
> We are running 6.1R1 and have frequently seen disks suddenly disappear without "failing", in the sense that the filer never marks the disk as "failed". What we see is that the disk is noticed as missing and there is a message to that effect in /etc/messages, but the disk does not show as failed in sysconfig -r, nor does it show in sysconfig -d. If the disk is pulled and a new disk inserted, a "failed" message appears first, then the new disk is added as a spare. The original disk that went missing did have its data spared out.
>
> We have had to write our own monitoring scripts to pull disk counts from the MIBs so that this condition can be noticed right away. NetApp support says that this "never happens", but we have the /etc/messages files to show the problem. It may be fixed in 6.2?
>
> -----Original Message-----
> From: Geoff Hardin [mailto:geoff.hardin@dalsemi.com]
> Sent: Monday, April 22, 2002 6:50 AM
> To: toasters
> Subject: disappearing disks
>
>
> Fellow toasters;
> I have an F760 cluster running NetApp Release 6.1R1P1. In the past
> week, we have "lost" two disks on separate shelves. The disks seem to
> disappear from the filer and do not show up. All the disks are Seagate
> ST318203FC 18GB drives with firmware NA10. I've seen this happen before
> on spare disks, and the first disk we lost this week was a spare.
> Typically, the spare fails and just stops reporting; if you slip it into
> a different slot the disk reports as failed. No big deal, just a spare
> failing and the filer doesn't know what to do with it immediately.
> But yesterday, we lost a data disk; it actually didn't show up in the
> weekly cluster notification log that runs at midnight. Around 2pm we
> received a disk fail alert for the drive and a disk/shelf miscount
> error. While checking on this, I noticed a third disk, another spare,
> had "disappeared"; however, once the volume rebuild completed, this disk
> "reappeared."
> I was wondering if anyone else had seen similar behavior on their
> filers? Like I said, this is a cluster, and its partner was still able
> to see the third disk that "disappeared", which leads me to believe I
> have an FC-AL adapter failing. All three disks have been on separate
> shelves, which also leads me away from suspecting an LRC (that would be
> too easy). Before I go tearing into the filer though, I wanted to see
> if anyone else had experience with this problem.
>
> Thanks,
>
> Geoff Hardin
> geoff.hardin(a)dalsemi.com
>
> "A one-question geek test: Seen on a California license plate on a VW
> Beetle: 'FEATURE'..." - Joshua D. Wachs
--
-----------------------------------------------------------------------------
* | *
* Chris Blackmor _______ | Good judgment comes from *
* Advanced Micro Devices \____ | | experience *
* Phone: (512) 602-1608 /| | | | And a lot of that comes *
* Fax: (512) 602-5155 | |___| | | from *
* Email: chris.blackmor(a)amd.com |____/ \| | bad judgment! *
* | Author Unknown*
-----------------------------------------------------------------------------
* My comments are mine, and mine alone. *
-----------------------------------------------------------------------------