We are running 6.1R1 and have frequently seen disks suddenly disappear without "failing" in the sense of the filer marking the disk as "failed". The filer notices that the disk is missing and logs a message to that effect in /etc/messages, but the disk does not show as failed in sysconfig -r, nor does it show up in sysconfig -d. If the disk is pulled and a new disk inserted, a "failed" message appears first, then the new disk is added as a spare. The original disk that went missing did have its data spared out.
We have had to write our own monitoring scripts to pull disk counts from the MIBs so that this condition is noticed right away. NetApp support says that this "never happens", but we have the /etc/messages files to show the problem. Perhaps it is fixed in 6.2?
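For anyone wanting to do something similar, here is a minimal sketch of the comparison such a script might make. It is not our actual script. The names DiskCounts and missing_without_failure are hypothetical, and fetching the counts from the filer (via SNMP or otherwise) is deliberately left out. It assumes you keep a known-good baseline and poll a total disk count and a failed disk count.

```python
from dataclasses import dataclass


@dataclass
class DiskCounts:
    """Disk counts from one poll of the filer (hypothetical container).

    How the numbers are obtained is out of scope for this sketch.
    """
    total: int   # disks the filer currently reports
    failed: int  # disks the filer has marked as failed


def missing_without_failure(baseline: DiskCounts, current: DiskCounts) -> int:
    """Return how many disks vanished without being marked failed.

    Assumption: a drop in the total that is not matched by a rise in the
    failed count means a disk has silently disappeared.
    """
    dropped = baseline.total - current.total
    newly_failed = max(0, current.failed - baseline.failed)
    return max(0, dropped - newly_failed)


if __name__ == "__main__":
    baseline = DiskCounts(total=28, failed=0)
    current = DiskCounts(total=27, failed=0)
    gone = missing_without_failure(baseline, current)
    if gone:
        print(f"WARNING: {gone} disk(s) missing but not marked failed")
```

A nonzero result is the case described above: the disk count went down, but nothing was reported as failed.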
-----Original Message-----
From: Geoff Hardin [mailto:geoff.hardin@dalsemi.com]
Sent: Monday, April 22, 2002 6:50 AM
To: toasters
Subject: disappearing disks
Fellow toasters;
I have an F760 cluster running NetApp Release 6.1R1P1. In the past
week, we have "lost" two disks on separate shelves. The disks seem to
disappear from the filer and do not show up. All the disks are Seagate
ST318203FC 18GB drives with firmware NA10. I've seen this happen before
on spare disks, and the first disk we lost this week was a spare.
Typically, the spare fails and just stops reporting; if you slip it into
a different slot the disk reports as failed. No big deal, just a spare
failing and the filer doesn't know what to do with it immediately.
But yesterday, we lost a data disk; it actually didn't show up in the
weekly cluster notification log that runs at midnight. Around 2pm we
received a disk fail alert for the drive and a disk/shelf miscount
error. While checking on this, I noticed a third disk, another spare,
had "disappeared"; however, once the volume rebuild completed, this disk
"reappeared."
I was wondering if anyone else had seen similar behavior on their
filers? Like I said, this is a cluster, and its partner was still able
to see the third disk that "disappeared", which leads me to believe I
have an FC-AL adapter failing. All three disks have been on separate
shelves, which also leads me away from suspecting an LRC (that would be
too easy). Before I go tearing into the filer though, I wanted to see
if anyone else had experience with this problem.
Thanks,
Geoff Hardin
geoff.hardin(a)dalsemi.com
"A one-question geek test: Seen on a California license plate on a VW
Beetle: 'FEATURE'..." - Joshua D. Wachs