On 2012-02-24 08:05, Fletcher Cocquyt wrote:
We've discovered a couple of these
bypassed disk conditions via the flashing amber light -
but this was noticed totally out of band with normal
support
Each time we opened a case manually
and Netapp immediately sent out a disk replacement.
So why is a bypassed disk not
treated as a failed disk ? This kind of silent failure
(in terms of Netapp monitoring and alerts) in a lights out
datacenter seems negligent.
Message logged on syslog server:
esh.bypass.err.disk:error]: Disk
4d.49 on channels 4d/PARTNER disk shelf ID 3 ESH A bay 1
Bypassed due to the drive self bypass.
In my previous job I worked as NAS admin managing about 100
filers. I had long discussions with NetApp but it looks like
they do not understand the problem:
- Why the disk is bypassed?
- Because it achieved threshold of errors and it was
pro-actively removed from the disk pool.
- So it was actually failed and should be replaced. Why is
it not marked as failed and filer status does not reflect
it?
- Because the disk is not failed. It is bypassed.
...
And so on...
BTW: I read the KB article on
bypassed disks and ran the CMD to highlight BYP but it did
not show BYP
https://kb.netapp.com/support/index?page=content&id=3012395
We maintained our own script that collected data from
several commands to be aware of any type of disk problems.
It always picked up bypassed disks even if it was not marked
as BYP.
We observed that number of all disk problems decreased when
we started to use Disk Maintenance Center however sometimes
we had to start disk tests manually.
Best regards,
Jacek