But this is totally unacceptable!
Who else is putting up with this!?


On Feb 24, 2012, at 1:20 AM, Jacek wrote:

On 2012-02-24 08:05, Fletcher Cocquyt wrote:
We've discovered a couple of these bypassed-disk conditions via the flashing amber light, but we noticed them entirely out of band from normal support.

Each time, we opened a case manually and NetApp immediately sent out a replacement disk.
So why is a bypassed disk not treated as a failed disk? This kind of silent failure (silent in terms of NetApp monitoring and alerts) in a lights-out datacenter seems negligent.

Message logged on syslog server:
esh.bypass.err.disk:error]: Disk 4d.49 on channels 4d/PARTNER disk shelf ID 3 ESH A bay 1 Bypassed due to the drive self bypass.

In my previous job I worked as a NAS admin managing about 100 filers. I had long discussions with NetApp, but they did not seem to understand the problem:
- Why is the disk bypassed?
- Because it reached the error threshold and was proactively removed from the disk pool.
- So it has actually failed and should be replaced. Why is it not marked as failed, and why does the filer status not reflect it?
- Because the disk is not failed. It is bypassed.
...

And so on...


BTW: I read the KB article on bypassed disks and ran the command to highlight BYP disks, but it did not show BYP.
https://kb.netapp.com/support/index?page=content&id=3012395

We maintained our own script that collected output from several commands so we would be aware of any type of disk problem. It always picked up bypassed disks, even when they were not marked as BYP.
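One check such a script might include is a scan of the central syslog for the bypass event quoted above. The following is a minimal, hypothetical sketch in Python, not the script described here: the regular expression is assumed from that single sample message (other ONTAP releases may word the event differently), and the names `find_bypassed_disks` and `BypassEvent` are illustrative only, not part of any NetApp tool.

```python
import re
from typing import Iterable, NamedTuple

# ASSUMPTION: pattern derived from the one esh.bypass.err.disk message
# quoted in this thread; the exact wording may differ between releases.
BYPASS_RE = re.compile(
    r"esh\.bypass\.err\.disk:error\]:\s+Disk\s+(?P<disk>\S+)"
    r"\s+on channels\s+(?P<channels>\S+)"
    r"\s+disk shelf ID\s+(?P<shelf>\d+)"
    r"\s+ESH\s+(?P<esh>\S+)"
    r"\s+bay\s+(?P<bay>\d+)\s+Bypassed"
)


class BypassEvent(NamedTuple):
    """Fields extracted from one bypass message (hypothetical type)."""
    disk: str
    channels: str
    shelf: int
    esh: str
    bay: int


def find_bypassed_disks(lines: Iterable[str]) -> list[BypassEvent]:
    """Return one BypassEvent per syslog line that matches BYPASS_RE."""
    events = []
    for line in lines:
        m = BYPASS_RE.search(line)
        if m:
            events.append(
                BypassEvent(
                    disk=m["disk"],
                    channels=m["channels"],
                    shelf=int(m["shelf"]),
                    esh=m["esh"],
                    bay=int(m["bay"]),
                )
            )
    return events


if __name__ == "__main__":
    # Demo using the sample message from this thread.
    sample = (
        "esh.bypass.err.disk:error]: Disk 4d.49 on channels 4d/PARTNER "
        "disk shelf ID 3 ESH A bay 1 Bypassed due to the drive self bypass."
    )
    for event in find_bypassed_disks([sample, "unrelated syslog line"]):
        print(f"ALERT: disk {event.disk} bypassed (shelf {event.shelf}, bay {event.bay})")
```

Feeding it the lines of the syslog file and alerting on any non-empty result would surface a bypass as an explicit alert instead of a silent condition; the regex-on-syslog approach is only one option and would miss bypasses that are never logged.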

We observed that the overall number of disk problems decreased when we started using Disk Maintenance Center; however, we sometimes had to start disk tests manually.

Best regards,

Jacek