We've discovered a couple of these bypassed disk conditions via the flashing amber light, but this was noticed totally out of band with normal support.
Each time we opened a case manually and NetApp immediately sent out a replacement disk. So why is a bypassed disk not treated as a failed disk? This kind of silent failure (in terms of NetApp monitoring and alerts) in a lights-out datacenter seems negligent.
Message logged on syslog server:
esh.bypass.err.disk:error]: Disk 4d.49 on channels 4d/PARTNER disk shelf ID 3 ESH A bay 1 Bypassed due to the drive self bypass.
BTW: I read the KB article on bypassed disks and ran the command to highlight BYP, but it did not show BYP: https://kb.netapp.com/support/index?page=content&id=3012395
thanks,
Fletcher
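For anyone who would rather alert on this from the syslog server than wait for an amber light, here is a minimal Python sketch that pulls the fields out of a message shaped like the one quoted above. The regular expression is inferred from that single sample line only, so the pattern and the field names (`disk`, `channels`, `shelf_id`, `module`, `bay`, `reason`) are assumptions, not a documented Data ONTAP format.

```python
import re

# Pattern inferred from the one sample line in this thread; other releases
# or shelf modules may word the message differently (assumption).
BYPASS_RE = re.compile(
    r"esh\.bypass\.err\.disk:error\]:\s+"
    r"Disk (?P<disk>\S+) on channels (?P<channels>\S+) "
    r"disk shelf ID (?P<shelf_id>\d+) "
    r"(?P<module>\S+ \S+) bay (?P<bay>\d+) "
    r"Bypassed due to (?P<reason>.+?)\.?$"
)


def parse_bypass(line):
    """Return a dict of fields for a bypass message, or None if no match."""
    match = BYPASS_RE.search(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Shelf ID and bay are numeric in the sample, so convert them.
    fields["shelf_id"] = int(fields["shelf_id"])
    fields["bay"] = int(fields["bay"])
    return fields


sample = (
    "esh.bypass.err.disk:error]: Disk 4d.49 on channels 4d/PARTNER "
    "disk shelf ID 3 ESH A bay 1 Bypassed due to the drive self bypass."
)
print(parse_bypass(sample))
```

Returning `None` for non-matching lines keeps the function safe to run over an entire log, at the cost of silently skipping any bypass message whose wording differs from the sample.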
On 2012-02-24 08:05, Fletcher Cocquyt wrote:
> We've discovered a couple of these bypassed disk conditions via the flashing amber light, but this was noticed totally out of band with normal support.
> Each time we opened a case manually and NetApp immediately sent out a replacement disk. So why is a bypassed disk not treated as a failed disk? This kind of silent failure (in terms of NetApp monitoring and alerts) in a lights-out datacenter seems negligent.
> Message logged on syslog server:
> esh.bypass.err.disk:error]: Disk 4d.49 on channels 4d/PARTNER disk shelf ID 3 ESH A bay 1 Bypassed due to the drive self bypass.
In my previous job I worked as a NAS admin managing about 100 filers. I had long discussions with NetApp, but it looks like they do not understand the problem:
- Why is the disk bypassed?
- Because it reached its error threshold and was proactively removed from the disk pool.
- So it has actually failed and should be replaced. Why is it not marked as failed, and why does the filer status not reflect it?
- Because the disk is not failed. It is bypassed.
...
And so on...
> BTW: I read the KB article on bypassed disks and ran the command to highlight BYP, but it did not show BYP: https://kb.netapp.com/support/index?page=content&id=3012395
We maintained our own script that collected data from several commands so that we were aware of any type of disk problem. It always picked up bypassed disks, even when they were not marked as BYP.
We observed that the number of disk problems of all kinds decreased once we started using Disk Maintenance Center; however, we sometimes had to start disk tests manually.
Best regards,
Jacek
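Jacek's script is not shown in the thread, and it gathered output from several filer commands. As a much narrower stand-in, the sketch below covers only the syslog side: it scans log lines for the event name from Fletcher's message and reports every hit. The function name `find_bypass_events` is hypothetical, and matching on the bare event name is an assumption based on the one sample line.

```python
# Event name taken from the syslog line quoted in this thread.
EVENT_TAG = "esh.bypass.err.disk"


def find_bypass_events(lines):
    """Yield (line_number, text) for every line that mentions the bypass event."""
    for number, line in enumerate(lines, start=1):
        if EVENT_TAG in line:
            yield number, line.rstrip("\n")


# Tiny in-memory example; in practice `lines` would be an open log file.
log = [
    "Feb 24 08:00:01 filer1 some unrelated message\n",
    "Feb 24 08:00:02 filer1 esh.bypass.err.disk:error]: Disk 4d.49 ...\n",
]
for number, text in find_bypass_events(log):
    print(f"BYPASSED DISK (line {number}): {text}")
```

Because the function accepts any iterable of lines, the same code works on an open file handle, so a cron job on the syslog server could call it and mail the output whenever it is non-empty.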
But this is totally unacceptable! Who else is putting up with this!?
On Feb 24, 2012, at 1:20 AM, Jacek wrote:
> In my previous job I worked as a NAS admin managing about 100 filers. I had long discussions with NetApp, but it looks like they do not understand the problem:
> - Why is the disk bypassed?
> - Because it reached its error threshold and was proactively removed from the disk pool.
> - So it has actually failed and should be replaced. Why is it not marked as failed, and why does the filer status not reflect it?
> - Because the disk is not failed. It is bypassed.
> ...
> And so on...
If the same slot continues to show BYP with new disks, it is likely a bad shelf.
--tmac Tim McCarthy Principal Consultant
RedHat Certified Engineer 804006984323821 (RHEL4) 805007643429572 (RHEL5)
2012/2/24 Fletcher Cocquyt <fcocquyt@stanford.edu>:
> But this is totally unacceptable! Who else is putting up with this!?
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
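tmac's rule of thumb (the same slot showing BYP again with new disks points at the shelf) can be checked mechanically once bypass events are being collected. The sketch below is only an illustration: the function name `repeat_offender_slots` and the `threshold` parameter are hypothetical, and it assumes the input holds one `(shelf_id, bay)` entry per distinct disk that was bypassed in that slot, since repeated messages for the same disk would inflate the count.

```python
from collections import Counter


def repeat_offender_slots(events, threshold=2):
    """Return (shelf_id, bay) slots seen in at least `threshold` bypass events.

    `events` is an iterable of (shelf_id, bay) tuples, one per bypassed disk.
    """
    counts = Counter(events)
    return sorted(slot for slot, seen in counts.items() if seen >= threshold)


# Hypothetical history: shelf 3, bay 1 has bypassed two different disks.
history = [(3, 1), (2, 7), (3, 1), (5, 0)]
print(repeat_offender_slots(history))  # [(3, 1)]
```

A slot that shows up here is a candidate for a shelf or backplane investigation rather than another disk swap.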
I'm of the mind that a BYP needs more attention than just a disk swap... but it does need more attention from NGS, it would appear.
On Fri, Feb 24, 2012 at 6:42 PM, tmac <tmacmd@gmail.com> wrote:
> If the same slot continues to show BYP with new disks, it is likely a bad shelf.
I bought my first NetApp in 1995 and I've never seen anything like this, with one exception: a DS14 with a bad backplane. It did take a while to diagnose because nobody had ever seen a backplane fail like that before.
On 02/24/2012 09:59 AM, Fletcher Cocquyt wrote:
> But this is totally unacceptable! Who else is putting up with this!?
Please consider updating disk and shelf firmware. In my experience, this has solved 100% of BYP disk conditions, although I'm sure there are exceptions. It's highly likely that this problem has already been fixed in disk and/or shelf firmware. I'd pick that low-hanging fruit first, because ESH firmware updates are usually non-disruptive.
Sent from my iThumbs. Please pardon my thumbmanship.
On Feb 26, 2012, at 11:48 AM, "jfs" <jfsinmsp@gmail.com> wrote:
> I bought my first NetApp in 1995 and I've never seen anything like this with one exception - a DS14 with a bad backplane. It did take a while to diagnose because nobody ever saw a backplane fail like that before.