We had an incident the other day where we lost one of our two paths to storage. We noticed the path failure quite by accident, and some time after it actually happened.
Does anyone know whether there's a way that the NTAP fibre DSM can be configured to generate an alert (by SNMP or e-mail) whenever a path fails?
Jon Hill wrote:
We had an incident the other day where we lost one of our two paths to storage. We noticed the path failure quite by accident, and some time after it actually happened.
Does anyone know whether there's a way that the NTAP fibre DSM can be configured to generate an alert (by SNMP or e-mail) whenever a path fails?
What failed: a port on the host, a port on the head, or a port on the switch?
This was a legacy storage array (XP-512), and the problem has yet to be isolated, but we believe it was a port on the head. In this case every component except the client reports full functionality - the XP says its ports are in great shape and the Brocade sees F-Ports for both the storage array and the HBA.
I'm hoping there's a way for the client to warn me of path failures since it doesn't really matter what the head and the switch say if the client loses comm.
Thanks.
-----Original Message----- From: Bluezman [mailto:lists@up-south.com] Sent: Tuesday, April 08, 2008 9:17 AM To: Jon Hill Cc: toasters@mathworks.com Subject: Re: MPIO and path failure
Paths are merely a route **TO** the filer, rather than something **ON** the filer.
Therefore, that's an environmental issue external to the filer (rather than something actually on the filer). So you would need to investigate this with your switch vendor, etc.
Do you perhaps have some indication that something on the filer failed and caused this?
Stetson M. Webster Onsite Professional Services Engineer PS - North Amer. - East
NetApp 919.250.0052 Mobile Stetson.Webster@netapp.com www.netapp.com http://www.netapp.com/
________________________________
From: Jon Hill [mailto:JHill@jennison.com] Sent: Tuesday, April 08, 2008 8:06 AM To: toasters@mathworks.com Subject: MPIO and path failure
I should clarify that in this case the storage array was an old HP XP-512 that we are migrating away from. But it made us realize that we didn't have any mechanism in place to alert us when a host (on this array or on NTAP) loses one of its paths.
The latest revs of HP's client-side MPIO application for the XP (it's called auto path and I doubt very strongly that it would work with an NTAP array) do have their own SNMP MIB and even an e-mail alerting feature. I was hoping that the NTAP MPIO DSM might have something similar.
To answer your question, we do believe the head is to blame because we've swapped out every other component - HBAs, fibre cables, Brocade switches. And we don't think it's a software issue because the HBA that connects to the bad path has no comm even when configuring the boot BIOS during the POST. However, the diagnostics dumps that the backline support engineers generated on the head show everything as fine, so this is one of those fairly subtle issues where the client can't communicate over one path but every component is reporting a status of OK. Hence the interest in a client-side utility.
I did find that I could run c:\Program Files\NetApp\mpio\dsmcli path list to generate a (barely) human-readable list of paths, but it would take some effort to automate and I figured if the DSM is already gathering this data then it may have another interface for retrieving it.
________________________________
From: Webster, Stetson [mailto:Stetson.Webster@netapp.com] Sent: Tuesday, April 08, 2008 10:08 AM To: Jon Hill; toasters@mathworks.com Subject: RE: MPIO and path failure
For NetApp, be sure to review the NetApp Host Utilities software downloadable from the NOW site. If your host OS supports it, I recommend using ALUA.
Cheers ...........
________________________________
From: Jon Hill [mailto:JHill@jennison.com] Sent: Tuesday, April 08, 2008 10:23 AM To: Webster, Stetson; toasters@mathworks.com Subject: RE: MPIO and path failure
You have a few options open to you, which may or may not fit your needs.
1. While the DSM does not have any alerting built in, SnapDrive does. It may not e-mail you when a path fails, but it will give you other storage alerts that have more impact than a single path failing. I do like your idea of pulling from the command line, but it sounds pretty heavy for any environment larger than 20 hosts. What I would do is capture the command-line output, parse it for failures, and e-mail out only on failures, a few times a day (don't forget to share the script ;)
2. With Brocades you can configure an email address for administrator, which will email out when a link goes down. While it won't show you a problem with zoning causing a path to fail, it will show you when a link / gbic fails.
3. With NetApp Operations Manager / DFM you can configure custom alerts on various events on the filer. These include: HBA Port: Offline, HBA Port: Port Error, HBA Port: Traffic High. Keep in mind that if you already have alerting set up on all events "Critical or worse", NetApp has some items missing from that list. I am pretty sure the HBA errors are not included in "critical", just as "lun offline" is not a critical alert either.
I think to get exactly what you want you should start with #1. It sounds like alerting is really important in your environment (I suppose it is to everyone), so you may want to do #2 since it is free, and I would push for Ops Mgr as well. To Stetson's point, the Host Attach Kit also includes other command-line utilities that may be helpful in your scripting adventures, and it is required to be installed for any supportable NetApp config.
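For what it's worth, here is a rough Python sketch of the scripted approach in #1. It is only a starting point under some loud assumptions: the thread never shows what "dsmcli path list" actually prints, so instead of parsing for a failure keyword it saves a known-good copy of the output and mails a diff whenever a later run differs. The baseline file name, SMTP relay, and mail addresses are all hypothetical placeholders, and only the dsmcli location comes from Jon's message.

```python
import difflib
import smtplib
import socket
import subprocess
from email.message import EmailMessage
from pathlib import Path

# Location of dsmcli as quoted in the thread; everything else is an assumption.
DSMCLI = r"C:\Program Files\NetApp\mpio\dsmcli"
BASELINE = Path("dsmcli_baseline.txt")   # hypothetical file holding known-good output
SMTP_HOST = "smtp.example.com"           # hypothetical mail relay
MAIL_FROM = "mpio-check@example.com"     # hypothetical sender
MAIL_TO = "storage-admins@example.com"   # hypothetical recipient


def capture_path_list():
    """Run 'dsmcli path list' and return whatever it prints."""
    result = subprocess.run([DSMCLI, "path", "list"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def diff_against_baseline(baseline, current):
    """Return unified-diff lines; an empty list means nothing changed."""
    return list(difflib.unified_diff(
        baseline.splitlines(), current.splitlines(),
        fromfile="baseline", tofile="current", lineterm=""))


def build_alert(hostname, diff_lines):
    """Build the e-mail that reports the changed path list."""
    msg = EmailMessage()
    msg["Subject"] = f"MPIO path list changed on {hostname}"
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg.set_content("\n".join(diff_lines))
    return msg


def main():
    current = capture_path_list()
    if not BASELINE.exists():
        # First run: record the output as the known-good state.
        BASELINE.write_text(current)
        return
    changes = diff_against_baseline(BASELINE.read_text(), current)
    if changes:
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(build_alert(socket.gethostname(), changes))

# Call main() from a scheduled task a few times a day.
```

If the output turns out to include values that change on every run (I/O counters, timestamps), the diff will fire constantly, and you would need to filter those lines out before comparing. That part depends on the real output format, so I have left it out.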
HTH,
Hadrian
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Webster, Stetson Sent: Tuesday, April 08, 2008 7:29 AM To: Jon Hill; toasters@mathworks.com Subject: RE: MPIO and path failure