Hi,
So I've been testing cDOT monitoring for a while now. I'm using OCUM as my primary source of alerts, but I'm also deploying NetApp SDK-based checks in Nagios.
Here is what I noticed on an (almost) "live" 6-node cDOT cluster, with AutoSupport and OCUM configured and tested:
1. Take a cDOT node in Multi-Path HA / ACP Full Connectivity config (I'd guess this is what most people run).
2. Take a random disk shelf configured on the node; in my case it was a DS2246 with 1.2TB 10K RPM SAS disks.
3. Remove one PSU on the disk shelf.
The results:
1. It does not generate an AutoSupport message about the missing PSU.
2. OCUM doesn't trigger an alert.
The only sub-system that noticed the issue is 'event':
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
4/30/2015 12:11:01  na101node-1a     WARNING       ses.status.psWarning: DS2246 (S/N SHFHU1427000502) shelf 20 on channel 0b power warning for Power supply 1: not installed. This module is on the rear of the shelf at the bottom left.
But it didn't go any further.
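A minimal sketch of how a Nagios-side check could act on such an event line, assuming the event output has already been fetched (for example through the SDK) and arrives one event per line in the "time node severity event" layout shown above. The severity-to-state mapping, the `ses.status.` prefix filter, and all function names are illustrative assumptions, not anything OCUM or the SDK provides:

```python
import re
from typing import Iterable, NamedTuple, Optional

# Standard Nagios plugin exit codes.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL = 0, 1, 2

# Assumed mapping from event severity to Nagios state; anything not listed
# here (NOTICE, INFORMATIONAL, DEBUG, ...) is treated as OK.
SEVERITY_TO_NAGIOS = {
    "WARNING": NAGIOS_WARNING,
    "ERROR": NAGIOS_CRITICAL,
    "CRITICAL": NAGIOS_CRITICAL,
    "ALERT": NAGIOS_CRITICAL,
    "EMERGENCY": NAGIOS_CRITICAL,
}

# Sample line taken from the event output quoted in this message.
SAMPLE_LINE = (
    "4/30/2015 12:11:01 na101node-1a WARNING ses.status.psWarning: "
    "DS2246 (S/N SHFHU1427000502) shelf 20 on channel 0b power warning "
    "for Power supply 1: not installed. This module is on the rear of "
    "the shelf at the bottom left."
)

EVENT_RE = re.compile(
    r"^(?P<timestamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2})\s+"
    r"(?P<node>\S+)\s+"
    r"(?P<severity>[A-Z]+)\s+"
    r"(?P<name>[\w.]+):\s+"
    r"(?P<text>.*)$"
)


class Event(NamedTuple):
    timestamp: str
    node: str
    severity: str
    name: str
    text: str


def parse_event(line: str) -> Optional[Event]:
    """Parse one event line; return None if it does not match the layout."""
    match = EVENT_RE.match(line.strip())
    return Event(**match.groupdict()) if match else None


def nagios_state(events: Iterable[Event], watched_prefix: str = "ses.status.") -> int:
    """Return the worst Nagios state among events whose name has the prefix."""
    worst = NAGIOS_OK
    for event in events:
        if event.name.startswith(watched_prefix):
            worst = max(worst, SEVERITY_TO_NAGIOS.get(event.severity, NAGIOS_OK))
    return worst


if __name__ == "__main__":
    parsed = parse_event(SAMPLE_LINE)
    state = nagios_state([parsed] if parsed else [])
    print(f"shelf events state={state}")
    raise SystemExit(state)
```

Filtering on the event name rather than on free text keeps the check independent of the shelf model and serial number in the message body.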
Has anyone else seen this behavior? I wonder whether I'm missing a setting somewhere.
Cheers, Vladimir
There are some tools available onboard: event route and event destination. They allow you to route alerts via a mail server, and you can be quite specific about the type of alert you want to receive.
event config will help set up the mail side.
event destination will let you send stuff to that mailer.
event route will let you select which alerts get sent via the destinations you set up :)
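As a rough sketch of that sequence, the Python snippet below only assembles the three CLI command strings in order; it does not talk to a cluster. The flag names (`-mailserver`, `-mailfrom`, `-name`, `-mail`, `-messagename`, `-destinations`) are recalled from the 8.2-era CLI and are an assumption to verify against the man pages for your release. The mail server, addresses, and destination name are hypothetical placeholders:

```python
from typing import List


def event_routing_commands(
    mail_server: str,
    mail_from: str,
    dest_name: str,
    mail_to: str,
    message_name: str,
) -> List[str]:
    """Build the config -> destination -> route command sequence as strings.

    Flag names are assumptions; check them with the CLI man pages first.
    """
    return [
        # 1. Mail side: which SMTP server to use and the sender address.
        f"event config modify -mailserver {mail_server} -mailfrom {mail_from}",
        # 2. A named destination that delivers to a mailbox.
        f"event destination create -name {dest_name} -mail {mail_to}",
        # 3. Route the chosen event to that destination.
        f"event route add-destinations -messagename {message_name} "
        f"-destinations {dest_name}",
    ]


if __name__ == "__main__":
    # All values below are placeholders except the event name, which is the
    # one reported in this thread.
    for command in event_routing_commands(
        mail_server="smtp.example.com",
        mail_from="filer@example.com",
        dest_name="shelf-alerts",
        mail_to="storage-team@example.com",
        message_name="ses.status.psWarning",
    ):
        print(command)
```

Routing a single event name such as `ses.status.psWarning` keeps the mail volume low; whether wildcards are accepted in the message name is worth checking in the documentation before relying on them.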
Hope this helps, and more info if you need it.
~Mark
mark.flint@sanger.ac.uk
On 30 Apr 2015, at 17:23, Momonth momonth@gmail.com wrote:
Hi,
So I've been testing cDOT monitoring for a while now. I'm using OCUM as my primary source of alerts, but I'm also deploying NetApp SDK-based checks in Nagios.
Here is what I noticed on an (almost) "live" 6-node cDOT cluster, with AutoSupport and OCUM configured and tested:
- Take a cDOT node in Multi-Path HA / ACP Full Connectivity config
(I'd guess this is what most people run).
- Take a random disk shelf configured on the node; in my case it was a
DS2246 with 1.2TB 10K RPM SAS disks.
- Remove one PSU on the disk shelf.
The results:
- It does not generate an AutoSupport message about the missing PSU.
- OCUM doesn't trigger an alert.
The only sub-system that noticed the issue is 'event':
Time Node Severity Event
4/30/2015 12:11:01 na101node-1a WARNING ses.status.psWarning: DS2246 (S/N SHFHU1427000502) shelf 20 on channel 0b power warning for Power supply 1: not installed. This module is on the rear of the shelf at the bottom left.
But it didn't go any further.
Has anyone else seen this behavior? I wonder whether I'm missing a setting somewhere.
Cheers, Vladimir
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Not sure if it matters in this case, but are you using ACP cabling along with the SAS cables? ACP used to be treated as optional but from 8.2 onwards is mandatory for alerts on SAS shelves to work properly.
Make sure that your SP is configured on each controller as well. It needs to be functioning, with outbound HTTPS open, to actually send ASUPs out of the system. In cDOT, configuration of the SP is a node shell command.
SP setup procedure is detailed in the linked KB article: https://kb.netapp.com/index?page=content&id=3012997&actp=LIST_POPULA...
Anthony Bar | tbar@berkcom.com
Berkeley Communications | www.berkcom.com
On May 1, 2015, at 1:32 AM, Mark Flint <mf1@sanger.ac.uk> wrote:
There are some tools available onboard: event route and event destination. They allow you to route alerts via a mail server, and you can be quite specific about the type of alert you want to receive.
event config will help set up the mail side.
event destination will let you send stuff to that mailer.
event route will let you select which alerts get sent via the destinations you set up :)
Hope this helps, and more info if you need it.
~Mark
mark.flint@sanger.ac.uk
Yes, the ACP is cabled and configured. We do basically the same thing on our 7-Mode filers, and DFM picks up PSU issues there. The SPs are also configured.
I am going to open a ticket with NetApp on this issue.
Cheers,
On May 1, 2015 17:16, "Tony Bar" tbar@berkcom.com wrote:
Not sure if it matters in this case, but are you using ACP cabling along with the SAS cables? ACP used to be treated as optional but from 8.2 onwards is mandatory for alerts on SAS shelves to work properly.
Make sure that your SP is configured on each controller as well. It needs to be functioning, with outbound HTTPS open, to actually send ASUPs out of the system. In cDOT, configuration of the SP is a node shell command.
SP setup procedure is detailed in the linked KB article: https://kb.netapp.com/index?page=content&id=3012997&actp=LIST_POPULA...
Anthony Bar | tbar@berkcom.com
Berkeley Communications | www.berkcom.com
Thanks for the hint. I'm going to try it; it seems to be a plausible workaround.
Cheers,
On May 1, 2015 10:33, "Mark Flint" mf1@sanger.ac.uk wrote:
There are some tools available onboard: event route and event destination. They allow you to route alerts via a mail server, and you can be quite specific about the type of alert you want to receive.
event config will help set up the mail side.
event destination will let you send stuff to that mailer.
event route will let you select which alerts get sent via the destinations you set up :)
Hope this helps, and more info if you need it.
~Mark
mark.flint@sanger.ac.uk
Shouldn’t this set the global status to “not normal”?
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Momonth
Sent: Monday, May 04, 2015 1:51 PM
To: Mark Flint
Cc: toasters@teaparty.net
Subject: Re: cDOT 8.2.1P3: removing PSU from a disk shelf went unnoticed for OCUM / Autosupport
Thanks for the hint, I'm going to try it, it seems to be a plausible workaround.
Cheers,
That's 7-Mode behavior; apparently things have changed in cDOT.
On Mon, May 4, 2015 at 9:06 PM, Jordan Slingerland Jordan.Slingerland@independenthealth.com wrote:
Shouldn’t this set the global status to “not normal”?