Hi,
So I've been testing cDOT monitoring for a while now. I'm using OCUM as my primary source of alerts, but I'm also deploying NetApp SDK-based checks in Nagios.
Here is what I noticed on an (almost) "live" 6-node cDOT cluster, with AutoSupport and OCUM configured and tested:
1. Take a cDOT node in Multi-Path HA / ACP Full Connectivity config (I'd guess this is what most people run).
2. Take a random disk shelf configured on that node; in my case it was a DS2246 with 1.2TB 10K RPM SAS disks.
3. Remove one PSU on the disk shelf.
The results:
1. No AutoSupport message is generated about the missing PSU.
2. OCUM doesn't trigger an alert.
The only subsystem that noticed the issue was 'event':
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
4/30/2015 12:11:01  na101node-1a     WARNING       ses.status.psWarning: DS2246 (S/N SHFHU1427000502) shelf 20 on channel 0b power warning for Power supply 1: not installed. This module is on the rear of the shelf at the bottom left.
But it didn't go any further.
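As a stopgap, a Nagios check could match on the event text itself. Below is a minimal sketch in Python, assuming the event log output has already been collected as plain text in the layout shown above. The function name `check_psu_events` and the way the text is fetched are hypothetical; nothing here uses the NetApp SDK.

```python
import re

# Matches the node, severity and message of a ses.status.psWarning line,
# based only on the single event line quoted above (an assumption about
# the general format).
PSU_EVENT = re.compile(
    r"(?P<node>\S+)\s+(?P<severity>[A-Z]+)\s+ses\.status\.psWarning:\s+(?P<message>.*)"
)

# Standard Nagios plugin exit codes.
OK, WARNING = 0, 1


def check_psu_events(event_log_text):
    """Return (exit_code, status_line) for a Nagios-style check.

    event_log_text is the raw event log output as a string; how it is
    collected (SSH, SDK, file) is left to the caller.
    """
    hits = []
    for line in event_log_text.splitlines():
        match = PSU_EVENT.search(line)
        if match:
            hits.append(f"{match.group('node')}: {match.group('message').strip()}")
    if hits:
        return WARNING, "WARNING - " + "; ".join(hits)
    return OK, "OK - no ses.status.psWarning events found"
```

A wrapper script would print the status line and exit with the returned code, so Nagios raises a warning whenever a shelf PSU event shows up in the log.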
Has anyone else seen this behavior? I wonder if I'm missing a setting somewhere.
Cheers, Vladimir