From owner-toasters@mathworks.com Tue Mar 26 12:14 MST 2002
Hello Toasters.
I've been using a script to monitor disk failures on our filers. It's worked pretty well in the past, but I've hit a snag today...
Normally I can tell if a disk has failed in one of two ways:
- Check the SNMP "disks.failed" value, if !eq 1, we've got a problem.
- Check via SNMP the total number of disks, then add the number of active disks to the number of spare disks. If the total does not equal active plus spare, there is a failure.
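The second check above comes down to one comparison. A minimal sketch in Python, assuming the three counts have already been fetched from the filer by whatever SNMP tool the script uses. The function names and the example numbers are made up for illustration and are not taken from the original script:

```python
def disk_failure_suspected(total, active, spare):
    """Return True when the total disk count differs from active + spare."""
    return total != active + spare


def unaccounted_disks(total, active, spare):
    """Return how many disks are neither active nor spare."""
    return total - (active + spare)


if __name__ == "__main__":
    # Hypothetical counts: 28 disks total, 25 active, 2 spare.
    total, active, spare = 28, 25, 2
    if disk_failure_suspected(total, active, spare):
        print("possible disk failure:",
              unaccounted_disks(total, active, spare), "disk(s) unaccounted for")
```

Keeping the comparison separate from the SNMP query means the check can be tested with plain integers, independent of how the counts are collected.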
Have you thought about using syslog and triggering on an event?
As Jay's email implied, we forward syslog (and auditlog) events to our loghosts and process them there with Swatch, which auto-notifies us of "interesting" events. I don't know whether the specific issue you had triggered a syslog event, but you might check your logs to see if something was there.
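For anyone who wants to try the idea without Swatch, here is a rough Python sketch of the same approach: scan forwarded syslog lines and pick out the ones that match a list of patterns. The patterns are guesses for illustration only, not actual filer message formats, so they would need adjusting to whatever your logs really contain:

```python
import re

# Hypothetical patterns; replace with the messages your filers actually log.
INTERESTING = [
    re.compile(r"disk.*fail", re.IGNORECASE),
    re.compile(r"raid.*degraded", re.IGNORECASE),
]


def interesting_lines(lines):
    """Return the log lines that match at least one pattern."""
    return [line for line in lines if any(p.search(line) for p in INTERESTING)]


def scan_file(path):
    """Read a log file and return its interesting lines."""
    with open(path, "r", errors="replace") as handle:
        return interesting_lines(line.rstrip("\n") for line in handle)
```

The notification step (mail, pager, and so on) is left out; it would act on whatever `interesting_lines` returns.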
Just a thought, alek