Keith,
We have currently set up a trap on 'miscGlobalStatus', as it seemed a reasonable starter. The MIB description is a little woolly, however:
"This indicates the overall status of the appliance. The algorithm to determine the value uses both hardware status (e.g. the number of failed fans) and volume status (e.g. number of volumes that are full). This may change...etc"
It has the values other(1), unknown(2), ok(3), nonCritical(4), critical(5), nonRecoverable(6).
Is there a more detailed explanation of which event triggers which status? We are considering configuring our notification system to respond according to the value, i.e. email/phone on (4), 24hr page on (5), print CV on (6). In testing, when we switch off one power supply out of two, we get a 'critical'(5) trap. (Would 'nonCritical' be more appropriate?)
Other traps we could possibly set up as a 'starter' set: OverTemperature, FailedFanCount, FailedPowerSupplyCount, dfPerCentKBytesCapacity.
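To make the value-to-response idea above concrete, here is a rough sketch of the mapping we have in mind. The numeric values are those listed in the MIB; the action names and the handling of other(1), unknown(2) and out-of-range values are our own placeholders, not anything defined by the MIB or by our notification system.

```python
from enum import IntEnum


class GlobalStatus(IntEnum):
    """miscGlobalStatus values as listed in the MIB."""
    OTHER = 1
    UNKNOWN = 2
    OK = 3
    NON_CRITICAL = 4
    CRITICAL = 5
    NON_RECOVERABLE = 6


# Placeholder action names (assumptions, not real notification-system calls).
ACTIONS = {
    GlobalStatus.NON_CRITICAL: "email_phone",
    GlobalStatus.CRITICAL: "page_24hr",
    GlobalStatus.NON_RECOVERABLE: "print_cv",
}


def action_for(value: int) -> str:
    """Return the placeholder action for a received miscGlobalStatus value."""
    try:
        status = GlobalStatus(value)
    except ValueError:
        # Value outside 1..6: assumed to need a human look.
        return "investigate"
    if status is GlobalStatus.OK:
        return "none"
    # other(1) and unknown(2) have no agreed response yet; assumed "investigate".
    return ACTIONS.get(status, "investigate")
```

With this, the power supply test above (a 'critical'(5) trap) would give action_for(5) == "page_24hr", which is exactly why the classification of that event matters to us.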
Other monitoring issues:
We have configured syslog.conf to send error messages to 'messages.err', critical ones to 'messages.crit', etc., hoping to parse these files and trigger an appropriate response according to severity. However, when doing the same power supply pull, we only get a message in the 'info' file. (It is nevertheless deemed important enough to trigger an autosupport email.) So it appears we have to match on the message wording itself, incrementally building up patterns for all known potential error messages, which is time consuming.
In lieu of better classification of messages, has anybody got a list of known filer error messages, with severity levels?
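For illustration, the wording-based matching we are contemplating might look roughly like the sketch below. The patterns and the severities attached to them are entirely our own guesses, not actual filer message text, which is precisely the list we are missing.

```python
import re

# Hypothetical wording patterns with severities assigned by us.
# None of these are taken from real filer messages.
RULES = [
    (re.compile(r"power supply.*(fail|off)", re.IGNORECASE), "nonCritical"),
    (re.compile(r"fan.*fail", re.IGNORECASE), "critical"),
    (re.compile(r"volume.*full", re.IGNORECASE), "nonCritical"),
]


def classify(line: str, default: str = "unclassified") -> str:
    """Return the severity of the first rule whose pattern matches the line."""
    for pattern, severity in RULES:
        if pattern.search(line):
            return severity
    # No rule matched: report it so the rule list can be extended later.
    return default
```

Lines that come back as "unclassified" would be collected and reviewed by hand, which is the incremental (and time consuming) part.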
How do other users monitor filers?
Regards,
Richard Moore NortelNetworks Harlow UK