We're running ONTAP 6.4.5 with one DS14 shelf over copper FC.
Last night one of our F840s rebooted itself twice via "watchdog reset".
Around the time the second reboot notification mail was sent (I can't tell whether it was before or after), the filer froze with the front LCD panel stuck on an NFS ops/sec reading. After a hard reboot it came up fine. I disabled the watchdog timer and the machine stayed up.
Has anyone experienced multiple watchdog timer resets, or does anyone know what type of hardware failure the watchdog watches for? Also, how bad is it to keep the watchdog turned off?
Thanks, Tavis
I've had a failure along those lines. The machine would not reboot, and I would see a "watchdog reset" error. The LCD was also out. I disconnected the LCD and the machine booted fine. A few weeks later the machine crashed with a "watchdog reset" and I replaced the LCD. The machine ran fine for a while. By this time it wasn't in production anymore, since we had moved on to a bigger machine. The problem recurred, with the LCD going out again. This time we could afford to keep the machine down for a while, so we replaced the motherboard. That seemed to fix the problem: we never had a "watchdog reset" again.
-Michael Cerda
I found an instance of an ECC DIMM error in the logs just after the reboot:
Wed Jun 1 05:17:02 GMT [cecc_log.entry:warning]: 1 Correctable ECC error on DIMM J41 at bit D55
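To see whether the same DIMM keeps turning up, it may help to tally these entries across the saved log. The following is only a rough sketch: the pattern is modeled on the single line quoted above, the function name `tally_cecc` is made up for illustration, and other ONTAP releases may word the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this thread (assumption:
# other entries follow the same wording, with "errors" for counts above 1).
CECC_RE = re.compile(
    r"\[cecc_log\.entry:warning\]:\s+(\d+) Correctable ECC errors? "
    r"on DIMM (\S+) at bit (\S+)"
)

# The only sample available: the line quoted in the message above.
SAMPLE = [
    "Wed Jun 1 05:17:02 GMT [cecc_log.entry:warning]: "
    "1 Correctable ECC error on DIMM J41 at bit D55",
]


def tally_cecc(lines):
    """Return a Counter mapping DIMM slot to total correctable ECC errors."""
    totals = Counter()
    for line in lines:
        match = CECC_RE.search(line)
        if match:
            count, dimm, _bit = match.groups()
            totals[dimm] += int(count)
    return totals


if __name__ == "__main__":
    for dimm, total in tally_cecc(SAMPLE).items():
        print(f"{dimm}: {total} correctable ECC error(s)")
```

To run it against a real log, pass an open file object (a copy of the filer's messages file, for example) to `tally_cecc` in place of `SAMPLE`. A slot whose count keeps climbing would be a reasonable candidate for replacement.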
On Wed, Jun 01, 2005 at 11:23:35AM -0700, Tavis Gustafson wrote:
> Using ontap 6.4.5 with 1 ds14 over copper fc
> Last night one of our F840's rebooted itself twice via "watchdog reset".
> Either before or after the second reboot notification mail was sent the filer froze up with the front LCD panel stuck at some NFS ops/sec number. After a hard reboot it came up fine. I disabled the watchdog timer and the machine stayed up.
> Has anyone experienced multiple watchdog timer resets or knows what type of hardware failure they watch? Also, how bad is it to keep the watchdog turned off?
> Thanks, Tavis