Re: Apology and Request (was Re: raid failure) - toasters

5 May 2000


Cool, that's great to hear. Thanks for the response.

Respectfully,
David N. Blank-Edelman
Director of Technology
College of Computer Science
Northeastern University

John Denholm johnd@ThePLAnet.net writes:
...
Yeah, that's better now.  We used to get that a lot - part of the
problem can be the interaction of Cisco switches and Netapps - the
Cisco can take a short while to figure out that the link is back up
to a Netapp and the mail falls on the floor while waiting for that.
Some time ago, presumably at the insistence of ourselves and others,
Netapp inserted a 30 second wait and retry loop into bootup, and now
I think it gives autosupport 4 or 5 tries over a couple of minutes.
We always get autosupport off them now.  Not that, I admit, we've
had a crash message in a while, but we never used to get them on
reboots either :<
...
As for dealing with the network or DNS or whatever not coming back, one thing
you can do is specify your mailhost wholly by IP - while that is fractionally
more work to maintain if the boxes on your network change frequently, it means
one less thing (name resolution) has to go right before the mail can be sent.
If you are worried about missing a crash, use SNMP monitoring and grab
system.sysUpTime.0, which is the uptime in hundredths of a second.  I run a
large number of Netapps of different breeds, and I just run a Perl script which
grabs the uptime off every one every 10 minutes.  Any crashes are immediately
apparent.  It could also write to file and compare current uptime to previous
uptime - if uptime drops, it can SMS you, mail you, sound sirens, flash lights,
whatever :>
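
For anyone who wants to try the write-to-file-and-compare idea, here is a rough
sketch in Python rather than Perl.  It is not the script described above, just
an illustration: the function names, the JSON state file and the host names are
all made up, and the SNMP fetch itself is left out, so you hand it a dict of
host -> sysUpTime ticks from whatever poller you already use.  Note that
sysUpTime is a 32-bit counter and wraps after roughly 497 days, which this
sketch would also report as a reboot.

```python
import json
from pathlib import Path


def ticks_to_seconds(ticks):
    """sysUpTime is reported in hundredths of a second (TimeTicks)."""
    return ticks / 100.0


def find_reboots(previous, current):
    """Return the sorted hosts whose uptime dropped between two polls.

    Both arguments are dicts of host name -> uptime in ticks.  Hosts
    seen for the first time have nothing to compare with and are skipped.
    """
    rebooted = []
    for host, ticks in current.items():
        old = previous.get(host)
        if old is not None and ticks < old:
            rebooted.append(host)
    return sorted(rebooted)


def check_and_save(state_file, current):
    """Compare this poll with the saved one, then save the merged state.

    state_file is a hypothetical path to a small JSON file.  Hosts missing
    from this poll keep their last known uptime in the file.
    """
    path = Path(state_file)
    previous = json.loads(path.read_text()) if path.exists() else {}
    rebooted = find_reboots(previous, current)
    path.write_text(json.dumps({**previous, **current}))
    return rebooted
```

Run check_and_save() from cron every 10 minutes with fresh readings, and hang
the mail/SMS/siren of your choice off a non-empty return value.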
Now I just wish they'd get it right with the caches.  They still only try
mailing autosupport once  :p
J

#  John Denholm                                      johnd@theplanet.net  #
#  Webcache & Filer Administrator, Planet Online        +44 113 207 6357  #
Error 404:   There is no spoon