hitz@netapp.com (Dave Hitz) writes:
> Our motivation was really your second guess:
> To make sure the admin knows something is wrong.
> We couldn't come up with any way to absolutely, reliably *guarantee* that the sysadmin would get notified except to have the box turn itself off. We even joked about putting lights and sirens on the box, but the problem is, at smaller sites, people sometimes stick these things in a closet somewhere and forget about them.
What does the system do when the raid.timeout is reached? Does the LCD display anything, and is there an SNMP trap, autosupport email, autosupport note, etc.? Also, does it completely power off?
I'm wondering if we should disable the feature or possibly extend the timeout.
Thanks.
- Dan
On 10 Sep 1998, Daniel Quinlan wrote:
> What does the system do when the raid.timeout is reached? Does the LCD display anything, is there an SNMP trap, autosupport email, autosupport note, etc.? Also, does it completely power off?
I'm not sure about that, but I'll tell you what's happened around here when the RAID fills up completely :) When live filesystem space and snapshot space are full, and 'df' is reading 200%, one of our toasters just crashed. The OS just halted with an error message on the LCD. The other toaster then showed an LCD reading of "0 ops per second". In the first case, NetApp tech support just told me to power cycle it, let it recover, and then remove some snapshots. So in the second case, I did the same. However, I was operating in a highly suboptimal situation.
1) THE RAID FILLED UP. YOU DON'T LET FILESYSTEMS FILL UP. DON'T DO IT. You may as well pull the power plug. I mean, what's the point? Filesystem availability is one of the critical assumptions.
2) I had no console. I'm on the night/weekend shift, with nobody responding on call. With a console, I might not have really needed to power cycle the second toaster.
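To make point 1 concrete, here is a minimal sketch of a capacity check that could run from cron on an admin host that has the filer's volumes NFS-mounted. Everything in it is an assumption for illustration: the 90% threshold, the function names, and the "/" placeholder mount point are all hypothetical and not anything NetApp ships. Note also that a client-side check like this may not see snapshot usage the way 'df' on the filer itself does.

```python
import shutil

WARN_PERCENT = 90  # assumed threshold; tune per site


def percent_used(total, used):
    """Return used capacity as a whole-number percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    # used may exceed total (e.g. a 'df' reading of 200%), so do not clamp.
    return round(100 * used / total)


def over_threshold(usage, warn_percent=WARN_PERCENT):
    """usage maps a mount point to (total, used).

    Return the sorted mount points at or above warn_percent.
    """
    return sorted(
        mount
        for mount, (total, used) in usage.items()
        if percent_used(total, used) >= warn_percent
    )


def local_usage(mounts):
    """Collect (total, used) for mount points visible on this host."""
    result = {}
    for mount in mounts:
        stats = shutil.disk_usage(mount)
        result[mount] = (stats.total, stats.used)
    return result


if __name__ == "__main__":
    # "/" is only a placeholder; substitute the real NFS mount points.
    for mount in over_threshold(local_usage(["/"])):
        print(f"WARNING: {mount} is at or above {WARN_PERCENT}% capacity")
```

Piping the warnings into mail or a pager is left out on purpose; the point is only to hear about it well before 100%.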
This isn't really related to what Mr Quinlan was asking, but it's a note to those who are thrown into a NetApp administration situation in a place that's too big for its shorts :)
` ~ ^ ' ~ ' ` ^ ~ ~ ^ ' ~ ` ` ~ ' ^ ` ~ ~ ` " ` ~ ' ^ " ^ ` ~ ' ~ ^ " ' ~ ` ^ ~
"If I seem too inconclusive, well it's just because it's so elusive."
This email has been licensed by the GPL (http://www.gnu.org/philosophy)
Dan ((((now seeking a Linux/Unix sysadmin job in Silicon Valley!)))) Bethe
> I'm not sure about that, but I'll tell you what's happened around here when the RAID fills up completely :) When live filesystem space and snapshot space are full, and 'df' is reading 200%, one of our toasters just crashed. The OS just halted with an error message on the LCD.
Do you remember what the error message was? Filers shouldn't just *stop* if the file system fills up; that sounds like a bug.