On 29.06.2011 03:58, David N. Blank-Edelman wrote:
Ok, so one last follow-up and then I'll stop spamming this list: as far as we can tell, something internal to the NetApp's SSH functionality has gummed up. The large number of stuck SSH connections from our monitoring host is most likely a symptom rather than a cause (i.e. the monitoring host tries to SSH to the NetApp, but those connections, along with all other SSH connections, just hang). There doesn't seem to be a load issue on the box (though perhaps some other resource is running low); I think the problem lies in whatever part of OnTap handles SSH connections.
Since the RLM card (and, sad but true, even rsh) is still working, we'll limp along until we can find an opportune moment to reboot.
I remember being in this situation before, after running into a bug with perfstat.sh. Just like you, I planned to reboot/cfo the filer during a low-traffic night and otherwise left it as it was. The good news: it recovered by itself. The bad news: it took a while.
You might want to look into NetApp's API or SNMP for monitoring your systems. I would never waste an SSH session on such things.
-Stefan