Hi - we have an interesting situation with one of our filers. Earlier today, one of our monitoring systems managed to eat up all of the available ssh slots on the box. Once we got that under control, we found that we still can't get onto the box via ssh, and ssh commands sent to it just hang. Something is listening and accepting connections, but the session hangs once the connection is made.
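One way to confirm the "listening and accepting, but hangs" symptom from a monitoring host, with a bounded wait instead of a stuck session, is a small banner probe. A minimal Python sketch (standard library only): it opens a TCP connection and waits a limited time for the SSH identification string, which RFC 4253 says the server sends once the connection is up. The function names and status strings are made up for this example.

```python
import socket


def classify_banner(data: bytes) -> str:
    """Classify the first bytes a server sent after accepting the connection."""
    if not data:
        return "closed"  # recv() returned b"": the peer closed the connection
    # RFC 4253, section 4.2: the server sends an identification line beginning
    # "SSH-". It may send other lines first; this sketch only checks the start.
    if data.startswith(b"SSH-"):
        return "ssh-banner"
    return "unexpected-data"


def probe_ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Connect to host:port and report whether an SSH banner arrives in time."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"  # nothing listening on the port
    except OSError:
        return "connect-failed"  # includes a timeout during the TCP handshake
    with sock:
        try:
            data = sock.recv(256)
        except socket.timeout:
            # TCP handshake succeeded but no banner arrived: the hang in this thread.
            return "accepted-but-silent"
    return classify_banner(data)
```

Usage would be something like `probe_ssh_banner("filer1.example.com")` (hypothetical host name). A healthy daemon returns "ssh-banner" almost immediately, while the wedged state described here should show up as "accepted-but-silent" after the timeout.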
We've tried:
- turning the ssh options off and on
- changing the ssh port (and changing it back)
- secureadmin setup ssh (and moving the keys back)
Can you think of anything else I could do to kick the daemon short of rebooting the box (which is currently serving in production)? Perhaps something in a super-squirrel-secret priv mode?
Thanks!
-- dNb
Ok, so one last followup and then I'll stop spamming this list: as far as we can tell, something internal to the NetApp that handles ssh decided to gum up. The large number of stuck SSH connections from our monitoring host is most likely a symptom rather than a cause (i.e. it tries to ssh to the NetApp, but those connections, along with all other connections, just hang in process). There doesn't seem to be an issue with load on the box (though perhaps some other resource is low); I think we just have an issue with whatever inside ONTAP should be handling SSH connections.
Since we do have an RLM card (and even rsh, sad but true) still working, we'll limp along until we can find an opportune moment to reboot.
Thanks for everyone's answers.
-- dNb
Oh, I missed the bit about eating up all the sessions - is ssh running as the sshd user? If so, you might want to see how many processes/files are open for that user:
http://tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/x4733.h...
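On a Linux box running OpenSSH (the setting of the TLDP page linked above), that check can be scripted by reading /proc. This is a minimal sketch assuming Linux; it is not something you can run on the filer's own console, so it mainly helps on hosts such as the monitoring box. The function names are hypothetical, and without root the descriptor count for another user's processes will be incomplete.

```python
import os
import pwd
from typing import Tuple


def uid_process_stats(uid: int) -> Tuple[int, int]:
    """Count processes and their open file descriptors for one UID (Linux /proc).

    A process is attributed to the UID that owns its /proc/<pid> directory,
    which is normally the process's effective UID.
    """
    procs = fds = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            if os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            procs += 1
            fds += len(os.listdir(f"/proc/{entry}/fd"))
        except (FileNotFoundError, PermissionError):
            # The process exited mid-scan, or (without root) its fd table is unreadable.
            continue
    return procs, fds


def user_process_stats(username: str) -> Tuple[int, int]:
    """Same as uid_process_stats, looked up by user name (e.g. "sshd")."""
    return uid_process_stats(pwd.getpwnam(username).pw_uid)
```

Running `user_process_stats("sshd")` as root gives a (processes, descriptors) pair you can compare against the per-user limits the TLDP page discusses.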
On Tue, Jun 28, 2011 at 6:58 PM, David N. Blank-Edelman <dnb@ccs.neu.edu> wrote:
> The large number of stuck SSH connections from our monitoring host is most likely more a symptom than a cause
On 29.06.2011 03:58, David N. Blank-Edelman wrote:
> Since we do have an RLM card (and even rsh, sad but true) still working, we'll limp along until we can find an opportune moment to reboot.
I remember being in this situation before, after running into a bug with perfstat.sh. Just like you, I decided to wait for a low-traffic night to reboot/cfo the filer and leave it as it was until then. The good news: it recovered by itself. The bad news: it took a while.
You might want to look into NetApp's API or SNMP to monitor your systems. I would never waste an ssh session on such things.
-Stefan
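Stefan's SNMP suggestion can be illustrated with a standard-library-only Python sketch of a single SNMPv2c GET for sysUpTime.0 (1.3.6.1.2.1.1.3.0, from the standard MIB-II), hand-encoding the BER. It assumes SNMP is enabled on the filer and that the read community is "public"; both are assumptions, as is the host name in the usage note. NetApp-specific values live in NetApp's own MIB and are not covered. In practice a tool such as Net-SNMP would be the usual choice; the point is that a poll is one UDP datagram and never consumes an ssh session.

```python
import socket

# sysUpTime.0 from MIB-II; the reply value is TimeTicks (hundredths of a second).
SYS_UPTIME = "1.3.6.1.2.1.1.3.0"


def ber_len(n: int) -> bytes:
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + ber_len(len(value)) + value


def ber_int(v: int) -> bytes:
    """INTEGER TLV for a non-negative value, minimal two's-complement length."""
    return tlv(0x02, v.to_bytes(v.bit_length() // 8 + 1, "big"))


def encode_oid(oid: str) -> bytes:
    """OBJECT IDENTIFIER contents: first two arcs as 40*a+b, rest base-128."""
    arcs = [int(x) for x in oid.split(".")]
    out = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append((arc & 0x7F) | 0x80)  # continuation bit on leading bytes
            arc >>= 7
        out.extend(reversed(chunk))
    return bytes(out)


def build_get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    varbind = tlv(0x30, tlv(0x06, encode_oid(oid)) + b"\x05\x00")  # value: NULL
    pdu = tlv(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0) + tlv(0x30, varbind))
    # version INTEGER 1 means SNMPv2c; community is an OCTET STRING (tag 0x04)
    return tlv(0x30, ber_int(1) + tlv(0x04, community.encode()) + pdu)


def read_tlv(data: bytes, pos: int = 0):
    """Return (tag, value, next_pos) for the TLV that starts at pos."""
    tag, first = data[pos], data[pos + 1]
    pos += 2
    if first & 0x80:  # long-form length
        n = first & 0x7F
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    else:
        length = first
    return tag, data[pos:pos + length], pos + length


def parse_response(packet: bytes):
    """Return (tag, value) of the first varbind in a GetResponse."""
    _, msg, _ = read_tlv(packet)
    _, _, pos = read_tlv(msg)            # version
    _, _, pos = read_tlv(msg, pos)       # community
    _, pdu, _ = read_tlv(msg, pos)       # Response-PDU (tag 0xA2)
    _, _, pos = read_tlv(pdu)            # request-id
    _, err, pos = read_tlv(pdu, pos)     # error-status
    if int.from_bytes(err, "big"):
        raise RuntimeError(f"SNMP error-status {int.from_bytes(err, 'big')}")
    _, _, pos = read_tlv(pdu, pos)       # error-index
    _, vblist, _ = read_tlv(pdu, pos)
    _, vb, _ = read_tlv(vblist)
    _, _, pos = read_tlv(vb)             # the OID
    tag, value, _ = read_tlv(vb, pos)
    if tag == 0x02:
        return tag, int.from_bytes(value, "big", signed=True)
    if tag in (0x41, 0x42, 0x43):        # Counter32, Gauge32, TimeTicks
        return tag, int.from_bytes(value, "big")
    return tag, value


def snmp_get(host: str, oid: str = SYS_UPTIME, community: str = "public",
             timeout: float = 2.0):
    """One GET, one reply; no retries and no request-id matching."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_get_request(community, oid), (host, 161))
        reply, _ = sock.recvfrom(65535)
    return parse_response(reply)
```

For example, `snmp_get("filer1.example.com")` (hypothetical host) would return `(0x43, ticks)` if the agent answers, or raise `socket.timeout` if it does not; either way no ssh slot is used.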
Hi dNb -
What version of ONTAP are you running? Please show the output of "options ssh", "options telnet", and "options rsh". Please also show the output of "rshstat"; do a "priv set advanced" first.
Regards,
- Rick -
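Since rsh still works on this filer, the outputs Rick asks for could be collected non-interactively from an admin host. A minimal Python sketch, assuming a local `rsh` client, an admin host the filer already accepts for rsh, and that the 7-mode `version` command reports the ONTAP release; the host name and function names are hypothetical. `rshstat` is left out because Rick says to do a "priv set advanced" first.

```python
import subprocess
from typing import Callable, Dict, List

# Commands from Rick's request; "version" is assumed to report the ONTAP release.
DIAG_COMMANDS = ["version", "options ssh", "options telnet", "options rsh"]


def build_rsh_argv(filer: str, command: str) -> List[str]:
    """argv for running one console command on the filer over rsh."""
    return ["rsh", filer, command]


def run_local(argv: List[str]) -> str:
    """Run argv locally and return combined stdout and stderr text.

    The timeout raises subprocess.TimeoutExpired instead of hanging forever
    the way the stuck ssh sessions did.
    """
    result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr


def collect_diagnostics(filer: str,
                        runner: Callable[[List[str]], str] = run_local) -> Dict[str, str]:
    """Map each diagnostic command to its output; runner is swappable for testing."""
    return {cmd: runner(build_rsh_argv(filer, cmd)) for cmd in DIAG_COMMANDS}
```

For example, `for cmd, out in collect_diagnostics("filer1").items(): print(f"== {cmd}\n{out}")` would print each section in order, ready to paste into a reply.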
-----Original Message-----
From: David N. Blank-Edelman [mailto:dnb@ccs.neu.edu]
Sent: Tuesday, June 28, 2011 14:31
To: toasters@mathworks.com
Subject: kicking an unhappy ssh daemon