When I ran into the bug, it didn't seem to have anything to do with how busy the filer was.
It *did* seem to be related to pending output on the telnet connection. So, if you connected, ran a command, and then closed the telnet connection while the filer was still sending results, the filer would freeze.
Another scenario is when "console log" messages randomly stream over the telnet connection. Closing the connection while a console log message is being written also triggers it.
It's interesting that you mentioned "Big Brother", because I never hit the bug while actually typing interactively via telnet. It was always via some automated telnet connection (Expect, Perl, etc.). I guess a human typing always tends to wait before sending the <ctrl-D>, while the scripted connections aren't anticipating "extra output".
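For scripted connections, one possible workaround is to drain any pending output before closing, the way a human would wait. What follows is only a minimal sketch and has not been tested against a filer. The function name `drain_then_close` and the timeout values are hypothetical, it uses a plain Python socket rather than any particular telnet library, and it assumes that a short period of silence means nothing more is pending.

```python
import socket
import time


def drain_then_close(sock, quiet_period=0.5, max_wait=5.0):
    """Read until the peer is silent for quiet_period seconds, then close.

    Hypothetical helper: returns whatever bytes were read while draining.
    max_wait bounds the total time spent, in case output never stops.
    """
    deadline = time.monotonic() + max_wait
    received = bytearray()
    sock.settimeout(quiet_period)
    while time.monotonic() < deadline:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break  # nothing arrived for quiet_period: assume no pending output
        if not chunk:
            break  # the peer closed the connection first
        received.extend(chunk)
    try:
        sock.shutdown(socket.SHUT_RDWR)  # orderly shutdown instead of a bare kill
    except OSError:
        pass  # the connection may already be gone
    sock.close()
    return bytes(received)
```

A script would call this on its connected socket instead of closing abruptly. Whether that actually avoids the hang is an open question; it only mimics the "wait before <ctrl-D>" behavior described above.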
== Kerry
mcope@isisph.com 09/05/01 04:46PM >>>
Kerry,
Thanks for the info. I use Big Brother (bb4.com) to monitor the Filers. One of the things it does every 5 minutes is open and close a telnet session to test telnet. It's a pretty ugly method because it just kills the telnet session as opposed to closing it. The problems on the previous Filer only manifested when it was getting hammered. I've been using Big Brother for a number of months now and haven't had any problems keeping telnet sessions open to a Filer. However, I can imagine where this could be a problem if the Filer is really busy and then one of these ugly telnet checks occurred and pushed it over the edge.
Thanks again,
Michael
On 9/5/2001 2:16:07 PM, "Kerry Schwab" Kerry.Schwab@wnco.com wrote:
I've run into a bug with the telnetd daemon on the F820. Closing a telnet session to the filer while it has pending output sometimes causes the whole filer to "hang".
This is occurring on 6.0.1R3.
I did find some related bugs in the knowledge base, but I've resorted to not using the telnet interface for now.
== Kerry
mcope@isisph.com 09/05/01 10:46AM >>>
I've recently moved files from an F740 onto a new F820. This happened on Monday and on Tuesday morning the F820 suddenly stopped serving data. It became so unresponsive that I was unable to even get in thru the console. I got prompted for a password but it never got to a command line prompt. The Filer didn't appear to be excessively busy beforehand and the network connection (Gigabit) still appeared to be fine. Basically, the Filer appeared fine externally and the LCD panel showed that there were 0 ops a second. The only way I could get the Filer "running" again was to cycle power.
The Filer has been fine since then. There were no messages in the syslog and it didn't dump a core file (I know now how to force it to dump a core for diagnosis), so NetApp tech support hasn't been able to provide much assistance or explanation.
I've seen this same behavior twice on the F740 in the last 2 months but that had been diagnosed as a faulty Fibre Channel controller seizing up when the workload got too high. Other than the actual data files, the only thing these two Filers have in common is Data ONTAP 6.1R1. I'm running 6.1R1 on two other Filers and have had no problems. It just seems very coincidental that I didn't experience this on the F740 until after I installed this version of ONTAP.
Has anyone else seen this behavior before or noticed any irregularities with 6.1R1? (I'm not running in a cluster so the latest updates/bug fixes don't appear helpful.)
Thanx,
Michael Cope UNIX/Linux Systems Administrator Isis Pharmaceuticals mcope@isisph.com
UNIX is very user friendly... it's just highly selective about who it makes friends WITH!