On Fri, 7 Apr 2000, Fabio Pietrosanti wrote:
> Hi, I have a server farm of VA Linux machines running kernel 2.2.7-1.23 that uses a NetApp F740. Now NetApp's engineer tells us that Linux is not the best choice for use with NetApp. We mount the shares with these parameters: nfs rsize=8192,wsize=8192,timeo=14,intr,soft
> Sometimes (every 2 or 3 weeks), on some server (a different one every time), the NetApp does not respond to the server, and I see in the log:
>   kernel: nfs: server netapp-2-1 not responding, timed out
> The load average remains very high until I reboot the machine.
Yup. I was seeing it here, in a situation virtually identical to yours, right down to the mount parameters. Interestingly, on a farm of 9 hardware-identical Linux boxen (3Com 3c509 cards) under fairly high load (server farm pushing 35-50G/day), the problem was only seen on the 4 machines running 2.2 kernels (2.2.10, specifically) with Apache 1.3. The machines on 2.0.3x kernels running Apache 1.2 function without error. It should be noted that the boxen running 2.2 were upgraded from 2.0.3x, and while under *that* kernel revision, they had performed flawlessly.
As an additional note, most of the time (60-70%) the box could be recovered by killing all httpd processes, ifconfig'ing down the ethernet interface in question, bringing it back up again, and restarting the web server software. This procedure only failed when a child process had been zombified, in which case a reboot was required.
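For anyone who wants to script that recovery sequence, here is a rough sketch of the steps in order. It is only an illustration, not what we actually ran: the interface name `eth0`, the use of `killall` and `apachectl`, and the helper names `recovery_commands` and `recover` are all assumptions, so adjust them for your own boxen. It defaults to a dry run that only prints the commands.

```python
import subprocess


def recovery_commands(iface="eth0", apache_ctl="apachectl"):
    """Return the recovery steps as argv lists, in the order described above.

    iface and apache_ctl are assumed values, not taken from the original setup.
    """
    return [
        ["killall", "httpd"],        # kill all httpd processes
        ["ifconfig", iface, "down"], # take the ethernet interface down
        ["ifconfig", iface, "up"],   # bring it back up again
        [apache_ctl, "start"],       # restart the web server software
    ]


def recover(iface="eth0", apache_ctl="apachectl", dry_run=True,
            runner=subprocess.run):
    """Run (or, by default, only print) each recovery step.

    Returns a list of (argv, returncode) pairs; returncode is None in a dry run.
    """
    results = []
    for argv in recovery_commands(iface, apache_ctl):
        if dry_run:
            print("would run:", " ".join(argv))
            results.append((argv, None))
        else:
            completed = runner(argv, check=False)
            results.append((argv, completed.returncode))
    return results


if __name__ == "__main__":
    # Dry run only: prints the four commands without touching the system.
    recover()
```

Run it as root with `dry_run=False` to actually execute the steps. It makes no attempt to detect zombified children, so the case that needed a reboot still needs a reboot.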
The 2.2 machines were replaced by boxes running a commercial Unix with NFSv3 over UDP and Apache 1.3, and those have performed flawlessly.
--noah
"information warfare is a growth industry" - David Loundy