That message guarantees that NFS is flushing unacknowledged NFS operations. The only question is why. If you're not having frequent power failures on your database servers (I hope that's a safe assumption!) then you're almost certainly hitting the known DNFS issue.
I strongly recommend getting to 11.2.0.4 if you're using DNFS. DNFS has a deadlock issue where you'll see these nfsd.tcp.close.idle warnings frequently, usually with stalls in IO that can last a couple of minutes. I can't think of any risk in upgrading to 10Gb. In addition, I would recommend patching ONTAP up to 7.3.7P2 in order to get an ONTAP patch related to NFS flow control.
If all you're seeing is latency spikes, that's probably a different issue. These NFS flow control messages are usually associated with total hangs that can last up to 2 minutes, although they're not usually that bad.
Don't let this scare you away from DNFS, though. The bugs in question existed for many years and nobody noticed until recently. They're extremely rare.
-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Martin
Sent: Friday, March 28, 2014 2:06 PM
To: toasters@teaparty.net
Subject: RE: NFS fails?
Interesting thread. I've got a similar situation with a 3140 running 7.3.6P2, connected to an Oracle host over 1GbE using NFS, which is showing spikes in latency on the host. The Oracle host is showing dropped packets on its storage interface and I am seeing lots of messages logged like:
Mon Mar 21 12:06:33 GMT [Filer1: nfsd.tcp.close.idle.notify:warning]: Shutting down idle connection to client (x.x.x.x) where transmit side flow control has been enabled. There are 131 outstanding replies queued on the transmit buffer. This socket is being closed from the deferred queue.
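To get a feel for how often these warnings fire and which clients they involve, a log line like the one above can be tallied with a short script. This is only a minimal sketch: the regular expression is built from the single message quoted here, the names `IDLE_CLOSE`, `SAMPLE` and `summarize` are made up for illustration, and other ONTAP releases may word the message differently. It shows frequency and queue depth per client; it does not by itself tell the DNFS bug apart from ordinary congestion.

```python
import re
from collections import Counter

# Pattern derived from the one warning quoted in this thread (an assumption,
# not an official format). The client group also accepts the masked "x.x.x.x".
IDLE_CLOSE = re.compile(
    r"nfsd\.tcp\.close\.idle\.notify:warning\]: "
    r"Shutting down idle connection to client \((?P<client>[^)]+)\).*?"
    r"There are (?P<queued>\d+) outstanding replies"
)

# The log line from this thread, used as sample input.
SAMPLE = (
    "Mon Mar 21 12:06:33 GMT [Filer1: nfsd.tcp.close.idle.notify:warning]: "
    "Shutting down idle connection to client (x.x.x.x) where transmit side "
    "flow control has been enabled. There are 131 outstanding replies queued "
    "on the transmit buffer. This socket is being closed from the deferred queue."
)


def summarize(lines):
    """Return {client: (warning_count, max_outstanding_replies)}."""
    counts = Counter()
    worst = {}
    for line in lines:
        match = IDLE_CLOSE.search(line)
        if match is None:
            continue  # not an idle-close warning
        client = match.group("client")
        queued = int(match.group("queued"))
        counts[client] += 1
        worst[client] = max(worst.get(client, 0), queued)
    return {client: (counts[client], worst[client]) for client in counts}
```

Passing an open file handle for a copy of the filer's message log to `summarize` works the same way as passing a list of lines, since it only iterates over them.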
My thought was that the Oracle host's interface is saturated and it's not responding to the NFS acknowledgements in time, and so the NetApp is dropping the NFS requests.
The 1GbE interface is being upgraded on the Oracle host, but one of my concerns is hitting bugs that have been fixed in later 8.1.x releases once we remove the bottleneck on the Oracle host, particularly the DNFS and load-related bugs. I then read your comment:
"The only time I’ve seen this issue occur, other than an actual total failure of network connectivity, is with some Oracle DNFS bugs."
Is it possible to confirm whether this is simply the Filer flushing unacknowledged NFS requests or if this is actually the DNFS bug?