This community post also does a good job explaining it:
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net]
On Behalf Of Parisi, Justin
Sent: Tuesday, January 23, 2018 5:28 PM
To: Steiner, Jeffrey <Jeffrey.Steiner@netapp.com>; Mark Saunders <Mark.Saunders@pcmsgroup.com>; Fenn, Michael <fennm@DEShawResearch.com>; toasters@teaparty.net
Subject: RE: NFS issue after upgrading filers to 9.2P2
The network stack changed in 9.2 and IP fastpath was removed. But fastpath was mainly for more efficient routing.
https://library.netapp.com/ecmdocs/ECMP1114171/html/GUID-8276014A-16EB-4902-9EDC-868C5292381B.html
The stack was changed to a more standard BSD stack, so fastpath was no longer needed. It’s possible that’s an issue here, but I’d suggest getting network sniffs on each endpoint of the network to see where the packet is being dropped.
From:
toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net]
On Behalf Of Steiner, Jeffrey
Sent: Tuesday, January 23, 2018 5:24 PM
To: Mark Saunders <Mark.Saunders@pcmsgroup.com>; Fenn, Michael <fennm@DEShawResearch.com>;
toasters@teaparty.net
Subject: RE: NFS issue after upgrading filers to 9.2P2
I should have asked - is this SAP HANA or something like SAP on an Oracle database?
Also, what do they mean "it's not on the IMT?" Virtually everything NFS is on the IMT. We support any NFSv3 and NFSv4 client that obeys the specification. There's a tiny number of exceptions, but generally speaking we'll support linux,
Solaris, AIX, mainframe, OpenVMS, HP-UX, Oracle DNFS, AS/400, etc. There really should be no issue there.
The thing about fastpath does ring a few bells.
From: Mark Saunders [mailto:Mark.Saunders@pcmsgroup.com]
Sent: Tuesday, January 23, 2018 11:18 PM
To: Steiner, Jeffrey <Jeffrey.Steiner@netapp.com>; Fenn, Michael <fennm@DEShawResearch.com>;
toasters@teaparty.net
Subject: RE: NFS issue after upgrading filers to 9.2P2
Thanks for the quick replies sorry for the delay in e responding but I was working on this since 5am so had to go sleep.
I have a call open with netapp but have had the coockie cutter response of it isn’t on the Interoperability Matrix Tool as a supported version (It wasn’t when on 9.1 anyway)
A third party we have contact with have sent me a link to details about fastpathing being removed but I don’t think we were using it so maybe another false line to look down.
The mount options were kept fairly straight forward as
nfs nolock,_netdev,udp 0 0
and we have also tried the same as the one of the production servers which had tuned options, this is on another cluster so isn’t affected by this yet.
nfsvers=3,nolock,_netdev,rw,udp,rsize=32768,wsize=32768,timeo=600 0 0
How would I be able to tell if we are using DNFS ?
I will send you the support details tomorrow when I am back in the office.
Regards
Mark
From: Steiner, Jeffrey [mailto:Jeffrey.Steiner@netapp.com]
Sent: 23 January 2018 17:29
To: Fenn, Michael; Mark Saunders; toasters@teaparty.net
Subject: RE: NFS issue after upgrading filers to 9.2P2
It takes a lot for an ONTAP system to flat-out be unable to respond. Unless the timeout parameters are exceedingly short, you shouldn't reach that point, especially with anything capable of running ONTAP 9.2.
I'd open a support case on this one. In addition, if you want to trigger an autosupport and send me the serial numbers directly I can take a glance at a few stats to see if anything looks odd.
From: Fenn, Michael [mailto:fennm@DEShawResearch.com]
Sent: Tuesday, January 23, 2018 6:23 PM
To: Steiner, Jeffrey <Jeffrey.Steiner@netapp.com>; Mark Saunders <Mark.Saunders@pcmsgroup.com>;
toasters@teaparty.net
Subject: Re: NFS issue after upgrading filers to 9.2P2
The messages are not necessarily indicative of a network problem.
The kernel prints "nfs: server … not responding, still trying" when an operation times out (timeo deciseconds) for the configured (retrans) number of tries. Once the server responds, then it prints "nfs:
server … OK".
Networking problems are certainly one reason that an operation would time out, but not the only reason. An overloaded or down file server will cause the same effect.
Thanks,
Michael
From: <toasters-bounces@teaparty.net> on behalf of "Steiner, Jeffrey" <Jeffrey.Steiner@netapp.com>
Date: Tuesday, January 23, 2018 at 10:38 AM
To: Mark Saunders <Mark.Saunders@pcmsgroup.com>, "toasters@teaparty.net" <toasters@teaparty.net>
Subject: RE: NFS issue after upgrading filers to 9.2P2
I can't think why an ONTAP upgrade of this type would cause such a problem. If it was working before, it should be working now. If you had any kind of a locking, firewall, or general configuration problem you should have no access whatsoever.
I've seen some weird NFS bug sin SUSE, but that RHEL version should be fine.
What are the mount options used, and are you using DNFS?
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net]
On Behalf Of Mark Saunders
Sent: Tuesday, January 23, 2018 4:29 PM
To: toasters@teaparty.net
Subject: NFS issue after upgrading filers to 9.2P2
Hi gents today we have upgraded our Coventry cluster from 9.1P6 to 9.2P2 and we are about 99% working we just have a strange issue with SAP database servers NFS mounts. When the server is bounced the mounts are attached
with no problems but after a few minutes a df –h starts to be very slow reporting the NFS mounted directories and if the databases are started up they hang and a df –h then also hangs. This sometimes recovers enough to then allow a df –h to work again but
the databases are a lost cause right now.
In the server messages we get lots of entries for the SVM
Jan 23 07:01:27 jwukccsbci kernel: nfs: server JWUKCSVM01 not responding, still trying
Jan 23 07:01:47 jwukccsbci last message repeated 5 times
Jan 23 07:02:07 jwukccsbci kernel: nfs: server JWUKCSVM01 OK
Jan 23 07:02:07 jwukccsbci last message repeated 5 times
Jan 23 07:02:27 jwukccsbci kernel: nfs: server JWUKCSVM01 not responding, still trying
Jan 23 07:02:47 jwukccsbci last message repeated 5 times
Jan 23 07:02:48 jwukccsbci kernel: nfs: server JWUKCSVM01 OK
Is there anything that would of changed in the upgrade to lock down NFS or changes options that we might need to change back.
The redhat servers are an old kernel version 2.6.18-371.el5 that has some bugs but this was working fine before the filer upgrade was carried out.
Regards
Mark
Data Centre Sysadmin Team
Managed Services
Phone:- 02476 694455 Ext 2567
The Sysadmin Team promoting PCMS Values ~Integrity~Respect~Commitment~ ~Continuous Improvement~
The information contained in this e-mail is intended only for the person or entity to which it is
addressed and may contain confidential and/or privileged material. If you are not the intended recipient of this e-mail, the use of this information or any disclosure, copying or distribution is prohibited and may be unlawful. If you received this in error,
please contact the sender and delete the material from any computer. The views expressed in this e-mail may not necessarily be the views of the PCMS Group plc and should not be taken as authority to carry out any instruction contained. The PCMS Group reserves
the right to monitor and examine the content of all e-mails.
The PCMS Group plc is a company registered in England and Wales with company number 1459419 whose registered office is at PCMS House, Torwood Close,
Westwood Business Park, Coventry CV4 8HX, United Kingdom. VAT No: GB 705338743