Tonight my filer stopped serving NFS over its FDDI interface. Twice.
I called NAC support and got a call back; they had me take the interface down and back up. That fixed it temporarily. They then recommended that I disable the TCP extension (of course, that involves some work on the client side as well; joy).
I did it and now it seems to be happier.
My beef with NAC is that the tech guy said there had been many problems like this, with a lot of open tickets in the support queue, and that the next release of the OS would have the TCP option disabled (as the default; not removed).
Why in the world has NAC not sent out an advisory about this? If it has been causing so many problems, the least they could have done is let us know that it could happen.
Since this is a new filer (f540, 256MB read cache, 8MB nvram, 100GB, 2x100bT, 1 fddi), the problem only affected some setup stuff I was doing. The problem seems worse with FDDI than with Ethernet connections (my mounts over Ethernet were fine while the FDDI ones were not).
Anybody from the peanut gallery?
Alexei
It is interesting to note that many other vendors have problems with their NFS/TCP implementations as well... generally in the area of performance, but stability as well.
Also note that your subject line is somewhat misleading... NFS version 3 and NFS over TCP are completely orthogonal, and one can run NFS v2 over TCP or NFS v3 over UDP. The only thing they have in common is that most vendors started supporting both at roughly the same time.
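To make the orthogonality point concrete, here is a small sketch that treats the NFS protocol version and the transport as two independent settings and enumerates every pairing. It assumes Solaris-style `vers=` and `proto=` mount option syntax; the helper names `mount_options` and `all_combinations` are hypothetical and exist only for illustration.

```python
from itertools import product

# NFS protocol version and transport are independent choices:
# any supported version can be paired with any supported transport.
VERSIONS = (2, 3)
TRANSPORTS = ("udp", "tcp")


def mount_options(version: int, transport: str) -> str:
    """Build a mount option string (assumes Solaris-style vers=/proto= syntax)."""
    if version not in VERSIONS:
        raise ValueError(f"unsupported NFS version: {version}")
    if transport not in TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport}")
    return f"vers={version},proto={transport}"


def all_combinations() -> list:
    """Enumerate every version/transport pairing, version-major order."""
    return [mount_options(v, t) for v, t in product(VERSIONS, TRANSPORTS)]


if __name__ == "__main__":
    for opts in all_combinations():
        print(opts)
```

All four combinations are valid, including NFS v2 over TCP and NFS v3 over UDP. Turning off TCP while keeping version 3, as in the workaround above, would correspond to the `vers=3,proto=udp` pairing.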
I'm not involved with the support policies but obviously the scope of the problem plays into any decision to make an announcement. If there is a critical bug that surfaces, but it's unlikely that many customers will encounter it, should an announcement go out? I have mixed feelings about the answer, and in any case my opinion doesn't matter... yours does. So, as customers, it is important that you provide feedback to Network Appliance on this very point, so Customer Service can service you better.
Bugs On-Line (a benefit of NOW) is a good source of data on bugs that are not announced through field alerts. (BTW, I see bug #3800 listed there, which mentions the ethernet interface could also hang with NFS/TCP traffic.)
Bruce
PS - I know many of you are fond of the name, but it would also be politically correct to point out that we are no longer NAC! We changed our name to Network Appliance, Inc. about the time we went public. Although if you have rows of filers all named nac1, nac2, nac3, etc., changing them all would probably be a pain. :)
+--- In our lifetime, sterling@netapp.com (Bruce Sterling Woodcock) wrote:
|
| It is interesting to note that many other vendors have problems with
| their NFS/TCP implementations as well... generally in the area of
| performance, but stability as well.
Agreed. SGI to Sun connectivity is a good example of this. I failed to mention that this was a homogeneous Sun environment.
| Also note that your subject line is somewhat misleading... NFS version
| 3 and NFS over TCP are completely orthogonal, and one can run NFS v2
That they are orthogonal is a technicality in this case. I have only seen this problem with filers running V3 and TCP at the same time. But then again, I have not tried V2 and TCP. I will try to avoid that. :)
| So, as customers, it is important that you provide feedback to Network
| Appliance on this very point, so Customer Service can service you better.
No problem. I am one of NetApp's biggest fans and continue to be. I just don't want this to become a trend.
| PS - I know many of you are fond of the name, but it would also be
| politically correct to point out that we are no longer NAC! We changed
Hehe. Sorry about that. A bad habit I picked up from some Netcom folks.
Thanks for the quick response.
Alexei
Alexei, I'm the Tech Support and Escalations Manager at Network Appliance. Thought I'd try to address some of your issues here.
First, I can understand your frustration at having hit this bug.
This is one of the highest priority bugs in Engineering's queue right now. But the problem (like any problem with memory buffer exhaustion) is hard to track down. Since there is no immediate or obvious fix available, we are turning off TCP by default. We are indeed aware of the issues with doing this - clients that were using TCP mounts have to be rebooted - and plan on warning customers of this in documentation. Hopefully we will have a fix for the real problem soon.
Regarding notification, we have a system through which we tell our customer base about problems like this (as well as release announcements and such). These are called Field Alerts, and they go out to a list of customer email addresses. Field Alert #23, sent out on April 14th, warns customers about this bug. There is also our NOW web site at http://now.netapp.com, which has Bugs On-Line that you might want to check.
I just checked and see that you were not on the list, so I added you just now. If anyone else on the toasters distribution would like to be on the Field Alert distribution, please email me and I will be happy to add you.
- Diptish Datta Tech Support and Escalations Manager Network Appliance