We have an F330 which normally functions as a News spool (and works very well, btw). Today I tried storing a large (1.5 GByte) file on the NetApp, and a while later (a few hours) tried to delete the same file. During the delete, I got *lots* of
  Aug 1 15:37:22 snipp unix: NFS server snapp not responding still trying
  Aug 1 15:37:22 snipp unix: NFS server snapp ok
in the log. As soon as the delete was finished, everything was back to normal again.
The NetApp runs 3.1.5 (3.1.5d, I believe), and is connected to the NFS client machine (snipp) via a dedicated 100TX Ethernet. The NFS client machine is an Ultrasparc running Solaris 2.5.1.
Anybody seen this before?
Steinar Haug, Nethelp consulting, sthaug@nethelp.no
At 03:57 PM 8/1/97 +0200, sthaug@nethelp.no wrote:
> We have an F330 which normally functions as a News spool (and works very
> well, btw). Today I tried storing a large (1.5 GByte) file on the NetApp,
> and a while later (a few hours) tried to delete the same file. During the
> delete, I got *lots* of
>
>   Aug 1 15:37:22 snipp unix: NFS server snapp not responding still trying
>   Aug 1 15:37:22 snipp unix: NFS server snapp ok
>
> Anybody seen this before?
When I blow away a snapshot, it sometimes gives those messages during the delete, and then is fine afterwards.
As a side note, one of the worst problems we had with the filers was when someone created a file > 2 GB, and then tried to delete it. We ended up having to take the filer down and WACK it before we could delete that enormous file.
Amy
> On Fri, Aug 01, 1997 at 03:57:11PM +0200, sthaug@nethelp.no said:
> > Aug 1 15:37:22 snipp unix: NFS server snapp not responding still trying
> > Aug 1 15:37:22 snipp unix: NFS server snapp ok
>
> I have seen this while gzipping many _large_ log files. Our filer is
> running 4.0.1c on an F220 with NFS only (no CIFS or HTTP).
There is a very non-trivial fix in the works for this problem. It is undergoing some massive testing right now and will appear in a future release of ONTAP. I don't have the exact release number yet.
To remove a file with no disruption in service, we sometimes have customers use the following Perl script. It uses truncate(), which sends "setattr" requests via NFS instead of "remove" requests. Do not try to truncate the entire file in one execution; that will only result in the same problem you are seeing with a remove.
---cut---
#!/usr/local/bin/perl5
#
# large file remove/delete
#
# For bug 4157.  This script allows a user to "lop off" pieces of
# a large file until the file is gone.  Typically, it is prudent to
# lop off about 50 megs at a time until the file is a manageable
# size, then it can be removed.  (Note, I simply grabbed the 50 meg
# figure out of my hat.)
#

unless (defined($ARGV[0]) && defined($ARGV[1])) {
    die "Usage: truncate <filename> <amount to truncate by>\n";
}

if ($ARGV[1] =~ /[^0-9]/) {
    die "Usage: truncate <filename> <amount to truncate by>\n";
}

die "$ARGV[0]: $!\n" unless (-f $ARGV[0]);

truncate($ARGV[0], ((-s $ARGV[0]) - $ARGV[1])) || die "$ARGV[0]: $!\n";
---cut---
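The same lop-off-in-chunks idea can be wrapped in a driver loop. This is a minimal sketch, not from the original post, written in Python rather than Perl; the 50 MB default chunk size is simply the figure suggested above, and `remove_large_file` is a hypothetical name:

```python
import os

def remove_large_file(path, chunk=50 * 1024 * 1024):
    """Shrink a file by `chunk` bytes at a time, then unlink it.

    os.truncate() issues setattr-style size changes (over NFS these
    become "setattr" requests, as with the Perl script above), so the
    final remove only has to free a small file.
    """
    size = os.path.getsize(path)
    while size > chunk:
        size -= chunk
        os.truncate(path, size)   # lop off one chunk
    os.remove(path)               # file is small now; remove is cheap
```

A caller would just run `remove_large_file("/mnt/filer/hugefile")` and wait; each iteration is one cheap size change rather than one enormous free.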
Michael Douglass wrote:
> On Fri, Aug 01, 1997 at 03:57:11PM +0200, sthaug@nethelp.no said:
> > Aug 1 15:37:22 snipp unix: NFS server snapp not responding still trying
> > Aug 1 15:37:22 snipp unix: NFS server snapp ok
>
> I have seen this while gzipping many _large_ log files. Our filer is
> running 4.0.1c on an F220 with NFS only (no CIFS or HTTP).
>
> -- Michael Douglass
>    Texas Networking, Inc.
>
> <de> 'hail sparc, full of rammage'
> <de> 'the kernel is with thee'
> <de> 'blessed art thou amongst processors'
I see two issues here. First, there was a performance problem with large deletes; the fix has been put into a future release (sales can say more) and is in testing now.

Second, if you have snapshots taken, say, hourly, you may be keeping many copies of both the .z file and the uncompressed files. Gzipping with snapshots enabled may increase rather than decrease the number of allocated disk blocks.
Sean
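Sean's point can be made concrete with a little arithmetic. The file sizes below are invented for illustration; the one assumption taken as given is WAFL's 4 KB block size. While a snapshot still references the uncompressed log, gzipping it allocates *new* blocks for the .z file without freeing the old ones:

```python
# Hypothetical numbers: gzip a 100 MB log down to a 10 MB .z file
# while an hourly snapshot still holds the original.
BLOCK = 4096  # WAFL block size, 4 KB

def blocks(nbytes):
    """Number of blocks needed to hold nbytes (ceiling division)."""
    return -(-nbytes // BLOCK)

log = 100 * 1024 * 1024         # uncompressed log file
compressed = 10 * 1024 * 1024   # the new .z file

before = blocks(log)                      # live file only
after = blocks(log) + blocks(compressed)  # snapshot pins old blocks,
                                          # plus the freshly written .z
assert after > before  # disk usage went UP until the snapshot expires
```

Only once every snapshot referencing the original rolls off do the old blocks actually come back.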
We see a similar message all the time:
  Aug 3 08:53:21 tofu kernel: nfs server netapp1.alt.net:/home: is alive again
  Aug 3 08:53:29 tofu kernel: nfs server netapp1.alt.net:/home: not responding
  Aug 3 08:53:29 tofu kernel: nfs server netapp1.alt.net:/home: is alive again
  Aug 3 08:53:30 tofu kernel: nfs server netapp1.alt.net:/home: not responding
  Aug 3 08:53:30 tofu kernel: nfs server netapp1.alt.net:/home: is alive again
  Aug 3 08:53:42 tofu kernel: nfs server netapp1.alt.net:/home: not responding
  Aug 3 08:53:42 tofu kernel: nfs server netapp1.alt.net:/home: is alive again
  Aug 3 08:54:03 tofu kernel: nfs server netapp1.alt.net:/home: not responding
  Aug 3 08:54:03 tofu kernel: nfs server netapp1.alt.net:/home: is alive again
  Aug 3 08:54:08 tofu kernel: nfs server netapp1.alt.net:/home: not responding
  Aug 3 08:54:09 tofu kernel: nfs server netapp1.alt.net:/home: is alive again
Note the frequency!

This happens mostly while reading/accessing files, with only occasional writes, and occurs with both NFS v2 and v3. Is there a way to determine whether this is a bug in our client OS (BSD/OS 3.0) or in the NetApp?
Chris
> > Aug 1 15:37:22 snipp unix: NFS server snapp not responding still trying
> > Aug 1 15:37:22 snipp unix: NFS server snapp ok
>
> We see a similar message all the time:
>
>   Aug 3 08:53:21 tofu kernel: nfs server netapp1.alt.net:/home: is alive again
>   Aug 3 08:53:29 tofu kernel: nfs server netapp1.alt.net:/home: not responding
>   Aug 3 08:53:29 tofu kernel: nfs server netapp1.alt.net:/home: is alive again
>   ...
>
> Note the frequency!
>
> This happens mostly while reading/accessing files, with only occasional
> writes, and occurs with both NFS v2 and v3. Is there a way to determine
> whether this is a bug in our client OS (BSD/OS 3.0) or in the NetApp?
You might do a "find . -ls" to see if you have any giant files or directories. If so, the "removing giant files" problem is likely.
The frequency is sufficiently close to the spacing between "consistency points" in the WAFL filesystem to make me suspicious that this is our problem and not the client's. Ken Whittaker in customer satisfaction probably knows the secret chicken-blood ritual that one must perform to see whether you are getting super-slow responses on some of your ops.
As Sean mentioned, a fix for this is on the way. Sean was kind enough not to mention that the bug was mine, and modest enough not to mention that the fix was his. :-)
Dave
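One way to eyeball the spacing Dave mentions is to measure the gaps between the "not responding" events in Chris's log excerpt. This is a quick sketch, not from the thread; the timestamps are copied from the excerpt above, and whether the cadence really matches WAFL's consistency points is exactly the question for NetApp support:

```python
from datetime import datetime

# "not responding" timestamps from Chris's log excerpt
stamps = ["08:53:29", "08:53:30", "08:53:42", "08:54:03", "08:54:08"]

times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
gaps = [(b - a).seconds for a, b in zip(times, times[1:])]
print(gaps)  # → [1, 12, 21, 5]
```

The gaps are irregular but average out near ten seconds, which is why a consistency-point-related stall is at least a plausible suspect.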
On Tue, Aug 05, 1997 at 01:03:51AM -0700, Dave Hitz said:
Happens under Solaris 2.5.1 as well as FreeBSD 2.2.2; so I will assume the toaster people are right that it is their problem. Just my $0.02.