"Chris" == Chris Lamb <skeezics@selectmetrics.com> writes:
Chris> Why, just a few weeks ago I noticed almost *exactly* those same
Chris> circumstances after an upgrade and a reboot of an F820
Chris> (6.5.1R1). In this case, netapp-top.pl (or at least the old
Chris> version I have?) was giving utterly nonsensical results
Chris> (including a negative number of ops/sec?) so I just used
Chris> "nfsstat -r" on the filer, followed up with "snoop" to confirm
Chris> and identify the culprit hosts.
Yeah, the version of netapp-top I was running was also showing bad results. I ended up hacking together my own limited Perl script to show me the data I wanted. Maybe someday I'll update the netapp-top script to work better.
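For anyone wanting to repeat the "confirm with snoop and identify the culprit hosts" step, here is a minimal Python sketch that tallies NFS GETATTR calls per client from saved snoop text output. It is not netapp-top or either poster's script. It assumes snoop summary lines shaped like `clientA -> filer1 NFS C GETATTR3 FH=0AC1` (the "C" marking a call, "R" a reply); check that against your own capture before trusting the counts. The function name `count_getattr_calls` is hypothetical.

```python
import re
from collections import Counter

# Assumed snoop summary-line shape (verify against a real capture):
#   "clientA -> filer1 NFS C GETATTR3 FH=0AC1"
# Only call lines ("NFS C") are counted; reply lines ("NFS R") are ignored.
# re.search tolerates leading fields such as packet numbers or timestamps,
# and group 1 is the token just before "->", i.e. the sending host.
CALL_RE = re.compile(r"(\S+)\s+->\s+\S+\s+NFS\s+C\s+GETATTR\d*\b")


def count_getattr_calls(lines):
    """Hypothetical helper: tally GETATTR calls per sending host."""
    counts = Counter()
    for line in lines:
        match = CALL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage, assuming the snoop text output was saved to a file first: `count_getattr_calls(open("snoop.txt")).most_common(10)` lists the ten busiest clients. The regex only keys on the host, so it says nothing about which file is being hit; grouping by the `FH=` field as well would be the next step toward that.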
Chris> It seems that the getattr() calls were on the mount point
Chris> itself, not a file beneath it, which may explain why "lsof -N"
Chris> was confused. This was a case where we had migrated the root
Chris> volume on the filer from an FC-9 shelf to a DS14 shelf, so vol0
Chris> was an entirely new volume. Prior to the work on the filer we
Chris> had unmounted filesystems from the servers and machines we
Chris> cared about, and expected "NFS stale file handle" errors on any
Chris> client machines that we missed and would just reboot them later
Chris> - but found that on rebooting from the new vol0, the Solaris
Chris> clients that were freaking out and looping like you described
Chris> were the ones that we hadn't touched. Oddly enough, they
Chris> *didn't* report stale file handles as we'd expected, and things
Chris> appeared to be working(!) - except that something in the NFS
Chris> client was causing the odd traffic.
This is interesting, but not quite what I've run into. I first had the problem while running 5.3.7..., and when we rebooted the server into 6.4.5 (a nice, smooth upgrade process, btw) we didn't reboot any clients. Then, a few days later, the same problem came back. No client reboots or anything.
I'm pretty sure it's the users doing something with parallel builds, but finding out which file(s) they're poking at would be the first step in figuring out what they're doing here.
Chris> A quick and dirty "fuser -kc /troubled/mount/point" and
Chris> umount/mount cycle cleared it up. Not at all sure if this
Chris> applies to your situation, but the symptoms you describe
Chris> exactly match what we saw.
Thank you for the follow-up. I'll keep fuser in mind the next time I see this happen.
John