Re: Tracing NFS getattr() calls to a file

27 Aug 2004


"Chris" == Chris Lamb <skeezics@selectmetrics.com> writes:

Chris> Why, just a few weeks ago I noticed almost *exactly* those same
Chris> circumstances after an upgrade and a reboot of an F820
Chris> (6.5.1R1).  In this case, netapp-top.pl (or at least the old
Chris> version I have?) was giving utterly nonsensical results
Chris> (including a negative number of ops/sec?) so I just used
Chris> "nfsstat -r" on the filer, followed up with "snoop" to confirm
Chris> and identify the culprit hosts.

Yeah, the version of netapp-top I was running was also showing bad
results.  I ended up hacking my own limited perl script to show me the
data I wanted.  Maybe I'll update the netapp-top script to work better
someday.

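The thread does not say why netapp-top reported negative ops/sec.  One
common cause with this kind of tool, offered here only as an assumption,
is that rates are computed by subtracting two snapshots of a cumulative
counter, and the counter was reset (for example by a reboot) between
them.  A minimal sketch of a rate calculation that guards against that
case follows; the function name and the dictionary layout are
hypothetical and are not taken from netapp-top or from the perl script
mentioned above.

```python
def ops_per_sec(prev, curr, interval_s):
    """Per-client ops/sec from two cumulative-counter snapshots.

    prev and curr map a client name to its cumulative op count
    (a hypothetical layout chosen for this sketch).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for client, now in curr.items():
        delta = now - prev.get(client, 0)
        if delta < 0:
            # The counter went backwards.  Assume it was reset and
            # count only the ops seen since the reset.
            delta = now
        rates[client] = delta / interval_s
    return rates
```

With this guard a reset shows up as an understated rate for one
interval rather than as a negative number.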
Chris> It seems that the getattr() calls were on the mount point
Chris> itself, not a file beneath it, which may explain why "lsof -N"
Chris> was confused.  This was a case where we had migrated the root
Chris> volume on the filer from an FC-9 shelf to a DS14 shelf, so vol0
Chris> was an entirely new volume.  Prior to the work on the filer we
Chris> had unmounted filesystems from the servers and machines we
Chris> cared about, and expected "NFS stale file handle" errors on any
Chris> client machines that we missed and would just reboot them later
Chris> - but found that on rebooting from the new vol0, the Solaris
Chris> clients that were freaking out and looping like you described
Chris> were the ones that we hadn't touched.  Oddly enough, they
Chris> *didn't* report stale file handles as we'd expected, and things
Chris> appeared to be working(!) - except that something in the NFS
Chris> client was causing the odd traffic.

This is interesting, but not quite what I've run into.  I was having
the problem when running 5.3.7..., and when we rebooted the server
into 6.4.5 (nice smooth upgrade process, btw) we didn't reboot any
clients.  We then had the same problem again a few days later, with
no client reboots or anything.

I'm pretty sure it's the users doing something with parallel builds,
but finding out which file(s) they're poking at would be the first
step in figuring out what they're doing here.

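As a rough illustration of that first step, captured snoop output can
be tallied by client and file handle to see which handles are being
hit hardest.  This is only a sketch: the summary-line shape in the
comment is an assumption about what snoop prints for NFS calls, and the
function name is made up for the example.  Mapping a file handle back
to a path is a separate problem that this does not solve.

```python
import re
from collections import Counter

# Assumed shape of a snoop summary line for an NFS call (not taken
# from the thread):
#   client1 -> filer1 NFS C GETATTR3 FH=0222
# Replies ("NFS R ...") and other procedures are ignored.
GETATTR_CALL = re.compile(
    r"(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+NFS\s+C\s+GETATTR\d*\s+"
    r"FH=(?P<fh>[0-9A-Fa-f]+)"
)


def tally_getattr(lines):
    """Count GETATTR calls per (client, file handle) pair."""
    counts = Counter()
    for line in lines:
        m = GETATTR_CALL.search(line)
        if m:
            counts[(m.group("src"), m.group("fh"))] += 1
    return counts
```

Calling tally_getattr() on the saved lines and then most_common(10) on
the result lists the busiest client and file handle pairs first.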
Chris> A quick and dirty "fuser -kc /troubled/mount/point" and
Chris> umount/mount cycle cleared it up.  Not at all sure if this
Chris> applies to your situation, but the symptoms you describe
Chris> exactly match what we saw.

Thank you for the followup.  I'll have to keep the fuser trick in mind
the next time I see this happening.

John
