You could have set the option that tells the filer to collect NFS client stats per host:
options nfs.per_client_stats.enable on
Then you could run "rsh filer nfsstat -h | more" and look for the offending client. Even better would be the following sequence:
rsh filer options nfs.per_client_stats.enable on
rsh filer nfsstat -z
(wait a short period of time, seconds or minutes)
rsh filer nfsstat -h | more
Now look to see which host is sending the most requests, then go to that host and hunt for the offending process(es).
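If you want that as a one-shot script, something along these lines should do it (an untested sketch; the filer name and the 60-second window are just placeholders):

  #!/bin/sh
  # enable per-client stats, zero the counters, wait, then dump
  # the per-host totals to see who is hammering the filer
  FILER=filer        # placeholder -- your filer's hostname
  rsh $FILER options nfs.per_client_stats.enable on
  rsh $FILER nfsstat -z
  sleep 60
  rsh $FILER nfsstat -h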
--tmac
John Stoffel wrote:
Hi all,
Just to drag this conversation back to a purely NetApp, purely NFS scenario, I'd like to get some help and pointers on how I can solve a problem I had this morning in a more general and useful way.
Let me give you the background details here.
We have a bunch of toasters here, various old F330s, an F520 (soon to be retired) and some F740s. This morning a bunch of people were complaining that their workstations were slow, that home directories were timing out, etc. These people all had their home directories on an F330 running OnTap 5.2.1. It has 192MB of RAM and four shelves, each with 7 x 4GB disks.
The poor system was simply pinned to the wall by a client. The CPU was hovering between 85% and 100%, and it was constantly reading and writing around 2.3MB/s to the disks. The nfsstat output told me that about 23% of the traffic was writes; the rest was attribute lookups and reads. The usual mix of NFS traffic. The cache age was down around 4-5 (it's normally much higher), so I knew it was getting hit hard with writes.
But since the system was on a direct link back to a switch, and since I don't run the network at all and don't have access to it, I couldn't tell which system(s) were beating it up.
We ended up putting in a PC on a repeater to sniff the link between the switch and the NetApp to try and figure out which host(s) was the bad boy.
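For what it's worth, the capture itself was just along these lines on the sniffer box (a sketch; I'm leaving out the interface flags):

  # show NFS traffic with raw IPs (no DNS lookups); eyeballing
  # which source address dominates points at the busy client
  tcpdump -n port 2049

  # the Solaris equivalent, using snoop's RPC filter
  snoop rpc nfs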
Once we figured that out, it still didn't help since the two hosts didn't look loaded at all, nor were there any runaway processes sucking up IO that I could find. The clients were both quad processor Suns running Solaris 2.5.1 or 2.6.
I used the following tools to try to figure out what was going on, and failed. We had to reboot the two systems to solve the problem. As a Unix admin, this really pained me, since I should have been able to find the culprits and just kill them off. We used these on the Solaris side:
snoop
tcpdump
lsof
top
ps (in all kinds of variations)
ethereal (found after the fact; will be used in the future)
And on the NetApp side I used:
nfsstat
netstat -n
netstat -r
sysstat 4
And while they all showed me something, none of them could show me what I needed.
On the NetApp side I needed something to show me the top 10 NFS hosts by IP address, but I couldn't get it to work. The output of 'netstat -r' wasn't any help at all.
On the Solaris side, tcpdump showed me the traffic, but didn't give me a way to relate it back to a specific process. And while lsof showed me processes, it didn't show me which one was writing data and at what rate.
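In hindsight, something like pairing the two might have narrowed it down (a sketch; <pid> is whichever candidate lsof turns up, and truss needs root for other users' processes):

  # list processes with NFS files open (-N selects NFS files)
  lsof -N

  # then count syscalls for a suspect process; interrupt it after
  # a while -- a runaway writer shows up as a huge write() count
  truss -c -p <pid>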
Does anyone have any hints? I'm thinking of upgrading to 5.3.6 at some point, just to bring the F330s up to date with the F740s, but I'm not really in a rush.
Ideally, something on the NetApp side to show me the top NFS clients by data rate, or anything close to it, would be a godsend. Then something on the client side to figure out which process(es) were the NFS hogs would also be good.
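Even crude per-host numbers on the client side would help rank the suspects; a rough sketch, run as root on each candidate Sun:

  # zero the client-side NFS counters, wait a minute, then see
  # how many calls this box generated in that window
  nfsstat -z
  sleep 60
  nfsstat -c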
Thanks,
John

John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
stoffel@lucent.com - http://www.lucent.com - 978-952-7548
--
****** All New Numbers!!! ******
Timothy A. McCarthy --> System Engineer, Eastern Region
Network Appliance        http://www.netapp.com
240-268-2034 Office      Page Me at: 888-971-4468
240-268-2001 Fax