Did your number of files dramatically increase? Is there a reconstruction going on? A WAFL scan, or an upgrade? OK, that last one might be silly, but you never know... :)
Can you capture a perfstat, or at least a few 30-second samples of statit? Perfstat is usually too much, but here's a quick shell script I run to grab what I need.
#!/bin/sh
#
if [ -z "$1" ]; then
    echo " "
    echo "I need a filer target"
    echo "An example syntax"
    echo "    get-stats.sh filer01.msg.dcn"
    echo " "
    exit 0
fi

FILER=$1
#
while true
do
    # one data file per filer per day, e.g. filer01.msg.dcn_data_Feb14
    DATAFILE="$FILER`date | awk '{print "_data_" $2 $3 }'`"
    echo "" >> $DATAFILE
    date >> $DATAFILE
    echo "------------------------------" >> $DATAFILE

    # start a statit sample and zero the NFS/WAFL counters
    rsh $FILER 'priv set -q diag; statit -b' 2>/dev/null
    echo "Starting statit sample" >> $DATAFILE
    rsh $FILER 'priv set -q diag; nfsstat -z' 2>/dev/null
    echo "Zeroing nfsstat" >> $DATAFILE
    rsh $FILER 'priv set -q diag; nfs_hist -z' 2>/dev/null
    echo "Zeroing nfs_hist" >> $DATAFILE
    rsh $FILER 'priv set -q diag; wafl_susp -z' 2>/dev/null
    echo "Zeroing wafl_susp" >> $DATAFILE

    # 30 one-second sysstat samples go to the file while the counters run
    rsh $FILER 'sysstat -xs -c 30 1' >> $DATAFILE

    # And we wait...

    # collect everything that accumulated during the sample
    rsh $FILER 'priv set -q diag; statit -en' >> $DATAFILE 2>/dev/null
    rsh $FILER 'priv set -q diag; nfsstat -d' >> $DATAFILE
    rsh $FILER 'priv set -q diag; nfs_hist' >> $DATAFILE
    rsh $FILER 'priv set -q diag; wafl_susp -w' >> $DATAFILE

    echo " ** " >> $DATAFILE
done
If you don't allow rsh, you can enable passphrase ssh and replace rsh with ssh (I think it's built into OnTap 7 for free now...), or just run the commands above in sequence and save the output to a text file. A few samples of each, about 30 seconds long, should do it -- but only while the problem is happening; the data isn't helpful if it's not occurring at the moment.
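For example, one manual pass over ssh might look like this -- just a sketch, assuming ssh to the filer already works, using the same example filer name, and dumping into a made-up $FILER.stats file:

    FILER=filer01.msg.dcn
    # start a statit sample and zero the counters
    ssh $FILER 'priv set -q diag; statit -b'
    ssh $FILER 'priv set -q diag; nfsstat -z'
    ssh $FILER 'priv set -q diag; nfs_hist -z'
    ssh $FILER 'priv set -q diag; wafl_susp -z'
    # 30 one-second sysstat samples while the counters accumulate
    ssh $FILER 'sysstat -xs -c 30 1' >> $FILER.stats
    # collect what built up during the sample
    ssh $FILER 'priv set -q diag; statit -en' >> $FILER.stats
    ssh $FILER 'priv set -q diag; nfsstat -d' >> $FILER.stats
    ssh $FILER 'priv set -q diag; nfs_hist' >> $FILER.stats
    ssh $FILER 'priv set -q diag; wafl_susp -w' >> $FILER.stats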
I usually run the script for about 5 to 10 minutes, then ctrl+c out of it.
Right now I mostly go through the output by hand.
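A couple of one-liners I might use to skim a day's data file -- illustrative only; the file name follows the pattern the script builds, and the patterns key off the markers it echoes:

    # how many sample passes made it into the file
    grep -c ' \*\* ' filer01.msg.dcn_data_Feb14
    # rough look at the CPU column across all the sysstat samples
    awk '$1 ~ /%$/ {print $1}' filer01.msg.dcn_data_Feb14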
-Blake
On 2/14/07, Paul Letta <letta@jlab.org> wrote:
I have been having some performance issues on my F880 the past couple of days. Specifically, an NDMP backup that normally takes 2 hours to complete was stuck in the mapping phase for 12 hours before I killed it.
What I am seeing is high disk reads, but without corresponding network out. Here is a sysstat:
 CPU    NFS   CIFS   HTTP      Net kB/s        Disk kB/s       Tape kB/s  Cache
                               in     out     read   write    read  write   age
 62%   1456      6      0     794   16954    46326       0       0      0    4s
 56%   1418      0      0     597   13689    36506       0       0      0    5s
 64%   1737      3      0     867   17637    43572       0       0      0    5s
 53%   1135     18      0     572   14823    42905    1155       0      0    4s
 73%   2067     20      0    1282   37835    69880    6347       0      0    4s
 69%   2031     24      0    1267   38868    67632       0       0      0    4s
 74%   1921      3      0    1027   26644    50881    7778       0      0    5s
 59%   1459      0      0     774   18843    41313       0       0      0    6s
 60%   1212      2      0     665   16391    38766       0       0      0    7s
 54%   1028     19      0     509   11873    33960       0       0      0    7s
 61%    844    302      0     571   13765    39032       0       0      0    7s
 72%   1164    598      0    1870   16750    39968       0       0      0    5s
 71%    948    640      0    8961    9788    29552       0       0      0    5s
 79%   1297    603      0   24157    5831    26034       0       0      0    4s
 87%   2130    465      0   23581    9794    31269       0       0      0    2s
I don't know why the disks are reading so much without the data going out over the network.
This is an F880 running OnTap 7.1. It is somewhat full -- about 92% (1.2TB out of 1.3TB).
There aren't any stuck ndmp sessions:
ndmpd status
ndmpd ON. No ndmpd sessions active.
I haven't really looked at what clients may be doing, given the low network-out numbers, but I am going there next.
Does anyone have any suggestions on what to look for?
Thanks,
Paul