Bad Solaris performance over time - toasters

2 Jan 2001


      Executive summary: Our core problem appears to be that an NFS load
consisting of many file creation/removal operations eventually cause our
Solaris 2.6 and Solaris 8 servers to perform very poorly.  The culprit
seems to be kernel R/W lock contention by the caller 'nfs_access_check'
for the lock 'nfs_acache_lock'.  What is unclear is what causes the lock
contention.  It seems to build over time eventually killing the
performance of the server.  Our current work around is to reboot the
server every morning or when the problem occurs.  See details below for
further information.
We recently moved /var/mail from local storage on a Solaris 2.6 server to
our F760 and ever since we are suffering from serious performance problems
on our Solaris server.  We have about 16,000 users and deliver about
100,000 - 150,000 message a day.  It is important to note that the we have
a few other solaris servers that communicate with our filer and they do
not experience any problems.  It really seems to be related to the high
volume of traffic generated by moving /var/mail over to the filer.  Note
that we use dot-locking only and therefore are not troubled by the
well-known problems with fcntl-based locking over NFS.
It is worth noting that we experienced a similar problem a few months ago
with a web based email package we wrote.  The email package uses form
based authentication (as opposed to browser based) and kept relevant
session data in a flat file (about 200 bytes big).  Originally we had all
18,000 files in a single directory.  We tried hashing the directory so
that we had about 300 files per directory, but it did not make a
difference.  We then moved to using a Mysql database to store the session
data, and the problem went away.  Before moving to the database
implementation we tried a two different physical machines (both with the
characteristics defined below) as well as Solaris 2.6 and Solaris 8, none
of which made a difference.
It seems very clear that the symptom is brought on by a heavy load of file
creation and removal.  We are not sure if file size is an issue, but it
could be given the size of our session files and the fact that dot-lock
files are 0 bytes.
We do have an open ticket with Network Appliance case# 176124 but so far
we haven't made much progress with it.
On to the details ...
Filer: F760 Release 5.3.6R1
Solaris 2.6 Box: Hardware is AXmp, 4 360MHz UltraSPARC-II CPUs w/ 4MB
                 cache, and 2Gig RAM
mount options: vers=2/proto=udp/rsize=32768/wsize=32768/noac/remote
Our server tends to perform well for about 24-30hrs depending on load.
At some point after that we start seeing bad performance.  We have
gathered the following statistics
From the Solaris Box
       iostat, mpstat, netstat -k hme0, netstat -k, netstat -s,
       nfsstat -m, nfsstat -cnr, vmstat, vmstat -i, vmstat -s,
       top -n -dinfinity -s60 0
From the Filer
       netstat -m, netstat -s, nfsstat -d, nfsstat -l, statit,
       sysstat
The only place I saw anything out of the ordinary from the above stats
when the system was working properly vs. having problems was from the
output of top and mpstat.  I have made all stats available at
http://static-content-01.communitytelephone.net/~hill/server_stats
In case anyone wanted to see them.
Good performance
last pid:   505;  load averages:  2.26,  2.24,  2.29    14:30:04
407 processes: 393 sleeping, 3 running, 9 zombie, 2 on cpu
CPU states: 32.1% idle, 40.0% user, 27.9% kernel,  0.0% iowait,  0.0% swap
Memory: 2048M real, 444M free, 1000M swap free
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys  wt idl
  0  516   0  505  281  159 615   23   55   17   1  1970  17  12   4  67
  1  298   0 8056  125    0 810   21   63   31   1  1974  13  21   3  62
  2  579   0 2056  313  100 717   15   61   33   0  6216  12  12   2  74
  3  991   0  832  871  721 547   31   64   40   1  4804  18  24   5  53
Bad performance
last pid:  8130;  load averages: 21.64, 18.25, 13.56    18:13:05
315 processes: 301 sleeping, 6 running, 6 zombie, 2 on cpu
CPU states:  1.4% idle, 28.3% user, 69.9% kernel,  0.4% iowait,  0.0% swap
Memory: 2048M real, 93M free, 18M swap in use, 982M swap free
CPU minf mjf xcal  intr ithr csw icsw migr smtx srw syscl usr sys  wt idl
  0  608   0 1943   703  366 898  180  131   24 309  2861  24  75   0  2
  1  512   0 3161   336    0 936  172  140   18 330  3033  26  72   0  2
  2  633   0 10496  515  100 963  168  125   34 330  2712  23  76   0  2
  3  607   0 1442  1262  953 883  164  139   31 291  5426  30  68   0  2
You can see from the mpstat output that icsw, srw, usr, and sys have
somewhat different characteristics between the two machine states.  I
found the high srw numbers particularly interesting.  I believe they are
responsible for the change in the icsw, usr, and sys numbers.  (i.e. when
there is a lot of lock contention the sys cpu utilization goes up, which
could also cause the involuntary context switches (icsw) to increase.)
I ran 'lockstat sleep 30' during a bad performance period to try and
determine the cause of the high srw numbers.  Here is what I saw
R/W writer blocked by writer: 135 events
Count indv cuml rcnt     nsec Lock               Caller
-------------------------------------------------------------------------
  103  76%  76% 1.00 271500476 nfs_acache_lock   nfs_access_cache+0x1c
   23  17%  93% 1.00 311148519 nfs_acache_lock   nfs_access_purge_rp+0x24
    3   2%  96% 1.00  7275646 0x61a87758         ufs_remove+0x11c
    2   1%  97% 1.00 17863392 0x61a87758         ufs_rename+0x2c4
    2   1%  99% 1.00 19876013 0x61a87758         ufs_create+0x1a8
    1   1%  99% 1.00    46461 tim_list_rwlock    tim_addlink+0x64
    1   1% 100% 1.00  8152217 0x61a87758         ufs_link+0x13c
-------------------------------------------------------------------------
R/W writer blocked by readers: 1356 events
Count indv cuml rcnt     nsec Lock               Caller
-------------------------------------------------------------------------
 1135  84%  84% 13.47 142070011 nfs_acache_lock  nfs_access_cache+0x1c
  218  16% 100% 10.52 100639598 nfs_acache_lock  nfs_access_purge_rp+0x24
    2   0% 100% 1.00   603387 0x61a87758         ufs_rename+0x2c4
    1   0% 100% 1.00   121137 0x61a87758         ufs_rename+0x25c
-------------------------------------------------------------------------
R/W reader blocked by writer: 1608 events
Count indv cuml rcnt     nsec Lock               Caller
----------------------------------------------------------------------
 1600 100% 100% 1.00 103571147 nfs_acache_lock   nfs_access_check+0x18
    7   0% 100% 1.00 13694146 0x61a87758         ufs_dirlook+0xc0
    1   0% 100% 1.00  5391663 0x61a86e28         ufs_dirlook+0xc0
----------------------------------------------------------------------
R/W reader blocked by write wanted: 36530 events
Count indv cuml rcnt     nsec Lock               Caller
----------------------------------------------------------------------
36530 100% 100% 17.19 55578316 nfs_acache_lock   nfs_access_check+0x18
----------------------------------------------------------------------
For grins I did the same while the machine was performing okay
R/W writer blocked by writer: 11 events
Count indv cuml rcnt     nsec Lock               Caller
----------------------------------------------------------------------
    4  36%  36% 1.00    34507 nfs_acache_lock    nfs_access_cache+0x1c
    3  27%  64% 1.00    41125 kas+0x1c           as_map+0x6c
    2  18%  82% 1.00    94435 kas+0x1c           as_unmap+0x48
    1   9%  91% 1.00   238723 0x61b7f720         write+0x84
    1   9% 100% 1.00  8324704 0x618283a0         ufs_create+0x1a8
----------------------------------------------------------------------
R/W writer blocked by readers: 37 events
Count indv cuml rcnt     nsec Lock               Caller
----------------------------------------------------------------------
   24  65%  65% 1.04   114407 nfs_acache_lock    nfs_access_cache+0x1c
   10  27%  92% 1.00    93271 kas+0x1c           as_unmap+0x48
    2   5%  97% 1.00   101398 kas+0x1c           as_map+0x6c
    1   3% 100% 1.00   171407 0x618283a0         ufs_rename+0x2c4
----------------------------------------------------------------------
R/W reader blocked by writer: 61 events
Count indv cuml rcnt     nsec Lock               Caller
----------------------------------------------------------------------
   53  87%  87% 1.00   295492 nfs_acache_lock    nfs_access_check+0x18
    4   7%  93% 1.00    78438 kas+0x1c           as_fault+0x300
    2   3%  97% 1.00  8945909 0x61a45214         ufs_getattr+0x50
    1   2%  98% 1.00 35317881 0x61a45208         ufs_dirlook+0xc0
    1   2% 100% 1.00  3329355 0x618283a0         ufs_dirlook+0xc0
----------------------------------------------------------------------
R/W reader blocked by write wanted: 3 events
Count indv cuml rcnt     nsec Lock               Caller
----------------------------------------------------------------------
    3 100% 100% 1.00   164763 nfs_acache_lock    nfs_access_check+0x18
----------------------------------------------------------------------
From the percentages it seems that nfs_access_check/nfs_acache_lock is the
dominant player in the srw lock contention.
What I don't understand, is what causes the drastic change in lock
contention over time.  I've spent a few hours on sunsolve.sun.com and
docs.sun.com looking for information on nfs_access_check/nfs_acache_lock
and have come up empty.  I have also gone through Sun Performance and
Tuning Second Edition, and the toasters mailling list with no luck.
At this point I'm not sure if there is a netapp incompatiliby with
Solaris, or if there is a bug in Solaris NFS client implementation.  If
anyone out there has any ideas I would be extrememly grateful.
btw: we have made the following changes to /etc/system with no noticeable
change in system behaviour.
* a few kernel params are related to maxusers which is by default a few
* less than amount of ram in system in MB with a default limit of 1024
* by adjusting this up certain other values are also adjusted, which is
the
* goal here.  pp. 359 Sun performance tuning, but the decision to increase
* came from the Sun System Tuning white page pp. 69
set maxusers=1536
* These settings are to help tune nfs performance to the netapp -JPH
* 12.19.00 per toaster article subject: NFS client tuning, problem and
* resolution / Date: Tue, 10 Aug 1999 / From: Nick Christenson
* npc@sendmail.com
* set ncsize=8192
set ncsize=6144
* It appears that the increase in this number has resulted in mutex
* contention viewable by mpstat (smtx) and lockstat
*set nfs:nfs_nra=10
*set nfs:nfs3_nra=10
* Tweaks from SE and Sun Performance Tuning JPH 12.25.00
* http://www.sun.com/sun-on-net/performance/priority_paging.html
set priority_paging=1
* Connection hash table size default 256 increased to make hash lookups
* more efficient
set tcp:tcp_conn_hash_size=8192
* inode cache size - can view with 'netstat -k inode_cache'
* ideally this should be 4/3 the size of ncsize
set ufs_ninode=8192
* Things tweaked by virtual_adrian from the se toolkit
* There is also precedent for this in Sun Performance Tuning and
* Solaris Internals pp. 599
set slowscan=500
set maxpgio=25468
set fastscan=64306
set handspreadpages=30960
--Jamie Hill