On Tue, 2 Jan 2001, Chris Lamb wrote:

->One box handled incoming SMTP and all local delivery (sendmail+procmail)
->and POP, the other handled some unholy number of Majordomo lists. I mean,
->this was with hardware that nowadays people scoff at, running ancient
->software releases and NFS V2 over a single network interface. I find it
->impossible to believe that a 4-way UltraII and an F760 can't handle the
->load. (Yeah, the average mail message with all the crap "modern" mailers
->add on is probably several Kb larger, but that's almost never the real
->bottleneck.)

That was supposed to be part of my point: the machine we have can more than
adequately handle the load. :)
The stats
http://static-content-01.communitytelephone.net/~hill/server_stats
show that the machine doesn't breathe hard all day long. It seems that
something builds up over time until it hits critical mass, at which point
performance becomes very bad.
->Some questions that may be a bit obvious: Is your 2.6 box patched to
->current levels?

Yes. Patch revisions are posted at the above URL.
->Why NFS V2?

This was a typo; it is really V3.
->How much NVRAM in the filer?

32MB. The filer weekly_log is at the above URL.
->Is the network bogged down, or the switch misconfigured?

We see no errors on the switch or on the servers' network interfaces.
->How many network interfaces, of what type/speed?

The filer has a Gigabit interface; the Solaris servers have 100Mb
interfaces. Throughput information is in the file nx.se at the above URL.
->What does snoop/tcpdump/your network sniffer say?

I haven't tried this yet; it looked like there would be way too much
information to sort through. I will gather it the next time the system
begins to act up.
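When I do, the plan is to capture to a file and sift through it afterwards
rather than watch it live. Probably something along these lines (hme0 and
the capture path are just placeholders for our actual setup):

    # capture all traffic between this host and the filer
    snoop -d hme0 -o /tmp/nfs.snoop host filer

    # later, summarize just the NFS traffic from the capture file
    snoop -i /tmp/nfs.snoop -V port 2049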
->How many disks in the volume that contains /var/mail?

14.
->Is this filer running CIFS & NFS, or is it purely an NFS box?

NFS only.
->If /var/mail is on the filer, what about home directories (i.e., is
->your mailer bogging doing ".forward" lookups)?

Home directories are on the NetApp as well. Again I must state that it is
not a general performance problem, but something that happens over time.
->Are you using NIS and/or DNS on the filer?

No.
->What version of sendmail/qmail/whatever?

Latest smail.
->Are you using procmail for delivery? Why not? :-)

Yes.
->Is disk scrubbing kicking in? Are you taking snapshots too frequently?
->Do you have cron jobs walking the filesystem or other network clients
->hitting the filer or the network when these slowdowns are observed on the
->mail server? And really, why NFS V2?

I appreciate what you are getting at here, but I don't believe that sort of
thing is happening. If you look at the stats for the filer, its performance
stays stable through the trouble period on the one server, and all the
other servers continue to perform well. I believe the server stats show
that these things aren't occurring either.
We are using V3 mounts; I typo'd in my earlier email.
->(I'm not really a smartass, I just play one on the Internet. :-) These
->are just some of the questions I'd look to answer before poking at
->/etc/system.

We did do that. We feel strongly that there is a bug in Solaris, or in the
interaction between it and the NetApp, which causes our performance
problems. We tried massaging /etc/system parameters to keep the problem
from occurring.
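I won't claim the exact settings we used were right, but for anyone curious
what that sort of massaging looks like, the entries were of this general
form in /etc/system (the tunables and values below are illustrative, not a
record of our configuration):

    * /etc/system -- illustrative values only
    * Cap the NFS rnode cache so idle rnodes (and the per-credential
    * access caches hanging off them) get reaped sooner.
    set nrnode = 4000
    * The directory name lookup cache size, from which the rnode
    * cache default is derived.
    set ncsize = 8000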
Again, I must point out that our server ran great when we had /var/mail on
local storage. We only began to see this problem after moving it to the
filer, and only after 24-30 hours of operation.
I found a bug ID on SunSolve that describes a problem similar to ours, but
we already have the patch that was supposed to fix that bug. I include it
here so you can get a feel for the problem from someone else's perspective;
even though ours is not an exact match, there are strong similarities. Our
problem also resembles an earlier posting on toasters:
Date: Tue, 10 Aug 1999 08:05:18 -0700 (PDT)
From: Nick Christenson <npc@sendmail.com>
To: toasters@mathworks.com
Subject: NFS client tuning, problem and resolution.
------------------------ Sun Bug ID 4034003 ---------------------------

Some time ago I got email from someone running a system with 37,000 users
who had problems using mail from its mailserver; both CPUs were very busy
and the system had slowed to a crawl at peak times.
The system gets roughly 70K sendmail calls and 130K pop3 calls a day; mail
lives on an NFSv3-capable Network Appliance server. Mail is delivered to
the home directories, not to /var/mail.
He ran kernel profiling and found crcmp() topping the list:
granularity: each sample hit covers 2 byte(s) for 0.12% of 8.65 seconds
  %   cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call  name
 39.2      3.39      3.39                             crcmp [1]
  2.8      3.63      0.24                             locked_pgcopy [2]
  2.5      3.85      0.22                             splx [3]
  1.8      4.01      0.16                             bcopy [4]
  1.7      4.16      0.15                             mutex_enter [5]
  1.7      4.31      0.15                             disp_getwork [6]
  1.6      4.45      0.14                             dnlc_purge_vp [7]
  1.5      4.58      0.13                             nfs3_access [8]
  1.5      4.71      0.13                             ross625_vac_segflush [9]
  1.3      4.82      0.11                             blkclr [10]
  1.3      4.93      0.11                             idle [11]
  1.0      5.02      0.09                             mutex_adaptive_enter [12]
While the information is not complete, here's what I think happens: all
users access the same rnode ("/export/home", since mail is delivered to the
home directories), a large access cache is built, and here's where we
serialize, in nfs3_access() [common/fs/nfs/nfs3_vnops.c]:
........
        mutex_enter(&rp->r_statelock);
        for (acp = rp->r_acc; acp != NULL; acp = acp->next) {
                /*
                 * Look for an entry by comparing credentials.
                 */
                if (crcmp(acp->cred, cr) == 0) {
                        if ((acp->known & acc) == acc) {
#ifdef DEBUG
                                nfs3_access_cache_hits++;
#endif
                                if ((acp->allowed & acc) == acc) {
                                        mutex_exit(&rp->r_statelock);
                                        return (0);
                                }
                                mutex_exit(&rp->r_statelock);
                                return (EACCES);
                        }
                        break;
                }
        }
        mutex_exit(&rp->r_statelock);
And further down, if we have a cache miss, we do the same all over again.
(Since we dropped the lock, we need to recheck whether a new entry was
added, so the list is walked again, again holding the r_statelock.)
Now, that algorithm works fine in the typical case where users access their
own files, or when only a few users share one. It does *not* work when you
have 37K users accessing a single file: with 37K cached credentials, each
lookup walks on average half the list (roughly 18,500 crcmp() calls) while
holding r_statelock. As long as the rnode stays cached and the kernel
doesn't reap kernel memory, the list grows without bound and access to this
node gets increasingly slow.
This theory was tested by switching to NFSv2 mounts. This fixed the problem.
Attached is a program that shows the behaviour somewhat. It calls access()
on a file (specified on the command line) as many different users. The file
must exist on an NFSv3 filesystem, the program must be run as root, and the
file must be openable as root on the client.
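[The attached program itself didn't survive in this copy of the bug report.
As a rough reconstruction of the idea -- fork a child per uid, setuid() in
the child, and time a single access() call -- it presumably looked
something like the sketch below; the structure and names here are my
guesses, not Sun's original code:]

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/wait.h>

    #define NUIDS 4000

    /*
     * Run access(file, F_OK) once as the given uid.  setuid() is a
     * one-way door for the credential, so each uid gets its own child
     * process.  If "report" is set, the child prints the uid and the
     * elapsed time of the access() call in microseconds.
     */
    static void
    access_as(const char *file, uid_t uid, int report)
    {
            pid_t pid;

            if ((pid = fork()) == -1) {
                    perror("fork");
                    exit(1);
            }
            if (pid == 0) {
                    struct timeval t0, t1;

                    if (setuid(uid) == -1)
                            _exit(1);
                    gettimeofday(&t0, NULL);
                    (void) access(file, F_OK);
                    gettimeofday(&t1, NULL);
                    if (report) {
                            printf("%5ld %8ld\n", (long)uid,
                                (long)((t1.tv_sec - t0.tv_sec) * 1000000L +
                                (t1.tv_usec - t0.tv_usec)));
                            fflush(stdout);
                    }
                    _exit(0);
            }
            (void) waitpid(pid, NULL, 0);
    }

    int
    main(int argc, char **argv)
    {
            uid_t uid;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s <file on NFSv3 fs>\n",
                        argv[0]);
                    exit(1);
            }

            /* Fill the rnode's access cache: one entry per credential. */
            for (uid = 0; uid < NUIDS; uid++)
                    access_as(argv[1], uid, 0);

            /*
             * Re-measure a sample of uids.  The earliest entries now sit
             * deepest in the list, so they take the longest to find.
             */
            for (uid = 0; uid < NUIDS; uid += 500)
                    access_as(argv[1], uid, 1);

            return (0);
    }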
One sample run printed this:
       0    18860
     500     5680
    1000     4798
    1500     3963
    2000     3071
    2500     2425
    3000     1552
    3500      863
and this shows that after running access(file, F_OK) as 4000 different
uids, there is a really high penalty for the first users that called
access(); in this case searching the cache for uid 0 costs about 18ms
(18860 microseconds in the output above).
It seems that this cache should not be allowed to grow without bound;
currently it is only limited by available kernel memory.
-Work Around-

Use NFSv2 mounts on filesystems that are used by lots of users, such as
"/var/mail".
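[For reference, on a Solaris client that workaround is just a version
option on the mount; something like the following, where the server name
and paths are placeholders:

    mount -F nfs -o vers=2 filer:/vol/vol0/mail /var/mail

or the equivalent vers=2 entry in /etc/vfstab.]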
michael.eisler@Eng 1997-02-24
The above workaround is fine as long as the server is not a Solaris 2.5[.1]
or 2.6 system. Since the ACL protocol has an ACCESS procedure for NFS V2, a
Solaris 2.5 or later system will also use the ACL cache system.

---------------------- End Sun Bug ID 4034003 -------------------------
We did try using V2 mounts as a test, and we still experienced the problem.
--Jamie Hill