On Tue, 2 Jan 2001, Chris Lamb wrote:

->One box handled incoming SMTP and all local delivery (sendmail+procmail)
->and POP, the other handled some unholy number of Majordomo lists. I mean,
->this was with hardware that nowadays people scoff at, running ancient
->software releases and NFS V2 over a single network interface. I find it
->impossible to believe that a 4-way UltraII and an F760 can't handle the
->load. (Yeah, the average mail message with all the crap "modern" mailers
->add on is probably several Kb larger, but that's almost never the real
->bottleneck.)

That was supposed to be part of my point: the machine we have can more than
adequately handle the load. :)
The stats
http://static-content-01.communitytelephone.net/~hill/server_stats
show that the machine doesn't breathe hard all day long. It seems that
something builds up over time until it hits critical mass, at which point
performance becomes very bad.
->Some questions that may be a bit obvious: Is your 2.6 box patched to
->current levels?

Yes. Patch revisions are posted at the above URL.
->Why NFS V2?

This was a typo; it is really V3.
->How much NVRAM in the filer?

32MB. The filer weekly_log is at the above URL.
->Is the network bogged down, or the switch misconfigured?

We see no errors on the switch or on the servers' network interfaces.
->How many network interfaces, of what type/speed?

The filer has a Gigabit interface; the Solaris servers have 100Mb
interfaces. Throughput information is in the file nx.se at the above URL.
->What does snoop/tcpdump/your network sniffer say?

I haven't tried this yet; it looked like there would be way too much
information to sort through. I will gather it the next time the system
begins to act up.
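When I do, the plan is to capture to a file and sift through it afterwards
rather than watch it live. Probably something along these lines (hme0 and
the capture path are just placeholders for our actual setup):

    # capture all traffic between this host and the filer
    snoop -d hme0 -o /tmp/nfs.snoop host filer

    # later, summarize just the NFS traffic from the capture file
    snoop -i /tmp/nfs.snoop -V port 2049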
->How many disks in the volume that contains /var/mail?

14.
->Is this filer running CIFS & NFS, or is it purely an NFS box?

NFS only.
->If /var/mail is on the filer, what about home directories (i.e., is
->your mailer bogging doing ".forward" lookups)?

Home directories are on the NetApp as well. Again I must state that it is
not a general performance problem, but something that happens over time.
->Are you using NIS and/or DNS on the filer?

No.
->What version of sendmail/qmail/whatever?

Latest smail.
->Are you using procmail for delivery? Why not? :-)

Yes.
->Is disk scrubbing kicking in? Are you taking snapshots too frequently?
->Do you have cron jobs walking the filesystem or other network clients
->hitting the filer or the network when these slowdowns are observed on the
->mail server? And really, why NFS V2?

I appreciate what you are getting at here, but I don't believe that sort of
thing is happening. If you look at the stats for the filer, its performance
stays stable through the trouble period on the one server, and all the
other servers continue to perform well. I believe the server stats show
that these things aren't occurring either.
We are using V3 mounts; I typo'd in my earlier email.
->(I'm not really a smartass, I just play one on the Internet. :-) These
->are just some of the questions I'd look to answer before poking at
->/etc/system.

We did do that. We feel strongly that there is a bug in Solaris, or in the
interaction between it and the NetApp, which causes our performance
problems. We tried massaging /etc/system parameters to keep the problem
from occurring.
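I won't claim the exact settings we used were right, but for anyone curious
what that sort of massaging looks like, the entries were of this general
form in /etc/system (the tunables and values below are illustrative, not a
record of our configuration):

    * /etc/system -- illustrative values only
    * Cap the NFS rnode cache so idle rnodes (and the per-credential
    * access caches hanging off them) get reaped sooner.
    set nrnode = 4000
    * The directory name lookup cache size, from which the rnode
    * cache default is derived.
    set ncsize = 8000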
Again, I must point out that our server ran great when we had /var/mail on
local storage. We only began to see this problem after moving it to the
filer, and only after 24-30 hours of operation.
I found a bug ID on SunSolve that describes a problem similar to ours, but
we already have the patch that was supposed to fix that bug. I include it
here so you can get a feel for the problem from someone else's perspective;
even though ours is not an exact match, there are strong similarities. Our
problem also resembles an earlier posting on toasters:
Date: Tue, 10 Aug 1999 08:05:18 -0700 (PDT)
From: Nick Christenson <npc@sendmail.com>
To: toasters@mathworks.com
Subject: NFS client tuning, problem and resolution.
------------------------ Sun Bug ID 4034003 ---------------------------

Some time ago I got email from someone running a system with 37,000 users
who had problems using mail from its mailserver; both CPUs were very busy
and the system had slowed to a crawl at peak times.
The system gets roughly 70K sendmail calls and 130K pop3 calls a day; mail
lives on an NFSv3-capable Network Appliance server. Mail is delivered to
the home directories, not to /var/mail.
He ran kernel profiling and found crcmp() topping the list:
granularity: each sample hit covers 2 byte(s) for 0.12% of 8.65 seconds
  %   cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call  name
 39.2      3.39      3.39                             crcmp [1]
  2.8      3.63      0.24                             locked_pgcopy [2]
  2.5      3.85      0.22                             splx [3]
  1.8      4.01      0.16                             bcopy [4]
  1.7      4.16      0.15                             mutex_enter [5]
  1.7      4.31      0.15                             disp_getwork [6]
  1.6      4.45      0.14                             dnlc_purge_vp [7]
  1.5      4.58      0.13                             nfs3_access [8]
  1.5      4.71      0.13                             ross625_vac_segflush [9]
  1.3      4.82      0.11                             blkclr [10]
  1.3      4.93      0.11                             idle [11]
  1.0      5.02      0.09                             mutex_adaptive_enter [12]
While the information is not complete, here's what I think happens: all
users access the same rnode ("/export/home", since mail is delivered to the
home directories), a large access cache is built, and here's where we
serialize, in nfs3_access() [common/fs/nfs/nfs3_vnops.c]:
........
        mutex_enter(&rp->r_statelock);
        for (acp = rp->r_acc; acp != NULL; acp = acp->next) {
                /*
                 * Look for an entry by comparing credentials.
                 */
                if (crcmp(acp->cred, cr) == 0) {
                        if ((acp->known & acc) == acc) {
#ifdef DEBUG
                                nfs3_access_cache_hits++;
#endif
                                if ((acp->allowed & acc) == acc) {
                                        mutex_exit(&rp->r_statelock);
                                        return (0);
                                }
                                mutex_exit(&rp->r_statelock);
                                return (EACCES);
                        }
                        break;
                }
        }
        mutex_exit(&rp->r_statelock);
And further down, if we have a cache miss, we do the same all over again.
(Since we dropped the lock, we need to recheck whether a new entry was
added, so the list is walked again, again holding the r_statelock.)
Now, that algorithm works fine in the typical case where users access their
own files, or when only a few users share one. It does *not* work when you
have 37K users accessing a single file: with 37K cached credentials, each
lookup walks on average half the list (roughly 18,500 crcmp() calls) while
holding r_statelock. As long as the rnode stays cached and the kernel
doesn't reap kernel memory, the list grows without bound and access to this
node gets increasingly slow.
This theory was tested by switching to NFSv2 mounts. This fixed the problem.
Attached is a program that shows the behaviour somewhat. It calls access()
on a file (specified on the command line) as many different users. The file
must exist on an NFSv3 filesystem, the program must be run as root, and the
file must be openable as root on the client.
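[The attached program itself didn't survive in this copy of the bug report.
As a rough reconstruction of the idea -- fork a child per uid, setuid() in
the child, and time a single access() call -- it presumably looked
something like the sketch below; the structure and names here are my
guesses, not Sun's original code:]

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/wait.h>

    #define NUIDS 4000

    /*
     * Run access(file, F_OK) once as the given uid.  setuid() is a
     * one-way door for the credential, so each uid gets its own child
     * process.  If "report" is set, the child prints the uid and the
     * elapsed time of the access() call in microseconds.
     */
    static void
    access_as(const char *file, uid_t uid, int report)
    {
            pid_t pid;

            if ((pid = fork()) == -1) {
                    perror("fork");
                    exit(1);
            }
            if (pid == 0) {
                    struct timeval t0, t1;

                    if (setuid(uid) == -1)
                            _exit(1);
                    gettimeofday(&t0, NULL);
                    (void) access(file, F_OK);
                    gettimeofday(&t1, NULL);
                    if (report) {
                            printf("%5ld %8ld\n", (long)uid,
                                (long)((t1.tv_sec - t0.tv_sec) * 1000000L +
                                (t1.tv_usec - t0.tv_usec)));
                            fflush(stdout);
                    }
                    _exit(0);
            }
            (void) waitpid(pid, NULL, 0);
    }

    int
    main(int argc, char **argv)
    {
            uid_t uid;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s <file on NFSv3 fs>\n",
                        argv[0]);
                    exit(1);
            }

            /* Fill the rnode's access cache: one entry per credential. */
            for (uid = 0; uid < NUIDS; uid++)
                    access_as(argv[1], uid, 0);

            /*
             * Re-measure a sample of uids.  The earliest entries now sit
             * deepest in the list, so they take the longest to find.
             */
            for (uid = 0; uid < NUIDS; uid += 500)
                    access_as(argv[1], uid, 1);

            return (0);
    }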
One sample run printed this:
       0    18860
     500     5680
    1000     4798
    1500     3963
    2000     3071
    2500     2425
    3000     1552
    3500      863
and this shows that after running access(file, F_OK) as 4000 different
uids, there is a really high penalty for the first users that called
access(); in this case searching the cache for uid 0 costs about 18ms
(18860 microseconds in the output above).
It seems that this cache should not be allowed to grow without bound;
currently it is only limited by available kernel memory.
-Work Around-

Use NFSv2 mounts on filesystems that are used by lots of users, such as
"/var/mail".
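[For reference, on a Solaris client that workaround is just a version
option on the mount; something like the following, where the server name
and paths are placeholders:

    mount -F nfs -o vers=2 filer:/vol/vol0/mail /var/mail

or the equivalent vers=2 entry in /etc/vfstab.]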
michael.eisler@Eng 1997-02-24
The above workaround is fine as long as the server is not a Solaris 2.5[.1]
or 2.6 system. Since the ACL protocol has an ACCESS procedure for NFS V2, a
Solaris 2.5 or later system will also use the ACL cache system.

---------------------- End Sun Bug ID 4034003 -------------------------
We did try using V2 mounts as a test, and we still experienced the problem.
--Jamie Hill