I noticed a very serious problem this morning on one of our news
servers with a NetApp-based spool (Ultra 170, 1GB RAM, Solaris 2.5.1 +
Aug 16 recommended patches). That server was recently upgraded from an
Ultra running Solaris 2.5 and INN 1.4unoff4 to an Ultra running
Solaris 2.5.1 and INN 1.5.1.
Building or adding to the overview files takes a *very* long time.
I trussed the overchan as well as an expireover process, and I'm
seeing a long delay (on the order of a minute or more) after one of
the open() calls. I believe it is sleeping on fcntl():
[reading in article headers...]
open("/news/spool/a/a/mis/talk/1944", O_RDONLY) = 4
read(4, " P a t h : t o r - n n".., 8192) = 1293
read(4, 0x0004D060, 8192) = 0
fstat(4, 0xEFFFF5A8) = 0
close(4) = 0
open("a/a/mis/talk/.LCK.overview", O_WRONLY|O_CREAT|O_TRUNC, 0664) = 4
open("a/a/mis/talk/.overview", O_RDWR|O_CREAT, 0664) = 5
[big delay here]
fcntl(5, F_SETLK, 0xEFFFF5AC) = 0
fstat(5, 0xEFFFFAA0) = 0
writev(4, 0xEFFFFB28, 1) = 7404
rename("a/a/mis/talk/.LCK.overview", "a/a/mis/talk/.overview") = 0
close(4) = 0
close(5) = 0
open("/news/spool/a/bsu/programming", O_RDONLY|O_NDELAY) = 4
fcntl(4, F_SETFD, 0x00000001) = 0
[continue with next newsgroup...]
It spends 99% of the time waiting for that fcntl() to return. The
spool is mounted NFSv2, UDP. I've tried both hard and soft mounts.
The same NFS configuration (AFAIK) worked fine on the old news server.
I still have the old server online, and I can verify this (rm the
overview file, then regenerate from scratch):
old-server% time expireover -a -f /tmp/test.active
0.03u 0.11s 0:00.14 100.0%
new-server% time expireover -a -f /tmp/test.active
0.04u 0.13s 1:17.25 0.2%
Over a minute to create a 44-line .overview file? lockd and statd
are running on the Solaris side, rpcinfo reports nlockmgr is
registered. I must be missing something obvious, but I can't see it. :(
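
To take INN out of the picture, the lock call could be timed by itself.
What follows is only a minimal sketch, not anything from the INN source:
time_setlk is a hypothetical helper name, and the write-lock type is an
assumption, since truss shows F_SETLK but not the contents of the flock
structure. It opens a file with the same flags the trace shows for
.overview, requests a whole-file lock, and reports how long fcntl() took.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Open `path` with the flags the truss output shows for .overview,
 * request a whole-file lock with F_SETLK, and store the wall-clock
 * seconds spent inside fcntl() in *elapsed.
 * Returns 0 on success, -1 if open() or fcntl() fails.
 * (time_setlk is a hypothetical helper name, not part of INN.)
 */
int time_setlk(const char *path, double *elapsed)
{
    struct flock fl;
    struct timeval start, end;
    int fd, rc;

    fd = open(path, O_RDWR | O_CREAT, 0664);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;     /* assumption: an exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    gettimeofday(&start, NULL);
    rc = fcntl(fd, F_SETLK, &fl);
    gettimeofday(&end, NULL);

    if (rc < 0)
        perror("fcntl(F_SETLK)");

    *elapsed = (end.tv_sec - start.tv_sec)
             + (end.tv_usec - start.tv_usec) / 1e6;

    close(fd);               /* closing the descriptor releases the lock */
    return rc < 0 ? -1 : 0;
}
```

Calling this once on a file under the NFS-mounted spool and once on a
file on local disk (say, in /tmp) should show whether the delay follows
the NFS lock request or comes from somewhere else.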
--
Brian Tao (BT300, taob(a)netcom.ca)
"Though this be madness, yet there is method in't"