I noticed a very serious problem this morning on one of our news servers with a Netapp-based spool (Ultra 170, 1GB RAM, Solaris 2.5.1 + Aug 16 rec. patches). That server was recently upgraded from a 2.5 Ultra running INN 1.4unoff4 to the 2.5.1 Ultra running INN 1.5.1.
Building or adding to the overview files takes a *very* long time. I trussed overchan as well as an expireover process, and I'm seeing a long delay (on the order of a minute or more) after one of the open() calls. I believe it is sleeping in fcntl():
[reading in article headers...]
open("/news/spool/a/a/mis/talk/1944", O_RDONLY) = 4
read(4, " P a t h : t o r - n n".., 8192) = 1293
read(4, 0x0004D060, 8192) = 0
fstat(4, 0xEFFFF5A8) = 0
close(4) = 0
open("a/a/mis/talk/.LCK.overview", O_WRONLY|O_CREAT|O_TRUNC, 0664) = 4
open("a/a/mis/talk/.overview", O_RDWR|O_CREAT, 0664) = 5
[big delay here]
fcntl(5, F_SETLK, 0xEFFFF5AC) = 0
fstat(5, 0xEFFFFAA0) = 0
writev(4, 0xEFFFFB28, 1) = 7404
rename("a/a/mis/talk/.LCK.overview", "a/a/mis/talk/.overview") = 0
close(4) = 0
close(5) = 0
open("/news/spool/a/bsu/programming", O_RDONLY|O_NDELAY) = 4
fcntl(4, F_SETFD, 0x00000001) = 0
[continue with next newsgroup...]
It spends 99% of the time waiting for that fcntl() to return. The spool is mounted NFSv2, UDP. I've tried both hard and soft mounts. The same NFS configuration (AFAIK) worked fine on the old news server. I still have the old server online, and I can verify this (rm the overview file, then regenerate from scratch):
old-server% time expireover -a -f /tmp/test.active
0.03u 0.11s 0:00.14 100.0%

new-server% time expireover -a -f /tmp/test.active
0.04u 0.13s 1:17.25 0.2%
Over a minute to create a 44-line .overview file? lockd and statd are running on the Solaris side, and rpcinfo reports that nlockmgr is registered. I must be missing something obvious, but I can't see it. :(
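To take INN out of the picture, the lock call can be timed on its own. What follows is only a rough sketch in Python, not anything from INN: the function name time_lock is made up for illustration, and it assumes that Python's fcntl.lockf() ends up issuing the same kind of non-blocking fcntl() record lock that shows up as F_SETLK in the truss output above. Pointed at a scratch file on the NFS spool and then at one on local disk, it should show whether the delay is in the lock request itself.

```python
import fcntl
import os
import time


def time_lock(path):
    """Return the seconds spent acquiring a write lock on path.

    Hypothetical helper for illustration only; not part of INN.
    """
    # Same open flags and mode as the .overview open() in the trace.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o664)
    try:
        start = time.monotonic()
        # Non-blocking exclusive record lock on the whole file.
        # Raises OSError if another process already holds a lock.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        elapsed = time.monotonic() - start
        # Release the lock before closing the descriptor.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return elapsed
    finally:
        os.close(fd)
```

Calling time_lock() on a path under /news/spool and again on a path under /tmp gives two numbers to compare. If only the NFS path is slow, that points at the client's lock manager or its conversation with the filer rather than at overchan or expireover.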
On Tue, 19 Aug 1997, Brian Tao wrote:
> It spends 99% of the time waiting for that fcntl() to return.
> The spool is mounted NFSv2, UDP. I've tried both hard and soft mounts. The same NFS configuration (AFAIK) worked fine on the old news server.
Well, rebooting the Ultra cleared up the problem. Gee. :-/