This could be bug #12417. It's fixed in 5.3.5P2, 5.3.5R2P1, and 5.3.5R2P2.
Mark Muhlestein -- mmm@netapp.com
-----Original Message-----
From: Brian Tao [mailto:taob@risc.org]
Sent: Tuesday, May 09, 2000 10:58 PM
To: toasters@mathworks.com
Subject: Portscan causes DOT 5.3.2R1 to fall over?
Has anyone tried portscanning a filer (or had their filer
portscanned)? Any ill effects? We ran a network-wide nmap on our own internal network yesterday evening, and while at first it didn't seem to have harmed anything (we weren't expecting any problems), it now appears that some badness may have resulted on one of our F210s running DOT 5.3.2R1.
We started noticing the problem when NFS clients were no longer
able to establish file locks. At first, we thought this was simply Solaris' NFS client acting up again. But then we noticed it happening on a few other Solaris servers and also on someone's FreeBSD workstation.
Further digging showed that every Solaris NFS client of the F210
logged the exact same message to syslog at roughly the same time (give or take a couple of minutes):
May 8 18:13:25 office.corp /usr/lib/nfs/lockd[19392]: t_accept(file descriptor 5/transport tcp) TLI error 7
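For anyone wanting to check their own clients for the same entry, here is a minimal sketch that pulls the timestamp, host, pid and TLI error number out of syslog lines. The regular expression is an assumption built from the single sample line above; `LOCKD_RE` and `find_lockd_errors` are made-up names, and other syslog layouts may need a different pattern.

```python
import re

# Pattern assumed from the one lockd line quoted above; adjust for other layouts.
LOCKD_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"\S*lockd\[(?P<pid>\d+)\]: "
    r"t_accept\(.*?\) TLI error (?P<err>\d+)"
)

def find_lockd_errors(lines):
    """Return (timestamp, host, pid, TLI error number) for each matching line."""
    hits = []
    for line in lines:
        m = LOCKD_RE.search(line)
        if m:
            hits.append((m["stamp"], m["host"], int(m["pid"]), int(m["err"])))
    return hits

sample = [
    "May 8 18:13:25 office.corp /usr/lib/nfs/lockd[19392]: "
    "t_accept(file descriptor 5/transport tcp) TLI error 7",
]
print(find_lockd_errors(sample))
```

Feeding it the syslog from each client and comparing the timestamps shows whether the errors cluster around a single event.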
The lock_dump command on the filer showed a few locks in GRANTED
state, but one was for a pid that no longer existed on the given client. At that point, any attempt to lock a file residing on that Netapp would time out and fail. An "rpcinfo -p" showed all the usual RPC services running (nlockmgr, status, mountd, nfs, rpcbind) except for rquotad. I don't know if that is significant (we do have user and tree quotas on the filer).
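A quick way to reproduce the lock failure from a client is a small probe that asks for an exclusive lock and gives up after a timeout. This is only a sketch under a few assumptions: a Unix client with Python 3, the probe running in the main thread (required for `signal.alarm`), and an NFS mount where the lock request can be interrupted by a signal (on a hard mount without interrupt support the alarm may not break the wait). The names `probe_lock` and `LockTimeout` are made up for illustration.

```python
import fcntl
import signal

class LockTimeout(Exception):
    """Raised by the alarm handler when the lock request takes too long."""

def _on_alarm(signum, frame):
    raise LockTimeout()

def probe_lock(path, timeout_s=10):
    """Return True if an exclusive lock on path is granted within timeout_s seconds."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        with open(path, "a") as f:
            signal.alarm(timeout_s)          # arm the timeout
            try:
                fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until granted or interrupted
                fcntl.lockf(f, fcntl.LOCK_UN)
                return True
            except (LockTimeout, OSError):
                return False
            finally:
                signal.alarm(0)              # always disarm the timeout
    finally:
        signal.signal(signal.SIGALRM, old_handler)
```

Calling `probe_lock` on a scratch file that lives on the filer, from several clients, should return False everywhere while the problem is present and True once locking works again.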
The time of the syslog entries coincided with the time of the
portscan run the previous night. To test a couple of other theories, I tried a series of TCP, UDP and RPC portscans using nmap 2.30b20 ("nmap -sRUT"). The filer crashed almost immediately, but came back up within a couple of minutes. Now file locking was behaving correctly again, and a number of applications that were hanging before now worked fine.
I'm not sure if Netapp support will have much to go on here... no
crash dump was saved to disk and /etc/messages is clean, with the only odd entry being:
Tue May 9 15:39:20 EDT [de0]: Discarding packets for DHCP client port originating from server 10.35.1.73 port 9af6.
10.35.1.73 is the host doing the nmap. This entry was logged just
before the filer crashed. I didn't have a console attached to the filer at the time to watch the boot messages. The filer serves NFS as well as CIFS.
Anyone run across this problem before?
-- Brian Tao (BT300, taob@risc.org) "Though this be madness, yet there is method in't"