This is similar to a problem we have been experiencing with increasing frequency. We run large batch jobs that compile our software. During these batch jobs we get "file not found" or "make: no rule to make target" type errors where we can prove the files existed on the filer when the Suns reported them missing. We don't have a "hang", just a failure to stat().
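One way to catch the failing calls in the act would be to run the build under truss; a rough sketch (the target name and output path are placeholders):

    # Log the stat-family syscalls made by make and all of its children;
    # failures show up in the output as Err#2 ENOENT.
    # On Solaris 7 and later, add stat64,lstat64 to the list.
    truss -f -t stat,lstat -o /tmp/build-truss.out make all

    # Afterwards, list the failed lookups and the paths they hit:
    grep ENOENT /tmp/build-truss.out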
The problem has manifested itself on both our Solaris 2.5.1 and Solaris 8 systems. The filer is an F760 (5.3.5R2P2). The load on the filer when this happens is typically 10,000 ops/sec or greater. NetApp has asked for packet traces, but we are talking gigabit interfaces here; the trace files are huge, and "pktt" is dropping about 80% of the packets anyway.
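If pktt can't keep up, one option would be to capture on the Solaris side instead, filtered to a single filer and truncated to headers so the files stay manageable. A sketch; "ge0" and "toaster" are placeholders for your interface and filer name:

    # Capture only traffic to/from the filer on port 2049, 256 bytes per packet
    snoop -d ge0 -s 256 -o /tmp/nfs-trace.cap host toaster and port 2049

    # Inspect the capture later, verbosely
    snoop -i /tmp/nfs-trace.cap -v | more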
We have already cut "nfs.udp.xfersize" to 8K and are running out of ideas. I am getting hauled in front of management on a regular basis to explain how I am going to make the problem go away. Now all I need is a solution.....
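For what it's worth, the transfer size (and transport) can also be forced from the client end, which at least tells you whether the filer-side xfersize setting is actually taking effect. A sketch; the filer name and paths are placeholders:

    # Mount with 8K transfers over TCP instead of UDP, as an experiment
    mount -F nfs -o vers=3,proto=tcp,rsize=8192,wsize=8192 toaster:/vol/src /src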
What type of filer do you have, and what version of code is it running? I know it's obvious, but have you checked for duplex problems? If you find a solution, I would really like to hear about it.
Graydon Dodson
(606) 232-6483
grdodson@lexmark.com
Lexmark International Inc.
On Tue, 19 Sep 2000, Graydon Dodson wrote:
> During these batch jobs we get "file not found" or "make: no rule to make target" type errors where we can prove the files existed on the filer when the Suns reported them missing. We don't have a "hang", just a failure to stat().
Yeah, that would be a different kind of "bad". ;-) In our case, an Apache httpd instance that hangs on a stat() is unkillable, and thus takes up one process slot. Eventually all slots are filled and Apache is unable to service more requests. You can't kill off the parent process and restart, since port 80 cannot be unbound. The only thing to do is reboot.
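When this happens, the Solaris proc tools can at least confirm where the process is wedged and why the port stays bound. A sketch, with 1234 standing in for the stuck httpd's pid:

    # User-level stack; a stat() frame confirms the hang.
    # Note pstack may itself block if the process can't be stopped.
    /usr/proc/bin/pstack 1234

    # Open files and sockets, including the one still bound to port 80
    /usr/proc/bin/pfiles 1234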
> The problem has manifested itself on both our Solaris 2.5.1 and Solaris 8 systems.
Even Solaris 8, eh? Ugh. :(
> We have already cut "nfs.udp.xfersize" to 8K and are running out of ideas. I am getting hauled in front of management on a regular basis to explain how I am going to make the problem go away. Now all I need is a solution.....
It has become a daily occurrence here, starting about two weeks ago, and we have not been able to correlate any change in that timeframe that would explain the increase in frequency. Same Solaris patch level, same Sparc hardware, same NetApps, same Data ONTAP, same network switches. We even lessened the load on the servers by adding more of them and spreading customers around (this is in a shared web and mail hosting facility). I have also tried changing our mounts to NFSv2 to see if that makes a difference.
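For reference, a vfstab line to pin a mount at NFSv2 would look roughly like this (server, volume, and mount point are placeholders):

    #device           fsck  mount  FS   pass  boot  options
    toaster:/vol/web  -     /web   nfs  -     yes   vers=2,proto=udp,rsize=8192,wsize=8192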
> What type of filer do you have, and what version of code is it running? I know it's obvious, but have you checked for duplex problems? If you find a solution, I would really like to hear about it.
The network is fine... duplex and speed match, and there are no errors on the filer or Sparc interfaces. We are running two F740s (clustered) on release 5.3.4.
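In case anyone else wants to double-check the same thing: on the Solaris side, ndd reports what the interface negotiated (hme shown here; the gigabit drivers use somewhat different parameter names), and on the filer, ifstat shows per-interface error counters:

    # Solaris: select the interface instance, then read the link state
    ndd -set /dev/hme instance 0
    ndd -get /dev/hme link_speed   # 1 = 100Mbit, 0 = 10Mbit
    ndd -get /dev/hme link_mode    # 1 = full duplex, 0 = half

    # Filer: per-interface statistics, including errors
    ifstat -a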