This is similar to a problem we have been experiencing with increasing
frequency. We run large batch jobs that compile our S/W. During these
batch jobs we get "file not found" or "make: no rule to create target" type
errors where we can prove the files existed on the filer when the Suns
reported them missing. We don't have a "hang", just a failure to stat().
The problem has manifested itself on both our Solaris 2.5.1 and Solaris 8
systems. The filer is an F760 (5.3.5R2P2). The load on the filer when this
happens is typically 10,000 ops/sec or greater. NetApp has asked for packet
traces, but we are talking gigabit interfaces here: the trace files are huge
and "pktt" is dropping about 80% of the packets anyway.
We have already cut "nfs.udp.xfersize" to 8K and are running out of ideas.
I am getting hauled in front of management on a regular basis to explain how
I am going to make the problem go away. Now all I need is a solution.....
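Short of full packet traces, one cheap way to collect evidence might be a
small client-side probe that retries stat() on a path the build reported
missing and records whether the failure clears on its own. The sketch below
is only an illustration: the names probe_stat and classify, the retry count
and the delay are all assumptions, and it presumes a Python interpreter is
available on the build hosts.

```python
import errno
import os
import time


def probe_stat(path, retries=5, delay=0.2):
    """Call os.stat() up to `retries` times and record each outcome.

    Returns a list of (timestamp, errno_or_None) tuples and stops at the
    first successful stat(). The defaults are arbitrary illustrative values.
    """
    attempts = []
    for _ in range(retries):
        try:
            os.stat(path)
            attempts.append((time.time(), None))
            break
        except OSError as exc:
            # Record the errno (for example ENOENT) and try again shortly.
            attempts.append((time.time(), exc.errno))
            time.sleep(delay)
    return attempts


def classify(attempts):
    """Label a probe: 'ok', 'transient' (failed, then succeeded) or 'missing'."""
    if not attempts or attempts[-1][1] is not None:
        return "missing"
    return "ok" if len(attempts) == 1 else "transient"


if __name__ == "__main__":
    # Hypothetical path; substitute a file the build claimed was missing.
    result = probe_stat("/path/to/suspect/file")
    print(classify(result), result)
```

A "transient" result with timestamps would at least narrow down when a
failure happened, so that a much shorter trace window could be captured.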
What type of filer do you have and what version of code is it running?
I know it's obvious, but have you checked for duplex problems? If you find
a solution I would really like to hear about it.
Graydon Dodson (606) 232-6483 grdodson(a)lexmark.com
Lexmark International Inc.