This is similar to a problem we have been experiencing with increasing frequency. We run large batch jobs that compile our S/W. During these jobs we get "file not found" or "make: no rule to create target" type errors, even though we can prove the files existed on the filer when the Suns reported them missing. We don't see a "hang", just a failure to stat().
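One way to catch the failure outside of make is a small probe that repeatedly calls stat() on files known to exist and tallies any errors. The sketch below is a hypothetical helper (`probe_stat` is not a NetApp or Solaris tool), written under the assumption that it runs on an affected Sun client while the batch build is loading the filer. A path that fails in some rounds but not others points to intermittent lookup failures rather than a genuinely missing file. NFS client attribute caching means many calls may be answered locally, so treat the counts as suggestive, not exact.

```python
import errno
import os
import sys
import time
from collections import Counter


def probe_stat(paths, rounds=100, delay=0.0):
    """Hypothetical probe: stat() every path `rounds` times.

    Returns {path: Counter(errno_name -> count)}; an empty Counter
    means every stat() of that path succeeded.
    """
    failures = {p: Counter() for p in paths}
    for _ in range(rounds):
        for p in paths:
            try:
                os.stat(p)
            except OSError as exc:
                # Record the symbolic errno (e.g. ENOENT, ESTALE) so that
                # "not found" can be told apart from other NFS errors.
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                failures[p][name] += 1
        if delay:
            time.sleep(delay)  # optional pause between rounds
    return failures


if __name__ == "__main__":
    # Usage: python probe_stat.py /path/on/filer/file1 /path/on/filer/file2 ...
    for path, errs in probe_stat(sys.argv[1:], rounds=1000).items():
        if errs:
            print(path, dict(errs))
```

Counting per errno rather than aborting on the first error keeps the probe running through the whole load window, which matters when the failure only shows up intermittently.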
The problem has shown up on both our Solaris 2.5.1 and Solaris 8 systems. The filer is an F760 running 5.3.5R2P2, and the load on it when this happens is typically 10,000 ops/sec or more. NetApp has asked for packet traces, but these are gigabit interfaces: the trace files are huge, and "pktt" drops about 80% of the packets anyway.
We have already cut "nfs.udp.xfersize" to 8K and are running out of ideas. I am getting hauled in front of management on a regular basis to explain how I am going to make the problem go away. Now all I need is a solution.....
What type of filer do you have, and what version of code is it running? I know it's obvious, but have you checked for duplex problems? If you find a solution, I would really like to hear about it.
Graydon Dodson (606) 232-6483 grdodson@lexmark.com Lexmark International Inc.