There's still something rotten in the state of Denmark if you're having to restrict the packet size so very tightly.
The increase in interrupt servicing and buffer handling at either end, relative to even Ethernet MTU-sized packets, is about 3-fold, so seeing *increased* performance as a result indicates something seriously wrong somewhere in there.
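For anyone who wants to check the "about 3-fold" figure, here is a rough back-of-the-envelope sketch in Python. It assumes a 1500-byte Ethernet MTU, a 20-byte IPv4 header with no options, and an 8-byte UDP header, and it ignores RPC/NFS header overhead entirely. The function names and the 1 MB transfer size are made up for illustration, not taken from any tool.

```python
import math

# Assumed header sizes (illustrative, not measured on the systems discussed).
IP_HEADER = 20       # bytes, IPv4 without options
UDP_HEADER = 8       # bytes
ETHERNET_MTU = 1500  # bytes, standard Ethernet MTU


def packets_needed(total_bytes, payload_per_packet):
    """Number of packets required to carry total_bytes of payload."""
    return math.ceil(total_bytes / payload_per_packet)


def packet_ratio(small_payload, mtu=ETHERNET_MTU, total_bytes=1 << 20):
    """Compare packet counts for a small payload size vs. MTU-sized packets.

    Returns (small_count, full_count, small_count / full_count).
    RPC and NFS headers are ignored, so this is only a rough estimate.
    """
    full_payload = mtu - IP_HEADER - UDP_HEADER
    small = packets_needed(total_bytes, small_payload)
    full = packets_needed(total_bytes, full_payload)
    return small, full, small / full


if __name__ == "__main__":
    small, full, ratio = packet_ratio(512)
    print(f"{small} packets at 512 bytes vs {full} at MTU size: {ratio:.2f}x")
```

Under those assumptions the ratio comes out a little under 3, which is where the per-packet interrupt and buffer-handling overhead estimate comes from.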
Out of interest, can you say what clients, what switches and what settings you're using in each? Just in case any of us have seen problems with those vendors before, it might be worth enumerating exactly what client box, interface, OS and patch levels (within reason: hme/qfe, ip, udp, tcp, rpc, nfs, etc. patches) you're using. Ditto for switch, blades, firmware, etc.
I welcome input.
I have four news servers, two running bCandid's Cyclone and two running bCandid's Typhoon. The former do high-speed hauling of news; the latter provide NNTP to end users. I have set one of the two Typhoon servers to 512-byte transfers.
All four servers are Sun E250s with 1GB RAM each. Each has only one internal 9GB disk. All run Solaris 2.6 loaded with all recommended patches through late December 1999. All have QFE cards.
The switch is a Cisco Catalyst 5505. More detail would take time.
The filers are F760s running ONTAP 5.3.4. I have two filers clustered, but all but one shelf reside primarily on one filer. Both have quad 10/100 cards.
The filers each have two Etherchannel trunks defined, for a total of four virtual interfaces.
The clients use two of their QFE ports for reaching the filers (note that most traffic goes to only one filer). I hand-tuned MAC addresses to avoid conflicts between the Cyclone servers and between the Typhoon servers, to put more even load on the filers' physical ports.
I verified all duplex settings long ago. All run full-duplex. All four clients run NFS v2 on UDP.
What more would help your troubleshooting?