Eyal Traitel wrote:
- Network cards and switches/routers between the Linuxes and the
filers - check for 100Mbit full-duplex settings, and check nfsstat and netstat for errors on both the filer and the Linux boxes.
Grab hold of the mii-diag tool from Don Becker's site and force all the boxes onto 100baseT-FD. Even when the cards claim to be running full duplex, we've found that forcing them to FD cleared up a load of problems.
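In case it helps, here's roughly what we run to spot the nfsstat/netstat errors Eyal mentions - just a quick Python sketch, and the column names it looks for are whatever our net-tools and nfs-utils versions print, so treat it as a starting point rather than gospel:

#!/usr/bin/env python
# Rough sketch: flag interfaces with RX/TX errors (classic sign of a
# speed/duplex mismatch) and dump the NFS client RPC stats so you can
# watch the retrans counter.  Column names are as printed by net-tools'
# netstat -i on our boxes; adjust if your version differs.
import subprocess

def run(cmd):
    """Run a command and return its output as a list of lines."""
    return subprocess.check_output(cmd, universal_newlines=True).splitlines()

def check_interface_errors():
    """Print any interface whose RX-ERR or TX-ERR counter is non-zero."""
    lines = run(["netstat", "-i"])
    header = lines[1].split()          # e.g. Iface MTU ... RX-ERR ... TX-ERR ...
    rx = header.index("RX-ERR")
    tx = header.index("TX-ERR")
    for line in lines[2:]:
        fields = line.split()
        if len(fields) > max(rx, tx) and (fields[rx] != "0" or fields[tx] != "0"):
            print("%s: RX-ERR=%s TX-ERR=%s -- check speed/duplex on this link"
                  % (fields[0], fields[rx], fields[tx]))

def check_nfs_retrans():
    """Dump client RPC stats; retrans climbing alongside interface errors
    is the usual duplex-mismatch signature."""
    for line in run(["nfsstat", "-c"]):
        print(line)

if __name__ == "__main__":
    check_interface_errors()
    check_nfs_retrans()

If the interface error counters are moving, that's when forcing 100baseTx-FD with mii-diag is worth trying (check mii-diag's own help output for the exact forcing option on the version you download).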
- I don't have experience with NFS on Linux, but I know that there
were a lot of NFS changes in the last few kernels, so even if it worked OK before, maybe you should just consider upgrading the Linuxes - it never hurts with Linux anyway... :) You'll probably get faster NFS that way, since NFS was moved to kernel level, I think in 2.3 or something...
The NFS server moved into the kernel as an optional feature; the client, however, hasn't changed much.
From what we've seen, and a few other people have spotted it too, although the Linux NFS implementation is OK Linux->Linux, Linux->anything_else performs less well. Also, with a single client process Linux goes reasonably fast, but as soon as you get lots of parallel NFS jobs things start to slow down. We've had some pretty bad problems with machines running happily for months until the amount of NFS traffic hits a threshold and they get stuck in a downward spiral. For us the problem was so bad we've had to move the majority of things onto local discs mirrored from central servers. Unless someone fixes the problem pretty soon, our strategy is going to move away from filers everywhere to just a couple of filers at the centre of the network.

Anyone else think it would be worth NetApp sponsoring someone to fix up the Linux NFS client to perform well under load? I'm guessing the cost-benefit to NetApp would be pretty convincing.
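If you want to see the single-process-vs-parallel effect on your own mount, something like this rough Python sketch does the job - the /mnt/filer paths are just placeholders, and each worker needs its own large file (bigger than client RAM) so you're measuring the wire and not the page cache:

#!/usr/bin/env python
# Compare aggregate read throughput off an NFS mount with 1, 4 and 16
# parallel readers.  Paths are placeholders; point them at distinct large
# files on your own filer mount, otherwise the client cache skews the numbers.
import time
from multiprocessing import Pool

TESTFILE_PATTERN = "/mnt/filer/test%d"   # placeholder: one big file per worker
CHUNK = 64 * 1024

def read_all(i):
    """Read worker i's test file end to end, returning the byte count."""
    total = 0
    with open(TESTFILE_PATTERN % i, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                return total
            total += len(buf)

def bench(workers):
    """Time `workers` parallel readers and print aggregate MB/s."""
    start = time.time()
    with Pool(workers) as pool:
        total = sum(pool.map(read_all, range(workers)))
    secs = time.time() - start
    print("%2d workers: %.1f MB/s aggregate" % (workers, total / secs / 1e6))

if __name__ == "__main__":
    for n in (1, 4, 16):
        bench(n)

On our setup the aggregate number at 16 workers is nowhere near 16x the single-worker figure, which is the slowdown I'm talking about.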
Chris