We're seeing some performance problems on an F720 which you guys might be able to help with (even though it's not exactly top of the range nowadays).
What we see is:
- We're driving it over NFS v3.
- We get about 1500 ops/sec out of it, of which one third are writes and two thirds are reads. There aren't many file open/close operations.
- The operations are to random locations in large (10GB) files.
- The CPU is running at about 75%, and the network input and output are below the throughput of the link we have to it. We've seen both CPU and network go higher if we do simple copy tests.
- Looking at it via a pktt trace, something approaching 15% of WRITE operations take long enough for the clients to time out and retransmit (so at least 1 second). None of the READ operations do.
- The retransmissions appear to come in bunches. For example, we'll see a few seconds where the filer doesn't respond, during which time the retransmissions will come in, then it will wake up and send some responses back.
- The rest of the time, the WRITEs are very fast (sub-ms).
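For anyone wanting to reproduce the retransmission figure, here is a rough sketch of the counting. It assumes the trace has already been decoded by some other tool into (timestamp, client, xid, operation) tuples for the NFS calls; that record layout and the function name retransmit_stats are hypothetical, not anything pktt produces directly. It relies on an RPC retransmission reusing the XID of the original call, and treats any (client, xid) pair seen more than once as retransmitted.

```python
from collections import defaultdict


def retransmit_stats(calls, op="WRITE"):
    """Summarise retransmitted calls of one operation type.

    calls: iterable of (timestamp_seconds, client, xid, op_name) tuples,
           one per NFS call seen in the trace (hypothetical layout).
    Returns (unique_calls, retransmitted_calls, fraction, per_second),
    where per_second maps the whole second in which a retransmitted call
    was first seen to the number of such calls, to show bunching.
    """
    counts = defaultdict(int)   # (client, xid) -> times the call was seen
    first_seen = {}             # (client, xid) -> timestamp of first copy
    for ts, client, xid, name in calls:
        if name != op:
            continue
        key = (client, xid)
        counts[key] += 1
        first_seen.setdefault(key, ts)

    unique_calls = len(counts)
    per_second = defaultdict(int)
    retransmitted = 0
    for key, seen in counts.items():
        if seen > 1:
            retransmitted += 1
            per_second[int(first_seen[key])] += 1

    fraction = retransmitted / unique_calls if unique_calls else 0.0
    return unique_calls, retransmitted, fraction, dict(per_second)
```

A per_second result with a few large entries separated by empty stretches would match the bunching described above. A long trace could overcount if a client wraps or reuses XIDs, so this is only an approximation.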
This appears to have worsened recently. We tried a couple of things:
- We thought that this might be because the disk had got full and fragmented, so we zapped a bunch of data.
- We rebooted.
Neither of these seemed to help much.
sysstat consistently shows a cache age of 1. This, and the bursty nature of the delays, suggest to me that I'm just hitting it too hard, and there's some kind of periodic cache-flushing operation going on, but do any of you folk have any other suggestions?
Edward Hibbert
Internet Applications Group
Data Connection Ltd
Tel: +44 131 662 1212
Fax: +44 131 662 1345
Email: eh@dataconnection.com
Web: http://www.dataconnection.com