I have a question on scheduling which you folks might have had some experience of. It's more of an appeal for help, really.
We have a distributed POP/IMAP server which stores its data on a central NFS server. The app uses many (up to 100) pthreads, each of which might be issuing a disk op. Typically the pthreads are bound to an OS thread/LWP. We acquire a non-exclusive file lock on the files to disable client-side caching.
Under heavy load, what we see is that some individual read/write calls take an age to complete (up to several minutes). During this period many other read/write calls from other threads complete quickly. When the load lets up, the long ops complete. This behaviour becomes increasingly common as we increase the number of threads in the application: with 1 thread it doesn't happen, with 10 it's quite common, with 50 it's unbearable.
We see this behaviour:
- With both NetApp and Solaris as the NFS server. It takes more stress to kick in with NetApp, probably because it's faster.
- With various OSs as the NFS client (Solaris, HP, Linux).
It's almost as though there is last-in-first-out thread scheduling going on somewhere; although it feels like a client-side issue, we can't tie it down to any one OS. We have some evidence that the behaviour may be better with NFS v2 than with v3.
When this is occurring, it's not that the total throughput in disk read/write ops is poor - it's a little less than the peak we see - but that the scheduling is unfair. This is a problem for us because some of the disk ops are associated with a locked global resource: if such a disk op is delayed a lot, we get heavy contention on the resource, response times to individual user requests suffer, and so on.
Have any of you seen anything like this?
Edward Hibbert DCL.