I have a question on scheduling which you folks might have had some
experience of. It's more of an appeal for help, really.
We have a distributed POP/IMAP server which stores its data on a central NFS
server. The app uses many (up to 100) pthreads, each of which might be
issuing a disk op at any time. Typically the pthreads are bound to an OS
thread/LWP.
We have acquired a non-exclusive file lock on the files to disable
client-side caching.
Under heavy load, what we see is that some individual write/read calls take
an age to complete (up to several minutes). During that period, many other
read/write calls from other threads complete quickly. When the load lets
up, the long ops complete. This behaviour
becomes increasingly common as we increase the number of threads in the
application - with 1 thread it doesn't happen, with 10 it's quite common,
with 50 it's unbearable.
We see this behaviour:
- With both NetApp and Solaris as the NFS server. It takes more stress for
it to kick in with NetApp, probably because NetApp is faster.
- With various OSs as the NFS client (Solaris, HP, Linux).
It's almost as though last-in-first-out thread scheduling is going on
somewhere. Although it feels like a client-side issue, we can't tie it
down to any one OS. We have some evidence that the behaviour may be better
with NFS v2 than with v3.
When this is occurring, it's not that total throughput in disk read/write
ops is poor - it's only a little below the peak we see - but that the
scheduling is unfair. This is a problem for us because some of the disk ops
are performed while holding a locked global resource, so if such a disk op
is delayed for long, we get heavy contention on the resource, response times
to individual user requests suffer, and so on.
Have any of you seen anything like this?
Edward Hibbert
DCL.