I'd put money on the client-side options in /etc/system (or ndd equivs in scripts) not being raised high enough. We've discussed these Solaris tunables to death before now. A good document on them can be found here:
http://www.rvs.uni-hannover.de/people/voeckler/tune/EN/tune.html
I also have a question stemming from this discussion; first, some background:
Oracle, and other DBs I presume, keep a lot of data in memory. They arrange to have parts of tables in memory according to their use patterns, if at all possible. Solaris, as with most modern Unixen, has a filesystem buffer-cache that holds reads and writes (small ones certainly, and probably large ones too) in memory that's not part of the user-process space and, AFAIK, isn't shared with it either.
It's possible in Sol2.6-7 to use priority-paging to alleviate the buffer-cache competition for process memory. Solaris 8 has a different memory model. Either way the problem doesn't go away, it just bites less hard and fast.
Now (I'm getting there!) VxFS, UFS (since Sol2.6), and raw-disk devices (or Disksuite metadevices or VxVM volumes) all allow for DirectIO or raw IO. The goal being *not* to double-cache database files.
The DBMS is better at caching database data, and more system RAM can be used constructively by the DBMS if there's less buffer-cache competing for it. It also saves lots of double-copies between the buffer-cache and user process space. As an aside, VxFS allows 'discovered' DirectIO, which means Oracle's small writes can still cause a problem.
What I want to know, and what I have yet to have clarified for me, is:
1) are NFS files buffer-cached? (I believe they are) and,
2) is there anything that can be done to moderate the buffer-cache competition (other than priority-paging), and/or alleviate the double-copying that this implies.
Also, given that on Solaris I'm informed that nothing short of an obscure fcntl() call can ensure data is flushed from a file back to the server, just how do databases maintain their ACID transaction semantics? Do you need log sections on local disk regardless?
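To make the question concrete, here is a minimal Python sketch of what I understand a log writer does on a local filesystem: write the record, call fsync(), and only report the commit once that returns. The function name durable_append and the file name are made up for illustration, and whether fsync() alone is enough to push the data through a Solaris NFS client to the server is exactly the part I'm unsure of.

```python
import os
import tempfile


def durable_append(path, record):
    """Append `record` (bytes) to `path`; return only after fsync() succeeds.

    Hypothetical helper for illustration. It assumes fsync() means
    "on stable storage", which is the assumption being questioned for NFS.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        offset = 0
        # os.write() may write fewer bytes than asked, so loop until done.
        while offset < len(record):
            offset += os.write(fd, record[offset:])
        # Ask the kernel to flush this file's dirty data before we "commit".
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(record)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "redo.log")
        durable_append(log, b"commit 1\n")
        durable_append(log, b"commit 2\n")
        with open(log, "rb") as fh:
            print(fh.read())
```

If that pattern isn't trustworthy over NFS, the transaction log is the first thing that would have to live on local disk.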
I feel I should know this stuff, being an old hand now on the list, but it occurred to me that I still don't have all the answers. Sorry if this was a little incoherent; I'm not entirely sober and alert right now, so I might have missed something obvious.
- are NFS files buffer-cached? (I believe they are) and,
Yes.
- is there anything that can be done to moderate the buffer-cache competition (other than priority-paging), and/or alleviate the double-copying that this implies.
a) To improve NFS read performance, files and file attributes are cached. File modification times get updated whenever a write occurs. However, file access times may be temporarily out-of-date until the cache gets refreshed. The attribute cache retains file attributes on the client. Attributes for a file are assigned a time to be flushed. If the file is modified before the flush time, then the flush time is extended by the time since the last modification (under the assumption that files that changed recently are likely to change soon). There is a minimum and maximum flush time extension for regular files and for directories. Setting actimeo=n sets flush time to n seconds for both regular files and directories.
Setting actimeo=0 disables attribute caching on the client. This means that every reference to attributes will be satisfied directly from the server though file data will still be cached. While this guarantees that the client always has the latest file attributes from the server, it has an adverse effect on performance through additional latency, network load, and server load.
Setting the noac option also disables attribute caching, but has the further effect of disabling client write caching. While this guarantees that data written by an application will be written directly to a server, where it can be viewed immediately by other clients, it has a significant adverse effect on client write performance. Data written into memory-mapped file pages (mmap(2)) will not be written directly to this server.
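As a rough illustration of the attribute-cache heuristic described above, here is a small Python model of how the flush time could be derived. It is a simplified sketch, not the actual client code: the function names are hypothetical, and the defaults (acregmin=3, acregmax=60, acdirmin=30, acdirmax=60) are the values I recall from mount_nfs(1M), so treat them as assumptions and check your own man page.

```python
def ac_limits(kind, actimeo=None,
              acregmin=3, acregmax=60, acdirmin=30, acdirmax=60):
    """Return (min, max) attribute-cache lifetime in seconds.

    `kind` is "file" or "dir". If actimeo is given it overrides all four
    limits, which is how actimeo=0 ends up disabling attribute caching.
    Default values are assumed, not authoritative.
    """
    if actimeo is not None:
        return actimeo, actimeo
    if kind == "dir":
        return acdirmin, acdirmax
    return acregmin, acregmax


def ac_timeout(now, last_modified, ac_min, ac_max):
    """Seconds the cached attributes stay valid, in this simplified model.

    The lifetime grows with the time since the last modification (recently
    changed files are rechecked sooner) and is clamped to [ac_min, ac_max].
    """
    age = max(0, now - last_modified)
    return min(max(age, ac_min), ac_max)


if __name__ == "__main__":
    lo, hi = ac_limits("file")
    print(ac_timeout(100, 99, lo, hi))    # just modified: clamped up to min
    print(ac_timeout(100, 80, lo, hi))    # modified 20s ago
    print(ac_timeout(1000, 0, lo, hi))    # long untouched: clamped to max
    lo, hi = ac_limits("file", actimeo=0)
    print(ac_timeout(1000, 0, lo, hi))    # actimeo=0: never trusted
```

With actimeo=0 both limits collapse to zero, so every attribute lookup goes to the server, which is where the extra latency and load come from.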
b) Yes. By turning off either NFS caching or DBMS caching you can save something, and that *something* might be the 'cache-lookup' latency. You may want to keep whichever cache is more favorable for the DBMS 'records' or 'queries'.
You may also have to choose optimal parameters for the read/write buffer sizes, the TCP window, and so on, using 'ndd' and '/etc/system' (refer to Adrian Cockcroft's writing on Solaris tuning).
Also, given that on Solaris I'm informed that nothing short of an obscure fcntl() call can ensure data is flushed from a file back to the server, just how do databases maintain their ACID transaction semantics? Do you need log sections on local disk regardless?
I feel I should know this stuff, being an old hand now on the list, but it occurred to me that I still don't have all the answers. Sorry if this was a little incoherent; I'm not entirely sober and alert right now, so I might have missed something obvious.
-- -Mark ... an Englishman in London ...