Howdy, all.
This is more a Solaris question, but since there's a filer involved and y'all are so helpful, I'll start with you guys. :-)
The setup:

    Sun E4500, Solaris 7 HW 11/99 + patches, GigE
    NetApp F760, Data ONTAP 5.3.6R2, GigE
    Oracle 8.1.6.x
    Two volumes for Oracle, NFSv3/UDP, 32K rsize/wsize
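For reference, the mounts look roughly like this in /etc/vfstab (filer and volume names changed - purely illustrative, not a recommendation):

    # Oracle datafile volume - names are made up
    filer:/vol/oradata  -  /oradata  nfs  -  yes  rw,bg,hard,intr,vers=3,proto=udp,rsize=32768,wsize=32768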
The problem(s):

    1. I/O from any single Oracle process tops out around 3-4MB/sec.

    2. Even with disk_asynch_io set to false in init.ora, we are
       getting sporadic and seemingly nonsensical "resource
       temporarily unavailable" errors (we had a case open for this
       one).
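For completeness, the init.ora change amounts to a single line - nothing exotic:

    # disable Oracle async I/O, per our open support case
    disk_asynch_io = false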
The question: Are any of the Solaris kernel NFS tunables useful in raising that apparent limit? Specifically:

    nfs:nfs3_max_threads
    nfs:nfs3_async_clusters
    nfs:nfs3_nra
The DBA and our NetApp reps/support team have gone over our config, and things are generally stable. After turning off disk_asynch_io we noticed a slight performance hit, but thought the annoying "resource temporarily unavailable" errors had been solved. Then we had another one yesterday, the first time in almost two months. :-/
In diagnosing the sluggishness, we noticed that a full scan of a non-partitioned table, with its single Oracle "reader" process, was limited to about 2.5-4MB/sec, while a full scan of a partitioned table (using 6 readers) got 18-22MB/sec. (Obvious workaround: increase the number of readers and writers, and partition whichever table is the bottleneck...)
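In case anyone wants to reproduce the comparison, here's a sketch of the kind of scans we timed (table name is made up; degree 6 just mirrors our partition count):

    -- full scan, single reader process
    SELECT /*+ FULL(t) NOPARALLEL(t) */ COUNT(*) FROM some_big_table t;

    -- same scan fanned out across 6 parallel query slaves
    SELECT /*+ FULL(t) PARALLEL(t, 6) */ COUNT(*) FROM some_big_table t;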
In each case, Solaris iostat claimed that the NFS mount was "100% busy", although response times were generally in the sub-6ms range and the filer's sysstat showed less than 20% CPU usage (this is a production box, so I can't completely isolate the load for our testing). For _reads_ I just can't figure out why Solaris - or Oracle itself - is throttling performance like this... I've clocked > 50MB/sec reads in plain old NFS tests (bonnie, cpio, etc.), and I think it's generally clear when we've saturated the PCI bus on the filer. This Oracle bottleneck is a mystery, but now I'm wondering if the two problems are related.
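By way of comparison, the "plain old NFS" read tests were along these lines (file name made up; you'd want to remount first so the client cache doesn't flatter the numbers):

    # single-stream sequential read off one of the Oracle mounts
    time dd if=/oradata/bigfile of=/dev/null bs=1024k count=512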
The Solaris Tunable Parameters Reference Manual says of the "nfs:nfs3_max_threads" knob: "Controls the number of kernel threads that perform asynchronous I/O for the NFS version 3 _client_. [emphasis mine] Since NFS is based on RPC and RPC is inherently synchronous, separate execution contexts are required to perform NFS operations that are asynchronous from the calling thread."
The default pool of threads is set to 8. There's some indication that this may be a _per filesystem_ setting, not one global to all NFS client activity on the machine. Even if that's the case, 8 threads per mount point might still be fairly restrictive on an 8-cpu machine, given the fat pipe and the headroom still available on the filer. I generally shy away from mucking about with Solaris internals ("this isn't your father's SunOS" :-) but if that's what it takes to boost performance I'm all for it.
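If we do experiment, my reading of the manual is that these knobs go in /etc/system and take effect at the next reboot - something like the following, with values pulled out of the air purely for illustration (not tested advice!):

    * NFSv3 client tunables - illustrative values only
    set nfs:nfs3_max_threads = 16
    set nfs:nfs3_async_clusters = 4
    set nfs:nfs3_nra = 8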
Has anyone running in a similar environment had to tweak any Solaris kernel tunables? Are there other Oracle/NetApp tuning hints not discussed in the whitepapers on the NOW site? My DBA and I would be most grateful for any warnings/suggestions/hints. :-) I'll be happy to provide more detail off-list, then summarize results if anyone else is interested.
Thanks,
-- Chris
--
Chris Lamb, Unix Guy
MeasureCast, Inc.
503-241-1469 x247
skeezics@measurecast.com