Howdy, all.
This is more a Solaris question, but since there's a filer involved and
y'all are so helpful, I'll start with you guys. :-)
The setup:
Sun E4500, Solaris 7 HW11/99+patches, GigE
NetApp F760, 5.3.6R2, GigE
Oracle 8.1.6.x
Two volumes for Oracle, NFSv3/UDP, 32K rsize/wsize
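For reference, the Oracle volumes are mounted roughly like so (the
filer name and paths here are placeholders, but the options are what
we actually run):

    mount -F nfs -o hard,intr,vers=3,proto=udp,rsize=32768,wsize=32768 \
        filer:/vol/oradata /oradata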
The problem(s):
I/O from any single Oracle process tops out around 3-4MB/sec;
Even with 'disk_asynch_io = false' set in init.ora, we are
getting sporadic and seemingly nonsensical "resource
temporarily unavailable" errors (we had a case open for this one).
The question:
Are any of the Solaris kernel NFS tunables useful in raising that
apparent limit? Specifically (strawman /etc/system entries below):
nfs:nfs3_max_threads
nfs:nfs3_async_clusters
nfs:nfs3_nra
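If we do experiment, my understanding is that these would go in
/etc/system and take effect on reboot; the values below are strawmen
for illustration, not recommendations:

    * NFSv3 client async thread pool (default 8)
    set nfs:nfs3_max_threads = 16
    * async ops clustered per operation type (default 1)
    set nfs:nfs3_async_clusters = 4
    * client read-ahead count (default 4)
    set nfs:nfs3_nra = 8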
The DBA and our NetApp reps/support team have gone over our config, and
things are generally stable. After turning off asynch_io we noticed a
slight performance hit, but thought that the annoying "resource
temporarily unavailable" errors were gone. Then we hit another one
yesterday, the first in almost two months. :-/
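For completeness, the relevant init.ora fragment looks something like
the following; the dbwr_io_slaves line is a sketch of something we've
only discussed, not something we actually run:

    disk_asynch_io = false
    # compensate for losing async I/O with multiple DBWR slaves (untested)
    dbwr_io_slaves = 4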
In diagnosing the sluggish performance, we noticed that for a
non-partitioned full table scan a single Oracle "reader" process was
limited to about 2.5-4MB/sec, while a full scan of a partitioned table
(using 6 readers) was getting 18-22MB/sec. (The obvious workaround:
increase the number of readers and writers, and partition the table
that's the bottleneck; a sketch of the Oracle side follows...)
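As I understand it from the DBA, that workaround boils down to
something like this (table name and degree are made up for
illustration):

    ALTER TABLE big_fact PARALLEL (DEGREE 6);
    -- or per-query, without touching the table definition:
    SELECT /*+ FULL(t) PARALLEL(t, 6) */ COUNT(*) FROM big_fact t;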
In each case, Solaris iostat was claiming that the NFS mount was "100%
busy", although response times were generally in the sub-6ms range and the
filer's sysstat showed less than 20% cpu usage (this is a production box,
so I can't completely isolate the load for our testing). For _reads_ I
just can't figure out why Solaris - or Oracle itself - is throttling
performance like this... I've clocked > 50MB/sec reads in plain old NFS
tests (bonnie, cpio, etc), and I think it's generally clear when we've
saturated the PCI bus on the filer. This Oracle bottleneck is a mystery,
but now I'm wondering if the two problems are related.
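For anyone who wants to compare numbers, this is how I'm measuring it:
on the Solaris side,

    iostat -xn 5

(watching %b and svc_t for the NFS mount), and on the filer,

    sysstat 1

which reports CPU, NFS ops/sec, and network throughput each second.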
The Solaris Tunable Parameters Reference Manual says of the
"nfs:nfs3_max_threads" knob:
"Controls the number of kernel threads that perform asynchronous
I/O for the NFS version 3 _client_. [emphasis mine] Since NFS is
based on RPC and RPC is inherently synchronous, separate execution
contexts are required to perform NFS operations that are
asynchronous from the calling thread."
The default pool is 8 threads. There's some indication that this may
be a _per filesystem_ setting rather than global to all NFS client
activity on the machine. Even so, 8 threads per mount point might
still be fairly restrictive on an 8-cpu machine,
given the fat pipe and headroom still available on the filer. I generally
shy away from mucking about with Solaris internals ("this isn't your
father's SunOS" :-) but if that's what it takes to boost performance I'm
all for it.
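Before touching /etc/system I'd at least confirm the live value; on
Solaris 7 I believe adb can read it from the running kernel, along the
lines of:

    # print the current value in decimal (if the nfs module is loaded)
    echo "nfs3_max_threads/D" | adb -k /dev/ksyms /dev/mem

Writing it that way is also possible, but I'd much rather set it in
/etc/system and reboot than poke a production kernel live.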
Has anyone running in a similar environment had to tweak any Solaris
kernel tunables? Are there other Oracle/NetApp tuning hints not discussed
in the whitepapers on the NOW site? My DBA and I would be most grateful
for any warnings/suggestions/hints. :-) I'll be happy to provide more
detail off-list, then summarize results if anyone else is interested.
Thanks,
-- Chris
--
Chris Lamb, Unix Guy
MeasureCast, Inc.
503-241-1469 x247
<skeezics(a)measurecast.com>