Don't forget that in normal circumstances on Linux, you're funneling this NFS traffic through the single RPC channel and the single TCP connection to the NetApp, even if you use multiple mount points. (I can't wait for pNFS to be finalized and fully implemented.)
I've attached some sysctl tweaks we put on our high-NFS (non-Oracle) Linux systems that may help (please test first, as your mileage may vary, and not all of these may be appropriate for your environment). They may not all be appropriate for an Oracle box either, so please use caution. The changes that are probably safest are raising the limits and raising the sunrpc slot table values; you probably don't want to modify the TCP settings without consulting your DBAs or Oracle support.
Note that the last two sysctls have to be done after the sunrpc kernel module loads, but before the nfs module loads, in order to take effect. You might have to put those two into an init script to get the ordering right.
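One way to handle that ordering on a RHEL 5-era box (just a sketch; adjust for your setup) is to pass them as module options so they're in place whenever sunrpc loads, or to set them from an init script right after loading the module:

# /etc/modprobe.conf (or a file under /etc/modprobe.d/) -- set the slot
# tables as module options so they take effect whenever sunrpc loads:
options sunrpc tcp_slot_table_entries=128 udp_slot_table_entries=128

# ...or, from an init script that runs after sunrpc is loaded but
# before the nfs module / any NFS mounts come up:
modprobe sunrpc
sysctl -w sunrpc.tcp_slot_table_entries=128
sysctl -w sunrpc.udp_slot_table_entries=128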
You also might want to do some sequential throughput tests with iozone (tests 0 and 1, plus the -t flag, I think) using multiple (4, 8, or more) processes and larger (4k+) block sizes; even so, what you're seeing may be the upper end for that sort of tool.
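Something along these lines, for example (flags from memory, so double-check the iozone man page; /mnt/oratest is just a placeholder for your NFS mount point):

cd /mnt/oratest
# tests 0 (write/rewrite) and 1 (read/reread), 8 processes in
# throughput mode, 32k records, 2GB file per process:
iozone -i 0 -i 1 -t 8 -r 32k -s 2g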
These tweaks help a bit, but at least in terms of Oracle, we've found that:
1) Even with normal Linux NFS, Oracle spawns enough threads that in general it will get better IOPS and throughput than most dd (sequential) or iozone test operations.

2) Oracle seems to have enough of a mini-IO-subsystem that it gets better efficiencies than an everyday command like dd operating on a mount point.

3) If you can get your DBAs to look into using Oracle Direct NFS (dNFS), your setup will scream; dNFS establishes TCP connections straight between Oracle and the NetApp, uses multiple connections, and has its own caching and IO subsystem that Oracle knows about and will benefit from. When we tested on trunked 1GbE links (which I know don't add up to n times 1GbE of bandwidth), we ended up saturating interfaces; from what I understand dNFS will do even better on a 10GbE network. You can also use dual networks (like a dual-fabric SAN) to have Oracle load-balance properly over multiple links, instead of doing normal LACP trunking and only being able to push a single 10GbE link's worth of bandwidth. At that point your bottleneck should be the disks behind the NetApp. (A rough sketch of the dNFS setup follows below.)
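Your DBAs would own this part, but for reference, enabling dNFS on 11gR2 goes roughly like this (the make target and oranfstab locations are from memory and vary by Oracle version; the filer name, IPs, volume, and mount point below are placeholders):

# Relink Oracle with the Direct NFS ODM library (run as the oracle user):
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk dnfs_on

# Then describe the filer and its interfaces in oranfstab
# ($ORACLE_HOME/dbs/oranfstab or /etc/oranfstab):
server: netapp6080
path: 192.168.10.10
path: 192.168.20.10
export: /vol/oradata  mount: /u02/oradata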
Good luck, and remember, test on a dev system first!
-dalvenjah
# NFS tweaks here

# Raise generic socket memory usability, and start 'em big
net.core.rmem_default=524288
net.core.wmem_default=524288
net.core.rmem_max=16777216
net.core.wmem_max=16777216

# Raise tcp memory usability too
net.ipv4.tcp_rmem=4096 524288 16777216
net.ipv4.tcp_wmem=4096 524288 16777216

# raise the amount of memory for the fragmentation reassembly buffer
# (if it goes above high_thresh, kernel starts tossing packets until usage
# goes below low_thresh)
net.ipv4.ipfrag_high_thresh=524288
net.ipv4.ipfrag_low_thresh=393216

# turn off tcp timestamps (extra CPU hit) since this is likely a
# non-public server
net.ipv4.tcp_timestamps=0

# make sure window scaling is on
net.ipv4.tcp_window_scaling=1

# increase the number of option memory buffers
net.core.optmem_max=524287

# raise the max backlog of packets on a net device
net.core.netdev_max_backlog=2500

# max out the number of task request slots in the RPC code
sunrpc.tcp_slot_table_entries=128
sunrpc.udp_slot_table_entries=128
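(These are in /etc/sysctl.conf format; on a running box you can apply a file of them with "sysctl -p /path/to/file" or set individual ones with "sysctl -w key=value", keeping in mind the ordering caveat above for the two sunrpc entries.)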
On May 19, 2012, at 10:48 AM, Dan Burkland wrote:
Hi all,
My company just bought some Intel X520 10GbE cards, which I recently installed into our Oracle EBS database servers (IBM 3850 X5s running RHEL 5.8). As the "linux guy" I have been tasked with getting these servers to communicate with our NetApp 6080s via NFS over the new 10GbE links. I have everything working; however, even after tuning the RHEL kernel I am only getting 160MB/s writes using the "dd if=/dev/zero of=/mnt/testfile bs=1024 count=5242880" command. For those of you who run 10GbE to your toasters, what write speeds are you seeing from your 10GbE-connected servers? Did you have to do any tuning to get the best results possible? If so, what did you change?
Thanks!
Dan
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters