Hi all,
My company just bought some Intel x520 10GbE cards, which I recently installed into our Oracle EBS database servers (IBM 3850 X5s running RHEL 5.8). As the "Linux guy" I have been tasked with getting these servers to communicate with our NetApp 6080s via NFS over the new 10GbE links. I have everything working; however, even after tuning the RHEL kernel I am only getting 160 MB/s writes using the "dd if=/dev/zero of=/mnt/testfile bs=1024 count=5242880" command. For those of you who run 10GbE to your toasters, what write speeds are you seeing from your 10GbE-connected servers? Did you have to do any tuning to get the best possible results? If so, what did you change?
Thanks!
Dan
Your block size is only 1K; try increasing the block size and the throughput will increase. 1 KB I/Os generate a lot of IOPS with very little throughput.
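For example, something along these lines should behave very differently:

# dd if=/dev/zero of=/mnt/testfile bs=1M count=5120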
-Robert
Sent from my iPhone
On May 19, 2012, at 10:48, Dan Burkland dburklan@NMDP.ORG wrote:
I know dd isn't the best tool since it is a single-threaded application and in no way represents the workload that Oracle will impose. However, I thought it would still give me a decent ballpark figure for throughput. I tried block sizes of 64k, 128k, and 1M (just to see) and got somewhat more promising results:
# dd if=/dev/zero of=/mnt/testfile bs=1M count=5120
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 26.6878 seconds, 201 MB/s
If I run two of these dd sessions at once, the throughput figure above gets cut in half (each dd session reports that it creates the file at around 100 MB/s).
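Something along these lines, for reference (two writers; separate target files are shown here purely for illustration):

# dd if=/dev/zero of=/mnt/testfile1 bs=1M count=5120 &
# dd if=/dev/zero of=/mnt/testfile2 bs=1M count=5120 &
# wait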
As far as the switch goes, I have not checked it yet; however, I did notice that flow control is set to full on the 6080 10GbE interfaces. We are also running jumbo frames on all of the involved equipment.
As far as the RHEL OS tweaks go, here are the settings that I have changed on the system:
### /etc/sysctl.conf:
# 10GbE Kernel Parameters
net.core.rmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_default = 262144
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 262144 16777216
net.ipv4.tcp_wmem = 4096 262144 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
#
###
### /etc/modprobe.d/sunrpc.conf:
options sunrpc tcp_slot_table_entries=128
###
### Mount options for the NetApp test NFS share:
rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600,retrans=2,sec=sys
###
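To sanity-check that the settings above actually took effect, something like the following should be enough (all stock commands):

# sysctl -p                                      # reload /etc/sysctl.conf
# cat /proc/sys/sunrpc/tcp_slot_table_entries    # confirm the sunrpc module option
# nfsstat -m                                     # show the mount options actually in use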
Thanks again for all of your quick and detailed responses!
Dan
On 5/19/12 1:08 PM, "Robert McDermott" rmcdermo@fhcrc.org wrote:
One more thing -- I know this may spawn debate, and I didn't do it terribly scientifically, but a year or two ago we tested jumbo frames on 1GbE links, between a pair of 6070s and a couple of boxes with Intel server NICs in them, and we actually found performance to get worse (though not by much) with jumbo frames than with normal 1500-byte frames. It certainly didn't improve performance.
My theory is that all the ASIC TCP checksum offloading and such is so optimized for 1500-byte packets that once you get up to a 9000-byte frame size, things fall back to software and you don't end up with a speed boost.
I could be wrong, but you might want to test with 1500-byte MTU set on both ends, just to see.
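On the Linux side, something like this shows what the NIC is offloading and lets you drop the MTU back for a quick A/B test (eth2 is just a placeholder for whichever interface feeds your bond):

# ethtool -k eth2         # current offload settings (checksumming, TSO, etc.)
# ifconfig eth2 mtu 1500  # temporarily fall back to a standard MTU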
-dalvenjah
On May 19, 2012, at 11:46 AM, Dan Burkland wrote:
Good to know. I spent most of yesterday testing and noticed that jumbo frames helped a little, but not a whole lot. Here are my results:
1) Equipment & Lab Environment Configuration
* NFS Volume = 1 x 25GB NFS share (aggr1) on 6080 #1 (oplocks disabled)
* Mounted to the test server with the RHEL default mount options, which are:
  (rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600,retrans=2,sec=sys)
* Volume was mounted to "/mnt" on the test server
* Server Hardware = 1 x 3850 X5 with the following specs:
  * HOSTNAME: testserver
  * CPU: 2 x Intel E7540 @ 2.00GHz
  * RAM: 128GB
  * Local HDD: 2 x 146GB (RAID1)
  * 1GbE NIC: Intel quad-port 1GbE 82580
  * 10GbE NIC: Intel 10GbE x520 SFP+ PCIe 2.0 card
* Server OS
  * Version: RHEL 5.7+ with the "kernel-2.6.18-308.4.1.el5" (5.8) kernel - required by the Intel 10GbE NIC
  * Each NIC had two active ports which were combined to form an active/passive bond known as "bond1" (a rough sketch of the bond config is below)
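For reference, the bond is a plain RHEL 5 active-backup bond; the relevant pieces look roughly like this (interface/file names and exact option syntax quoted from memory, so treat it as a sketch):

# /etc/modprobe.conf
alias bond1 bonding

# /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
BOOTPROTO=static
ONBOOT=yes
MTU=9000
BONDING_OPTS="mode=active-backup miimon=100"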
3) testserver to 6080 #1 Network Throughput Test with 5GB file creation (dd if=/dev/zero of=/mnt/testfile bs=1024 count=5242880)
* 1GbE w/o Jumbo Frames (old storage network that talks to 2 x 1GbE interfaces on 6080 #1) = 5368709120 bytes (5.4 GB) copied, 99.0798 seconds, 54.2 MB/s
* 1GbE w/o Jumbo Frames (new storage network that talks to 2 x 10GbE interfaces on 6080 #1) = 5368709120 bytes (5.4 GB) copied, 70.8844 seconds, 75.7 MB/s
* 1GbE + Jumbo Frames = 5368709120 bytes (5.4 GB) copied, 67.1208 seconds, 80.0 MB/s
* 10GbE w/o Jumbo Frames = 5368709120 bytes (5.4 GB) copied, 45.9469 seconds, 117 MB/s
* 10GbE + Jumbo Frames = 5368709120 bytes (5.4 GB) copied, 38.8961 seconds, 138 MB/s
4) testserver to 6080 #1 Network Throughput Test + RHEL OS tweaking with 5GB file creation (dd if=/dev/zero of=/mnt/testfile bs=1024 count=5242880)
* 10GbE + Jumbo Frames with kernel defaults plus:
  * sunrpc kernel module parameter change = 5368709120 bytes (5.4 GB) copied, 39.0274 seconds, 138 MB/s
  * Echoed "options sunrpc tcp_slot_table_entries=128" to "/etc/modprobe.d/sunrpc.conf" according to https://access.redhat.com/knowledge/solutions/69275 & rebooted
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.core.rmem_default = 262144" = 5368709120 bytes (5.4 GB) copied, 39.0149 seconds, 138 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.core.rmem_max = 16777216" = 5368709120 bytes (5.4 GB) copied, 32.7018 seconds, 164 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.core.wmem_default = 262144" = 5368709120 bytes (5.4 GB) copied, 33.245 seconds, 161 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.core.wmem_max = 16777216" = 5368709120 bytes (5.4 GB) copied, 33.2526 seconds, 161 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.ipv4.tcp_rmem = 4096 262144 16777216" = 5368709120 bytes (5.4 GB) copied, 35.2615 seconds, 152 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.ipv4.tcp_wmem = 4096 262144 16777216" = 5368709120 bytes (5.4 GB) copied, 33.0321 seconds, 163 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.ipv4.tcp_window_scaling = 1" = 5368709120 bytes (5.4 GB) copied, 33.5698 seconds, 160 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.ipv4.tcp_syncookies = 0" = 5368709120 bytes (5.4 GB) copied, 32.7373 seconds, 164 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.ipv4.tcp_timestamps = 0" = 5368709120 bytes (5.4 GB) copied, 34.0019 seconds, 158 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "net.ipv4.tcp_sack = 0" = 5368709120 bytes (5.4 GB) copied, 35.3956 seconds, 152 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "cpuspeed" service stopped = 5368709120 bytes (5.4 GB) copied, 32.8168 seconds, 164 MB/s
* 10GbE + Jumbo Frames with kernel defaults, previous change(s), plus:
  * "irqbalance" service stopped = 5368709120 bytes (5.4 GB) copied, 32.6841 seconds, 164 MB/s
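As a sanity check on the buffer sizes above (assuming an RTT on the order of 1 ms on the 10GbE segment, which I have not actually measured):

  10 Gbit/s / 8 bits per byte * 0.001 s = ~1.25 MB bandwidth-delay product

so the 16 MB rmem/wmem maximums should be far more TCP window than a single stream can use here.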
Regarding the test server itself, nothing is running on it except for my test processes (it is a test box that has the exact same hardware configuration as the Oracle EBS DB servers).
Regards,
Dan Burkland
On 5/19/12 2:01 PM, "Dalvenjah FoxFire" dalvenjah@DAL.NET wrote:
Easy one.
If it got cut in half, adjust your kernel TCP slot count.
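On RHEL 5 that's the sunrpc module option you already have in place (tcp_slot_table_entries); you can also check or bump it at runtime, something like:

# cat /proc/sys/sunrpc/tcp_slot_table_entries
# echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries

though I believe existing mounts may need a remount before they pick up a change.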
Sent from my iPhone
On May 19, 2012, at 11:46 AM, Dan Burkland dburklan@NMDP.ORG wrote:
Jeff Mother - Which specific setting are you referring to?
I installed iozone on my test machine and am currently running the following iozone command on it:
iozone -s 5g -r 1m -t 16 -R -b /root/iozone_testserver_2012-05-d.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18
I'll post the results once it is finished
Dan
On 5/19/12 2:44 PM, "Jeff Mother" speedtoys.racing@gmail.com wrote:
Here are the IOZone results:
Run began: Sat May 19 16:22:46 2012
File size set to 5242880 KB
Record Size 1024 KB
Excel chart generation enabled
Command line used: iozone -s 5g -r 1m -t 16 -R -b /root/iozone_mn4s31063_2012-05-d.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 16 processes
Each process writes a 5242880 Kbyte file in 1024 Kbyte records
(All figures are KB/sec except Min xfer, which is KB.)

Test             Children     Parent       Min/proc   Max/proc   Avg/proc   Min xfer
Initial writers  371306.91    167971.82    21901.84   25333.62   23206.68   4533248.00
Rewriters        350486.11    176947.47    21154.26   23011.69   21905.38   4819968.00
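(For the conversion: 371306.91 KB/sec / 1024 = ~362 MB/s.)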
362 MB/s looks quite a bit better; however, can somebody confirm that I am reading these results correctly? Should I also run "iozone" with the -a (auto) option for good measure?
Thanks again for all of your responses, I greatly appreciate it!
Dan
On 5/19/12 4:36 PM, "Dan Burkland" dburklan@NMDP.ORG wrote:
You're now approaching write saturation for your storage at that rate.
Pull reads now.
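For example, drop the client page cache and read the test file back, something along these lines:

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/mnt/testfile of=/dev/null bs=1M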
Sent from my iPhone
On May 19, 2012, at 2:43 PM, Dan Burkland dburklan@NMDP.ORG wrote:
I unmounted the NFS share and rebooted the box before running the same "iozone" command again. This time I let "iozone" run through all of its tests (including the read-based ones).
Run began: Sat May 19 16:46:27 2012
File size set to 5242880 KB
Record Size 1024 KB
Excel chart generation enabled
Command line used: iozone -s 5g -r 1m -t 16 -R -b /root/iozone_mn4s31063_2012-05-d.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 16 processes
Each process writes a 5242880 Kbyte file in 1024 Kbyte records

(All figures are KB/sec except Min xfer, which is KB.)

Test             Children      Parent        Min/proc    Max/proc     Avg/proc     Min xfer
Initial writers  349500.55     173837.26     21147.24    22701.06     21843.78     4884480.00
Rewriters        372333.90     179256.38     22495.20    24418.89     23270.87     4830208.00
Readers          440115.98     439993.44     26406.17    28724.05     27507.25     4819968.00
Re-readers       8953522.06    8930475.33    408033.34   671821.62    559595.13    3186688.00
Reverse readers  5543829.37    5425986.47    15684.29    2261884.25   346489.34    36864.00
Stride readers   16532117.19   16272131.55   257097.92   2256125.75   1033257.32   602112.00
Random readers   17297437.81   16871312.92   320909.25   2083737.75   1081089.86   826368.00
Mixed workload   10747970.97   112898.07     54960.62    1991637.38   671748.19    145408.00
Random writers   358103.29     166805.09     21263.60    22942.70     22381.46     4859904.00
Pwrite writers   325666.64     177771.50     19902.90    20863.29     20354.17     5008384.00
Pread readers    445021.47     444618.25     26932.47    28361.61     27813.84     4981760.00
"Throughput report Y-axis is type of test X-axis is number of processes" "Record size = 1024 Kbytes " "Output is in Kbytes/sec"
" Initial write " 349500.55
" Rewrite " 372333.90
" Read " 440115.98
" Re-read " 8953522.06
" Reverse Read " 5543829.37
" Stride read " 16532117.19
" Random read " 17297437.81
" Mixed workload " 10747970.97
" Random write " 358103.29
" Pwrite " 325666.64
" Pread " 445021.47
Regards,
Dan
On 5/19/12 4:48 PM, "Jeff Mother" speedtoys.racing@gmail.com wrote:
Regarding the latest "iozone" results, are these more in the ballpark of what I should be seeing? Also, why is the re-read throughput value roughly 20x that of the initial read speed? Would this be caching on the NFS client side, or some sort of caching done by the PAM card on the 6080? (Should I be running these tests with the "-I" or "Direct I/O" argument to bypass any possible local caching mechanisms?)
Thanks again!
Dan
On 5/19/12 5:32 PM, "Dan Burkland" dburklan@NMDP.ORG wrote:
Re-read is served from:
* host system cache
* NetApp system cache (or PAM)
Direct I/O will bypass host caching... yup.
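i.e. the same run with -I added, something like this (output file name is just an example):

iozone -I -s 5g -r 1m -t 16 -R -b /root/iozone_directio.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18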
On Sun, May 20, 2012 at 2:12 PM, Dan Burkland dburklan@nmdp.org wrote:
In regards to the latest "iozone" results, are these more in the ball park of what I should be seeing? Also why is the re-read throughput value roughly 20x that of the initial read speed? Would this be caching on the NFS client side or some sort of caching done by the PAM card on the 6080? (Should I be running these tests with the "-I" or "Direct IO" argument to bypass any possible local caching mechanisms?"
Thanks again!
Dan
On 5/19/12 5:32 PM, "Dan Burkland" dburklan@NMDP.ORG wrote:
I unmounted the NFS share and rebooted the box before running the same "iozone" command again. This time I let "iozone" run through all of its test (including the read-based ones)
Run began: Sat May 19 16:46:27 2012
File size set to 5242880 KB Record Size 1024 KB Excel chart generation enabled Command line used: iozone -s 5g -r 1m -t 16 -R -b
/root/iozone_mn4s31063_2012-05-d.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 Output is in Kbytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. Throughput test with 16 processes Each process writes a 5242880 Kbyte file in 1024 Kbyte records
Children see throughput for 16 initial writers = 349500.55
KB/sec Parent sees throughput for 16 initial writers = 173837.26 KB/sec Min throughput per process = 21147.24 KB/sec Max throughput per process = 22701.06 KB/sec Avg throughput per process = 21843.78 KB/sec Min xfer = 4884480.00 KB
Children see throughput for 16 rewriters = 372333.90
KB/sec Parent sees throughput for 16 rewriters = 179256.38 KB/sec Min throughput per process = 22495.20 KB/sec Max throughput per process = 24418.89 KB/sec Avg throughput per process = 23270.87 KB/sec Min xfer = 4830208.00 KB
Children see throughput for 16 readers = 440115.98
KB/sec Parent sees throughput for 16 readers = 439993.44 KB/sec Min throughput per process = 26406.17 KB/sec Max throughput per process = 28724.05 KB/sec Avg throughput per process = 27507.25 KB/sec Min xfer = 4819968.00 KB
Children see throughput for 16 re-readers = 8953522.06
KB/sec Parent sees throughput for 16 re-readers = 8930475.33 KB/sec Min throughput per process = 408033.34 KB/sec Max throughput per process = 671821.62 KB/sec Avg throughput per process = 559595.13 KB/sec Min xfer = 3186688.00 KB
Children see throughput for 16 reverse readers = 5543829.37
KB/sec Parent sees throughput for 16 reverse readers = 5425986.47 KB/sec Min throughput per process = 15684.29 KB/sec Max throughput per process = 2261884.25 KB/sec Avg throughput per process = 346489.34 KB/sec Min xfer = 36864.00 KB
Children see throughput for 16 stride readers = 16532117.19
KB/sec Parent sees throughput for 16 stride readers = 16272131.55 KB/sec Min throughput per process = 257097.92 KB/sec Max throughput per process = 2256125.75 KB/sec Avg throughput per process = 1033257.32 KB/sec Min xfer = 602112.00 KB
Children see throughput for 16 random readers = 17297437.81
KB/sec Parent sees throughput for 16 random readers = 16871312.92 KB/sec Min throughput per process = 320909.25 KB/sec Max throughput per process = 2083737.75 KB/sec Avg throughput per process = 1081089.86 KB/sec Min xfer = 826368.00 KB
Children see throughput for 16 mixed workload = 10747970.97
KB/sec Parent sees throughput for 16 mixed workload = 112898.07 KB/sec Min throughput per process = 54960.62 KB/sec Max throughput per process = 1991637.38 KB/sec Avg throughput per process = 671748.19 KB/sec Min xfer = 145408.00 KB
Children see throughput for 16 random writers = 358103.29
KB/sec Parent sees throughput for 16 random writers = 166805.09 KB/sec Min throughput per process = 21263.60 KB/sec Max throughput per process = 22942.70 KB/sec Avg throughput per process = 22381.46 KB/sec Min xfer = 4859904.00 KB
Children see throughput for 16 pwrite writers = 325666.64
KB/sec Parent sees throughput for 16 pwrite writers = 177771.50 KB/sec Min throughput per process = 19902.90 KB/sec Max throughput per process = 20863.29 KB/sec Avg throughput per process = 20354.17 KB/sec Min xfer = 5008384.00 KB
Children see throughput for 16 pread readers = 445021.47
KB/sec Parent sees throughput for 16 pread readers = 444618.25 KB/sec Min throughput per process = 26932.47 KB/sec Max throughput per process = 28361.61 KB/sec Avg throughput per process = 27813.84 KB/sec Min xfer = 4981760.00 KB
"Throughput report Y-axis is type of test X-axis is number of processes" "Record size = 1024 Kbytes " "Output is in Kbytes/sec"
" Initial write " 349500.55
" Rewrite " 372333.90
" Read " 440115.98
" Re-read " 8953522.06
" Reverse Read " 5543829.37
" Stride read " 16532117.19
" Random read " 17297437.81
" Mixed workload " 10747970.97
" Random write " 358103.29
" Pwrite " 325666.64
" Pread " 445021.47
Regards,
Dan
On 5/19/12 4:48 PM, "Jeff Mother" speedtoys.racing@gmail.com wrote:
You're now approaching storage write saturation for your box on writes at that rate.
Pull reads now.
Sent from my iPhone
On May 19, 2012, at 2:43 PM, Dan Burkland dburklan@NMDP.ORG wrote:
Here are the IOZone results:
Run began: Sat May 19 16:22:46 2012
File size set to 5242880 KB Record Size 1024 KB Excel chart generation enabled Command line used: iozone -s 5g -r 1m -t 16 -R -b /root/iozone_mn4s31063_2012-05-d.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 Output is in Kbytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. Throughput test with 16 processes Each process writes a 5242880 Kbyte file in 1024 Kbyte records
Children see throughput for 16 initial writers = 371306.91 KB/sec Parent sees throughput for 16 initial writers = 167971.82 KB/sec Min throughput per process = 21901.84 KB/sec Max throughput per process = 25333.62 KB/sec Avg throughput per process = 23206.68 KB/sec Min xfer = 4533248.00 KB
Children see throughput for 16 rewriters = 350486.11 KB/sec Parent sees throughput for 16 rewriters = 176947.47 KB/sec Min throughput per process = 21154.26 KB/sec Max throughput per process = 23011.69 KB/sec Avg throughput per process = 21905.38 KB/sec Min xfer = 4819968.00 KB
362MB/s looks quite a bit higher; however, can somebody validate that I am reading these results correctly? Should I also run "iozone" with the -a (auto) option for good measure?
Thanks again for all of your responses, I greatly appreciate it!
Dan
On 5/19/12 4:36 PM, "Dan Burkland" dburklan@NMDP.ORG wrote:
Jeff Mohler - Which specific setting are you referring to?
I installed iozone on my test machine and am currently running the following iozone command on it:
iozone -s 5g -r 1m -t 16 -R -b /root/iozone_testserver_2012-05-d.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18
I'll post the results once it is finished
Dan
On 5/19/12 2:44 PM, "Jeff Mother" speedtoys.racing@gmail.com wrote:
Easy one.
If it went down in half, adjust your kernel tcp slot count.
Sent from my iPhone
On May 19, 2012, at 11:46 AM, Dan Burkland dburklan@NMDP.ORG
wrote:
Here are the results with Direct I/O enabled:
Run began: Sun May 20 16:21:12 2012
File size set to 5242880 KB Record Size 1024 KB Excel chart generation enabled O_DIRECT feature enabled Command line used: iozone -s 5g -r 1m -t 16 -R -b /root/iozone_mn4s31063_2012-05-d.csv -I -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 Output is in Kbytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. Throughput test with 16 processes Each process writes a 5242880 Kbyte file in 1024 Kbyte records
Children see throughput for 16 initial writers = 262467.29 KB/sec Parent sees throughput for 16 initial writers = 260324.76 KB/sec Min throughput per process = 16309.72 KB/sec Max throughput per process = 16546.15 KB/sec Avg throughput per process = 16404.21 KB/sec Min xfer = 5168128.00 KB
Children see throughput for 16 rewriters = 251104.65 KB/sec Parent sees throughput for 16 rewriters = 251090.95 KB/sec Min throughput per process = 15546.73 KB/sec Max throughput per process = 15832.99 KB/sec Avg throughput per process = 15694.04 KB/sec Min xfer = 5148672.00 KB
Children see throughput for 16 readers = 619751.30 KB/sec Parent sees throughput for 16 readers = 619581.97 KB/sec Min throughput per process = 36595.70 KB/sec Max throughput per process = 39467.45 KB/sec Avg throughput per process = 38734.46 KB/sec Min xfer = 4861952.00 KB
Children see throughput for 16 re-readers = 626421.73 KB/sec Parent sees throughput for 16 re-readers = 626354.38 KB/sec Min throughput per process = 37853.47 KB/sec Max throughput per process = 40021.52 KB/sec Avg throughput per process = 39151.36 KB/sec Min xfer = 4959232.00 KB
Children see throughput for 16 reverse readers = 462712.64 KB/sec Parent sees throughput for 16 reverse readers = 462649.29 KB/sec Min throughput per process = 27713.84 KB/sec Max throughput per process = 29794.67 KB/sec Avg throughput per process = 28919.54 KB/sec Min xfer = 4877312.00 KB
Children see throughput for 16 stride readers = 520482.83 KB/sec Parent sees throughput for 16 stride readers = 520448.31 KB/sec Min throughput per process = 31892.69 KB/sec Max throughput per process = 33016.53 KB/sec Avg throughput per process = 32530.18 KB/sec Min xfer = 5064704.00 KB
Children see throughput for 16 random readers = 544089.98 KB/sec Parent sees throughput for 16 random readers = 544055.32 KB/sec Min throughput per process = 33799.79 KB/sec Max throughput per process = 34304.76 KB/sec Avg throughput per process = 34005.62 KB/sec Min xfer = 5166080.00 KB
Children see throughput for 16 mixed workload = 365865.06 KB/sec Parent sees throughput for 16 mixed workload = 352394.93 KB/sec Min throughput per process = 22250.01 KB/sec Max throughput per process = 23576.78 KB/sec Avg throughput per process = 22866.57 KB/sec Min xfer = 4947968.00 KB
Children see throughput for 16 random writers = 230192.41 KB/sec Parent sees throughput for 16 random writers = 229237.34 KB/sec Min throughput per process = 14307.92 KB/sec Max throughput per process = 14463.50 KB/sec Avg throughput per process = 14387.03 KB/sec Min xfer = 5186560.00 KB
Children see throughput for 16 pwrite writers = 197020.59 KB/sec Parent sees throughput for 16 pwrite writers = 195973.16 KB/sec Min throughput per process = 12265.62 KB/sec Max throughput per process = 12394.86 KB/sec Avg throughput per process = 12313.79 KB/sec Min xfer = 5188608.00 KB
Children see throughput for 16 pread readers = 578525.04 KB/sec Parent sees throughput for 16 pread readers = 578418.73 KB/sec Min throughput per process = 33046.61 KB/sec Max throughput per process = 38253.89 KB/sec Avg throughput per process = 36157.82 KB/sec Min xfer = 4530176.00 KB
"Throughput report Y-axis is type of test X-axis is number of processes" "Record size = 1024 Kbytes " "Output is in Kbytes/sec"
" Initial write " 262467.29
" Rewrite " 251104.65
" Read " 619751.30
" Re-read " 626421.73
" Reverse Read " 462712.64
" Stride read " 520482.83
" Random read " 544089.98
" Mixed workload " 365865.06
" Random write " 230192.41
" Pwrite " 197020.59
" Pread " 578525.04
The read results definitely look more believable now. Are these results more in line with what I should be seeing? Tomorrow I am going to try and rule the switches out of the equation by running "netperf" between my two 10GbE test systems.
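A minimal netperf sketch for that kind of check (the address is a placeholder for the peer's 10GbE interface; netserver must already be running on the receiving side):

# on the receiving test system
netserver

# on the sending test system: 60-second bulk TCP test over the 10GbE path
netperf -H 10.1.1.20 -l 60 -t TCP_STREAM

If a single TCP_STREAM flow tops out well below line rate but two or three running in parallel scale up, the limit is per-flow (client CPU/interrupt handling) rather than the switch path.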
Dan
From: Jeff Mohler speedtoys.racing@gmail.com
Date: Sun, 20 May 2012 16:17:41 -0500
To: Dan Burkland dburklan@nmdp.org
Cc: "toasters@teaparty.net" toasters@teaparty.net
Subject: Re: Poor NFS 10GbE performance on NetApp 6080s
Re-read is from:
Host system cache
NetApp system cache (or PAM)
Direct will bypass host caching..yup.
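For a sanity check without O_DIRECT, the client page cache can also be dropped between runs so re-reads have to go back over the wire; a minimal sketch, assuming root on the RHEL client (this does nothing about the filer's own cache or PAM):

sync
echo 3 > /proc/sys/vm/drop_caches    # drop page cache plus dentries/inodes
# or simply unmount/remount the share to invalidate the NFS client's cached data
umount /mnt && mount /mnt            # assumes an /etc/fstab entry for the share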
On Sun, May 20, 2012 at 2:12 PM, Dan Burkland dburklan@nmdp.org wrote:
Regarding the latest "iozone" results, are these more in the ballpark of what I should be seeing? Also, why is the re-read throughput value roughly 20x that of the initial read speed? Would this be caching on the NFS client side or some sort of caching done by the PAM card on the 6080? (Should I be running these tests with the "-I" (Direct I/O) argument to bypass any possible local caching mechanisms?)
Thanks again!
Dan
On 5/19/12 5:32 PM, "Dan Burkland" dburklan@NMDP.ORG wrote:
I unmounted the NFS share and rebooted the box before running the same "iozone" command again. This time I let "iozone" run through all of its test (including the read-based ones)
Run began: Sat May 19 16:46:27 2012
File size set to 5242880 KB Record Size 1024 KB Excel chart generation enabled Command line used: iozone -s 5g -r 1m -t 16 -R -b
/root/iozone_mn4s31063_2012-05-d.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 Output is in Kbytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. Throughput test with 16 processes Each process writes a 5242880 Kbyte file in 1024 Kbyte records
Children see throughput for 16 initial writers = 349500.55
KB/sec Parent sees throughput for 16 initial writers = 173837.26 KB/sec Min throughput per process = 21147.24 KB/sec Max throughput per process = 22701.06 KB/sec Avg throughput per process = 21843.78 KB/sec Min xfer = 4884480.00 KB
Children see throughput for 16 rewriters = 372333.90
KB/sec Parent sees throughput for 16 rewriters = 179256.38 KB/sec Min throughput per process = 22495.20 KB/sec Max throughput per process = 24418.89 KB/sec Avg throughput per process = 23270.87 KB/sec Min xfer = 4830208.00 KB
Children see throughput for 16 readers = 440115.98
KB/sec Parent sees throughput for 16 readers = 439993.44 KB/sec Min throughput per process = 26406.17 KB/sec Max throughput per process = 28724.05 KB/sec Avg throughput per process = 27507.25 KB/sec Min xfer = 4819968.00 KB
Children see throughput for 16 re-readers = 8953522.06
KB/sec Parent sees throughput for 16 re-readers = 8930475.33 KB/sec Min throughput per process = 408033.34 KB/sec Max throughput per process = 671821.62 KB/sec Avg throughput per process = 559595.13 KB/sec Min xfer = 3186688.00 KB
Children see throughput for 16 reverse readers = 5543829.37
KB/sec Parent sees throughput for 16 reverse readers = 5425986.47 KB/sec Min throughput per process = 15684.29 KB/sec Max throughput per process = 2261884.25 KB/sec Avg throughput per process = 346489.34 KB/sec Min xfer = 36864.00 KB
Children see throughput for 16 stride readers = 16532117.19
KB/sec Parent sees throughput for 16 stride readers = 16272131.55 KB/sec Min throughput per process = 257097.92 KB/sec Max throughput per process = 2256125.75 KB/sec Avg throughput per process = 1033257.32 KB/sec Min xfer = 602112.00 KB
Children see throughput for 16 random readers = 17297437.81
KB/sec Parent sees throughput for 16 random readers = 16871312.92 KB/sec Min throughput per process = 320909.25 KB/sec Max throughput per process = 2083737.75 KB/sec Avg throughput per process = 1081089.86 KB/sec Min xfer = 826368.00 KB
Children see throughput for 16 mixed workload = 10747970.97
KB/sec Parent sees throughput for 16 mixed workload = 112898.07 KB/sec Min throughput per process = 54960.62 KB/sec Max throughput per process = 1991637.38 KB/sec Avg throughput per process = 671748.19 KB/sec Min xfer = 145408.00 KB
Children see throughput for 16 random writers = 358103.29
KB/sec Parent sees throughput for 16 random writers = 166805.09 KB/sec Min throughput per process = 21263.60 KB/sec Max throughput per process = 22942.70 KB/sec Avg throughput per process = 22381.46 KB/sec Min xfer = 4859904.00 KB
Children see throughput for 16 pwrite writers = 325666.64
KB/sec Parent sees throughput for 16 pwrite writers = 177771.50 KB/sec Min throughput per process = 19902.90 KB/sec Max throughput per process = 20863.29 KB/sec Avg throughput per process = 20354.17 KB/sec Min xfer = 5008384.00 KB
Children see throughput for 16 pread readers = 445021.47
KB/sec Parent sees throughput for 16 pread readers = 444618.25 KB/sec Min throughput per process = 26932.47 KB/sec Max throughput per process = 28361.61 KB/sec Avg throughput per process = 27813.84 KB/sec Min xfer = 4981760.00 KB
"Throughput report Y-axis is type of test X-axis is number of processes" "Record size = 1024 Kbytes " "Output is in Kbytes/sec"
" Initial write " 349500.55
" Rewrite " 372333.90
" Read " 440115.98
" Re-read " 8953522.06
" Reverse Read " 5543829.37
" Stride read " 16532117.19
" Random read " 17297437.81
" Mixed workload " 10747970.97
" Random write " 358103.29
" Pwrite " 325666.64
" Pread " 445021.47
Regards,
Dan
On 5/19/12 4:48 PM, "Jeff Mother" speedtoys.racing@gmail.com wrote:
You're now approaching storage write saturation for your box on writes at that rate.
Pull reads now.
Sent from my iPhone
On May 19, 2012, at 2:43 PM, Dan Burkland dburklan@NMDP.ORG wrote:
Here are the IOZone results:
Run began: Sat May 19 16:22:46 2012
File size set to 5242880 KB Record Size 1024 KB Excel chart generation enabled Command line used: iozone -s 5g -r 1m -t 16 -R -b /root/iozone_mn4s31063_2012-05-d.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 Output is in Kbytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. Throughput test with 16 processes Each process writes a 5242880 Kbyte file in 1024 Kbyte records
Children see throughput for 16 initial writers = 371306.91 KB/sec Parent sees throughput for 16 initial writers = 167971.82 KB/sec Min throughput per process = 21901.84 KB/sec Max throughput per process = 25333.62 KB/sec Avg throughput per process = 23206.68 KB/sec Min xfer = 4533248.00 KB
Children see throughput for 16 rewriters = 350486.11 KB/sec Parent sees throughput for 16 rewriters = 176947.47 KB/sec Min throughput per process = 21154.26 KB/sec Max throughput per process = 23011.69 KB/sec Avg throughput per process = 21905.38 KB/sec Min xfer = 4819968.00 KB
362MB/s looks quite a bit higher however can somebody validate that I am reading these results correctly? Should I also run "iozone" with the -a (auto) option for good measure?
Thanks again for all of your responses, I greatly appreciate it!
Dan
On 5/19/12 4:36 PM, "Dan Burkland" dburklan@NMDP.ORG wrote:
Jeff Mother - Which specific setting are you referring to?
I installed iozone on my test machine and am currently running the following iozone command on it:
iozone -s 5g -r 1m -t 16 -R -b /root/iozone_testserver_2012-05-d.csv -F tf1 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18
I'll post the results once it is finished
Dan
On 5/19/12 2:44 PM, "Jeff Mother" speedtoys.racing@gmail.com wrote:
Easy one.
If it went down in half, adjust your kernel tcp slot count.
Sent from my iPhone
On May 19, 2012, at 11:46 AM, Dan Burkland dburklan@NMDP.ORG wrote:
I know dd isn't the best tool since it is a single threaded application and in no way represents the workload that Oracle will impose. However, I thought it would still give me a decent ballpark figure regarding throughput. I tried a block size of 64k, 128k, and 1M (just to see) and got a bit more promising results:
# dd if=/dev/zero of=/mnt/testfile bs=1M count=5120 5120+0 records in 5120+0 records out 5368709120 bytes (5.4 GB) copied, 26.6878 seconds, 201 MB/s
If I run two of these dd sessions at once the throughput figure above gets cut in half (each dd session reports it creates the file at around 100MB/s).
As far as the switch goes, I have not checked it yet however I did notice that flow control is set to full on the 6080 10GbE interfaces. We are also running Jumbo Frames on all of the involved equipment.
As far as the RHEL OS tweaks go, here are the settings that I have changed on the system:
### /etc/sysctl.conf:
# 10GbE Kernel Parameters net.core.rmem_default = 262144 net.core.rmem_max = 16777216 net.core.wmem_default = 262144 net.core.wmem_max = 16777216 net.ipv4.tcp_rmem = 4096 262144 tel:4096%20262144 16777216 net.ipv4.tcp_wmem = 4096 262144 tel:4096%20262144 16777216 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_syncookies = 0 net.ipv4.tcp_timestamps = 0 net.ipv4.tcp_sack = 0 #
###
### /etc/modprobe.d/sunrpc.conf:
options sunrpc tcp_slot_table_entries=128
###
### Mount options for the NetApp test NFS share:
rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600,retrans=2, s ec = sy s
###
Thanks again for all of your quick and detailed responses!
Dan
On 5/19/12 1:08 PM, "Robert McDermott" rmcdermo@fhcrc.org wrote:
> Your block size is only 1K; try increasing the block size and the > throughput will increase. 1K IOs would generate a lot of IOPs with > very > little throughput. > > -Robert > > Sent from my iPhone > > On May 19, 2012, at 10:48, Dan Burkland dburklan@NMDP.ORG wrote: > >> Hi all, >> >> My company just bought some Intel x520 10GbE cards which I >>recently >> installed into our Oracle EBS database servers (IBM 3850 X5s >>running >> RHEL >> 5.8). As the "linux guy" I have been tasked with getting these >> servers >> to >> communicate with our NetApp 6080s via NFS over the new 10GbE >>links. I >> have >> got everything working however ever after tuning the RHEL kernel I >>am >> only >> getting 160MB/s writes using the "dd if=/dev/zero of=/mnt/testfile >> bs=1024 >> count=5242880" command. For you folks that run 10GbE to your >> toasters, >> what write speeds are you seeing from your 10GbE connected >>servers? >> Did >> you have to do any tuning in order to get the best results >>possible? >> If >> so >> what did you change? >> >> Thanks! >> >> Dan >> >> >> >> _______________________________________________ >> Toasters mailing list >> Toasters@teaparty.net >> http://www.teaparty.net/mailman/listinfo/toasters
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
_______________________________________________ Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
-- --- Gustatus Similis Pullus
Would have to see perfstat (statit and a few other things) to know..but you are much more in line with reality...you're square in the ballpark.
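For anyone following along, a rough sketch of gathering those counters by hand on a 7-Mode controller during a run (perfstat collects these and more from the host side; exact commands vary by Data ONTAP release):

filer> priv set advanced
filer*> statit -b        # begin collecting CPU/disk statistics
        (run the iozone workload)
filer*> statit -e        # end collection and print the report
filer*> sysstat -x 1     # live per-second view of CPU, network and disk utilization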
--
Gustatus Similis Pullus
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
Check your client-side CPU usage. Dalvenjah's earlier mail mentioned it, but at those rates you're largely just testing single-stream TCP throughput and I'd suspect you're choked on interrupt handlers on the client.
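A quick way to check that on the client while a test is running (eth2 is a placeholder for the x520 interface name):

mpstat -P ALL 5              # one core pegged in %irq/%soft while the rest idle = interrupt bound
grep eth2 /proc/interrupts   # are the NIC queues spread across CPUs or all landing on one?
cat /proc/net/softnet_stat   # 2nd column counts packets dropped because the CPU couldn't keep up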
If you want to test this, add a second IP to your filer and mount via that, with a workload generator going against each -- the multiple transports should move your numbers up.
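As a sketch, with made-up addresses and paths, that could look like mounting the same export through two filer addresses and splitting the iozone/dd processes between the two mount points:

mount -t nfs -o rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600 10.1.1.10:/vol/test /mnt/test_a
mount -t nfs -o rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600 10.1.1.11:/vol/test /mnt/test_b

Because the Linux NFSv3 client opens one TCP connection per server address, each mount gets its own transport (and its own slot table), so the load is spread over more than one flow.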
This is one place where Oracle's DirectNFS will really help -- by opening a RPC transport per process, you not only avoid static slot allocation (while you're waiting on RHEL 6.3), but also get a healthy number of flows to feed all the interrupt vectors on a MSI-X capable NIC.
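A hypothetical oranfstab sketch along those lines (server name, addresses, export and mount point are all placeholders; check Oracle's dNFS documentation for the exact syntax and the relink step for your release):

cat >> /etc/oranfstab <<'EOF'
server: fas6080a
path: 10.1.1.10
path: 10.1.1.11
export: /vol/oradata  mount: /u02/oradata
EOF

# 11gR2-style relink to turn dNFS on (target name may differ in other releases):
cd $ORACLE_HOME/rdbms/lib && make -f ins_rdbms.mk dnfs_on

With multiple path entries, dNFS load-balances its per-process connections across the listed filer addresses.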
[sent from my mobile]
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
Check Andre's email here.
Sent from my iPhone
On May 19, 2012, at 2:36 PM, Dan Burkland dburklan@NMDP.ORG wrote:
Jeff Mohler - Which specific setting are you referring to?
What happens if you run _2_ dd sessions?
dd/copy/etc. are not magic I/O applications; they have single-threaded performance limits.
No reason you can't.
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
Hi Dan,
Your test is a single process running a single thread. I suggest running 10 dd jobs in parallel, writing to different files. And as another guy suggested, also increase the block size, such as bs=20480. That ought to drive up the total network throughput!
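A minimal sketch of that suggestion (file names, block size, and counts are only illustrative):

# start 10 writers against separate files, then wait for all of them to finish
for i in $(seq 1 10); do
    dd if=/dev/zero of=/mnt/testfile$i bs=1M count=1024 &
done
wait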
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support
Saturating 10GbE on a 6080... is a feat. :)
BTDT :-)
980MB/sec write on a full rack of SAS drives
Don't forget that in normal circumstances on Linux, you're funneling all of this NFS traffic through a single RPC channel and a single TCP connection to the NetApp, even if you use multiple mount points. (I can't wait for pNFS to be finalized and fully implemented.)
I've attached some sysctl tweaks we put on our high-NFS (non-Oracle) Linux systems that may help (though please test; your mileage may vary, and not all of these may be appropriate for your environment). They may not all be appropriate for an Oracle box either, so please use caution. The changes that are probably safest are raising the limits and raising the sunrpc table values; you probably don't want to modify the TCP settings without consulting your DBAs or Oracle support.
Note that the last two sysctls have to be set after the sunrpc kernel module loads, but before the nfs module loads, in order to take effect. You might have to throw those two into an init script to get them to occur in the right order.
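A minimal sketch of that ordering, assuming you go the init-script route (run it before any NFS filesystems get mounted):

# make sure sunrpc is loaded, then bump the slot tables before nfs/mounts come up
modprobe sunrpc
sysctl -w sunrpc.tcp_slot_table_entries=128
sysctl -w sunrpc.udp_slot_table_entries=128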
You also might want to do some sequential throughput tests with iozone (test 0 and test 1 with the -t flag, I think) and multiple (4, 8, or more) processes with larger (4k+) block sizes; even so, what you're seeing may be near the upper end for that sort of tool.
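Something along these lines is what I have in mind (thread count, sizes, and file names are illustrative; check iozone -h for your build):

# throughput mode: 8 threads, sequential write (test 0) and read (test 1),
# 64k records, 2 GB per file, one file per thread
iozone -i 0 -i 1 -t 8 -r 64k -s 2g \
    -F /mnt/ioz1 /mnt/ioz2 /mnt/ioz3 /mnt/ioz4 /mnt/ioz5 /mnt/ioz6 /mnt/ioz7 /mnt/ioz8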
These tweaks help a bit, but at least in terms of Oracle, we've found that:
1) Even with normal Linux NFS, Oracle spawns enough threads that in general it will get better IOPS and throughput than most dd (sequential) or iozone test operations.
2) Oracle seems to have enough of a mini I/O subsystem of its own that it gets better efficiency than an everyday command like dd operating on a mount point.
3) If you can get your DBAs to look into using Oracle DirectNFS, your setup will scream; DirectNFS establishes TCP connections straight between Oracle and the NetApp, uses multiple connections, and has its own caching and I/O subsystem that Oracle knows about and benefits from. When we tested on trunked 1GbE links (which I know don't add up to n times 1GbE of bandwidth), we ended up saturating interfaces; from what I understand DirectNFS will do even better on a 10GbE network. You can also use dual networks (like a dual-fabric SAN) to have Oracle load-balance properly over multiple links, instead of doing normal LACP trunking and only being able to push a single 10GbE link's worth of bandwidth. At that point your bottleneck should be the disks behind the NetApp.
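As a rough illustration of what the DBAs would set up for DirectNFS (the server name, addresses, export, and mount point are placeholders, and the exact oranfstab syntax depends on your Oracle release, so check the dNFS docs):

# $ORACLE_HOME/dbs/oranfstab (placeholder values)
server: filer6080
path: 192.168.10.10
path: 192.168.20.10
export: /vol/oradata mount: /u02/oradata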
Good luck, and remember, test on a dev system first!
-dalvenjah
# NFS tweaks here
# Raise generic socket memory useability, and start 'em big
net.core.rmem_default=524288
net.core.wmem_default=524288
net.core.rmem_max=16777216
net.core.wmem_max=16777216
# Raise tcp memory useability too
net.ipv4.tcp_rmem=4096 524288 16777216
net.ipv4.tcp_wmem=4096 524288 16777216
# raise the amount of memory for the fragmentation reassembly buffer
# (if it goes above high_thresh, kernel starts tossing packets until usage
# goes below low_thresh)
net.ipv4.ipfrag_high_thresh=524288
net.ipv4.ipfrag_low_thresh=393216
# turn off tcp timestamps (extra CPU hit) since this is likely a
# non-public server
net.ipv4.tcp_timestamps=0
# make sure window scaling is on
net.ipv4.tcp_window_scaling=1
# increase the number of option memory buffers
net.core.optmem_max=524287
# raise the max backlog of packets on a net device
net.core.netdev_max_backlog=2500
# max out the number of task request slots in the RPC code
sunrpc.tcp_slot_table_entries=128
sunrpc.udp_slot_table_entries=128
Note that the last two sysctls have to be set after the sunrpc kernel module loads, but before the nfs module loads, in order to take effect. You might have to throw those two into an init script to get them to occur in the right order.
[...]
# max out the number of task request slots in the RPC code
sunrpc.tcp_slot_table_entries=128
sunrpc.udp_slot_table_entries=128
Are you sure about that? We have always used it in an NFS-root environment where the full NFS stack is loaded from within the initrd before /etc/sysctl.conf gets a chance to be processed. AFAIR it must be set before the filesystem is mounted, as it is a per-mounted-filesystem parameter.
BTW, we also use multiple mount points even though they physically point to the same exported volume, although this is more relevant for a high-throughput environment than for a single-threaded app.
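As an illustration (filer name, export, mount points, and options are placeholders), that just means fstab entries like these, all pointing at the same export:

# /etc/fstab: same exported volume, two mount points
filer:/vol/oradata  /mnt/oradata_a  nfs  rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600  0 0
filer:/vol/oradata  /mnt/oradata_b  nfs  rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600  0 0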
If you want a simple, highly tunable IO generation tool, check out NetApp's own SIO in the NOW toolchest: http://support.netapp.com/eservice/toolchest?toolid=418 http://www.netapp.com/go/techontap/tot-march2006/0306tot_monthlytoolSIO.html
If you're suspicious of using a vendor provided tool, the source is included in the download. ;-)
Share and enjoy!
Peter
_______________________________________________ Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters