Do any of our customers out there have some measurements of Sun client throughput?
I thought I could use a 2 x 400MHz Ultra Enterprise E250 with a new Sun GbE card in it as a killer client, and I find that with V3 32KB UDP packets, the sucker rolls over at
~30 MB/s
with 2 CPUs pinned at 99% in system time.
This *sucks*.
In an experimental filer setup reading from memory (all cached data) the F760 CPU is at 20%.
I am totally client bound.
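(A generic way to see this kind of client-side saturation on a Solaris client, not necessarily the exact procedure used here, is to sample per-CPU system time and the client RPC counters while the read test runs:)

    mpstat 5         # per-CPU usr/sys/idle every 5 seconds; look for sys pegged near 100
    nfsstat -c       # client-side RPC/NFS call counts, retransmits and timeouts
    nfsstat -m       # per-mount RPC timing (srtt etc.) for each NFS filesystem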
Has anyone sized Sun clients out there? Are the PCI clients dogs? Is the new GbE card from Sun a loser? Should I go grab a new Alteon card for the Sun?
What is your killer high performance client?
It seems to be a function of the Sun E250 - 100BaseT seems to scale only to 30 MB/s before running out of CPU...
Broken-hearted in Amsterdam, beepy
In article 199906291514.IAA25644@cranford.netapp.com, Brian Pawlowski wrote:
I thought I could use a 2 x 400MHz Ultra Enterprise E250 with a new Sun GbE card in it as a killer client, and I find that with V3 32KB UDP packets, the sucker rolls over at ~30 MB/s
Count yourself lucky. I was trying to size up a large network of Linux boxes all talking to a couple of NetApps, but gave up when I discovered how pants the Linux network client is. I even offered cash for one of the kernel hackers to work on the client rather than the server, and still had no offers. For our workload we can't even get 1MB/s with a Linux box...

Back to your problems, though. The fact that both GbE and 100bT peg out around the same point looks like it may be some sort of bus or stack problem. Are your network cards PCI 64/66? If they're only in 32-bit 33MHz slots I could quite believe a 30MB/s limit. It could also be that the Sun is set up with too few NFS client threads or something like that.
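(For a rough sense of the bus ceiling Chris mentions, and a quick way to see which slot the card actually landed in; the prtdiag path below is the usual Solaris location, and its output varies by platform:)

    # raw 32-bit/33MHz PCI bandwidth, before any protocol or DMA overhead:
    #   4 bytes/cycle * 33e6 cycles/s ~= 132 MB/s, shared by everything on that bus
    # list the PCI buses/slots and what is plugged into them:
    /usr/platform/`uname -i`/sbin/prtdiag -v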
Chris
Chris;
I've seen sustained 14MB/s across a point-to-point connection using Alteon GbEs in this configuration:
client: Dell 4200/300 (2xPII@300MHz, 512MB)
kernel: 2.2.5-15 (RedHat 6.0)
driver: acenic.c
mount:  rw,rsize=8192,wsize=8192,bg,hard,intr,udp,nfsvers=3
filer:  760, OnTap 5.3
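(For completeness, the matching Linux mount invocation for that line, with a placeholder filer name, export and mount point, would look roughly like:)

    mount -t nfs -o rw,rsize=8192,wsize=8192,bg,hard,intr,udp,nfsvers=3 \
        filer:/vol/vol0 /mnt/filer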
What were you running?
rgds, tim.
Count yourself lucky. I was trying to size up a large network of Linux boxes all talking to a couple of NetApps, but gave up when I discovered how pants the Linux network client is. I even offered cash for one of the kernel hackers to work on the client rather than the server, and still had no offers. For our workload we can't even get 1MB/s with a Linux box...
--
I've seen sustained 14MB/s across a point-to-point connection using Alteon GbEs in this configuration:
client: Dell 4200/300 (2xPII@300MHz, 512MB)
kernel: 2.2.5-15 (RedHat 6.0)
driver: acenic.c
mount:  rw,rsize=8192,wsize=8192,bg,hard,intr,udp,nfsvers=3
filer:  760, OnTap 5.3
NFS performance on Linux depends overwhelmingly on which kernel revision and whose NFS daemons you're running. While the more recent versions are beginning to show some promise, older versions, particularly 2.0.x and earlier, are notoriously slow. On 2.0.36 I've never managed more than about 1.3MB/s over switched fast Ethernet with 3c905B cards using dd (UDP, NFSv2 and {wr}size=8192). I can get about 8.5-9MB/s on 2.2.5-22 with a tulip-based card and the same NFS parameters, which is better, although still disappointing. This figure would probably increase with some tweaking.
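(A minimal version of the dd test being described, with a hypothetical mount point; remount, or use a file larger than client RAM, before the read pass, otherwise you end up timing the client's page cache rather than the wire:)

    # write ~100MB through the NFS mount, then read it back
    time dd if=/dev/zero of=/mnt/filer/ddtest bs=8192 count=12800
    time dd if=/mnt/filer/ddtest of=/dev/null bs=8192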
Nick
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Nick Hilliard       | nick@iol.ie                        |
| Tel: +353 1 6046800 | Advanced Systems Architect         |
| Fax: +353 1 6046888 | Ireland On-Line System Operations  |
I ran some tests at work across the corporate switched 100bT. It's a holiday today so no LAN traffic to speak of.
The setup is basically the same mentioned earlier, only this time I'm using 100bT in the Dell, filer and my own desktop machine.
client1: Dell 4200/300 (2xPII@300MHz, 512MB)
kernel1: 2.2.5-15smp (RedHat 6.0)
driver1: eepro100.c
client2: Asus P2B (1xPII/333MHz, 96MB)
kernel2: 2.0.37 (RedHat 5.2 + ac 2.0.37 final)
driver2: 3c59x.c
filer:  F760, OnTap 5.3
driver: onboard 100bT
mount:  rw,rsize=8192,wsize=8192,bg,hard,intr,udp,nfsvers=3
write:  dd if=/dev/zero of=filer bs=8192 count=12500
Over switched Ethernet, sustained 100MB writes to the filer average 7.8MB/s for the Dell and 5.2MB/s for the Asus. Read speeds were 8MB/s for both setups out of filer cache. Given the theoretical Ethernet limits, overhead, etc., this is pretty good performance.
For grins, on the Dell I tried remounting with nfsvers=2 and saw 6.5MB/s. Using the OOTB Linux amd(8) automounter, performance dropped to 2.3MB/s.
If there's a message here, it's probably not that Linux is so great, or all that terrible either. There is always a strong interdependence between the network, NIC, NIC driver, OS, memory and CPU on both ends of the wire. Any one of them out of balance can dramatically impact throughput.
Rgds, Tim.
Send email if you want more detail. I'll check out an Ultra5 + Solaris 7 and an F210, and publish those results as above.
--
See below for a simple client test script to gauge the NIC+driver+OS+wire+filer configuration.
setup
-----
client: Asus P2B (1xPII/333MHz, 256MB)
kernel: linux 2.0.37
driver: eepro100.c (intel pro/100)
filer:  F760, OnTap 5.3.1
driver: internal (onboard 100bT)
mount:  rw,rsize=8192,wsize=8192,bg,hard,intr,udp,nfsvers=3
write:  dd if=/dev/zero of=filer bs=8192 count=12800
1, then 2, then 3, then 4 simultaneous streams were launched. Time was measured independently for each stream.
results
-------
          total    total                    total
number    thruput  time     filer   client  bytes
streams   MB/s     seconds  CPU     CPU     moved
-------   -------  -------  -----   ------  ---------
1         5.9      17.7     34%     22%     104857600
2         9.2      22.8     59%     37%     209715200
3         10.4     30.2     60%     41%     314572800
4         10.8     38.8     62%     41%     419430400
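(Sanity check: the thruput column is just total bytes moved divided by elapsed time, e.g. 209715200 bytes / 22.8 s ~= 9.2 MB/s for the two-stream run.)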
script
------
#!/bin/tcsh
# no other background jobs (wait function)
# linux 2.0.37
# run 'sysstat 5' on filer

# tweak these
set target  = "/mnt/home/test"
set bs      = 8192
set count   = 12800
set howmany = "1 2 3 4"
# end of tweaks

foreach i ( $howmany )
    while ($i)
        echo "starting $i -----"
        time dd if=/dev/zero \
                of=$target.$i \
                bs=$bs count=$count &
        sleep 1
        @ i--
    end
    wait
    echo "-----------------"
    sleep 7
    rm -vf /mnt/home/test.*
end
# end of script
--
Okay, something is definitely wrong here. I've just got a brand new U450, 4x400MHz, 4GB RAM, and I still can't get more than 1.7MB/s off the filer. I'm getting just a tad narked off with the whole thing. For large sequential reads, as I'm doing now, I should be getting a good 6MB/s...
The filer is an F540 with 256MB RAM, 8MB NVRAM and DoT 4.3R4. The network is 100Mb switched, full duplex at both ends, with no network errors that I can see. In /etc/system:

    set hme:hme_adv_autoneg_cap=0
    set hme:hme_adv_100hdx_cap=0
    set hme:hme_adv_100fdx_cap=1
    set nfs:nfs3_nra = 6
    set nfs:nfs3_max_threads = 16

as recommended elsewhere. The filer has "minra off", and the mount options on the Sun box are:

    netapp3:/ferret-data - /netapp/ferret-data nfs - yes soft,intr,actimeo=60,vers=3,proto=udp,rsize=32768,retrans=20,timeo=50
anyone have any ideas?
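(One quick sanity check on the Sun side, assuming the onboard hme interface: ask the driver what it actually negotiated. On the hme driver, link_speed is 1 for 100Mb and 0 for 10Mb, and link_mode is 1 for full duplex and 0 for half:)

    /usr/sbin/ndd -get /dev/hme link_status   # 1 = link up
    /usr/sbin/ndd -get /dev/hme link_speed    # 1 = 100Mb, 0 = 10Mb
    /usr/sbin/ndd -get /dev/hme link_mode     # 1 = full duplex, 0 = half duplex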
What Ethernet switch are you using? Try a crossover cable between the Sparc and the filer and see if the situation improves; if so, it's a switch issue with full-duplex auto-negotiation.
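(A duplex mismatch usually shows up as errors and collisions on the half-duplex side, so the per-interface error counters are worth a look before re-cabling. A sketch from the Solaris side, assuming the interface is hme0; the filer's own "netstat -i" output is worth comparing against:)

    netstat -i           # one shot: check Ierrs/Oerrs/Collis on hme0
    netstat -I hme0 5    # same counters sampled every 5 seconds while the test runs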
Colin Johnston SA PSINET UK
On 16 Jul 1999, Chris Good wrote:
Okay, something is definitely wrong here. I've just got a brand new U450, 4x400MHz, 4GB RAM, and I still can't get more than 1.7MB/s off the filer. I'm getting just a tad narked off with the whole thing. For large sequential reads, as I'm doing now, I should be getting a good 6MB/s...
The filer is an F540 with 256MB RAM, 8MB NVRAM and DoT 4.3R4. The network is 100Mb switched, full duplex at both ends, with no network errors that I can see. In /etc/system:

    set hme:hme_adv_autoneg_cap=0
    set hme:hme_adv_100hdx_cap=0
    set hme:hme_adv_100fdx_cap=1
    set nfs:nfs3_nra = 6
    set nfs:nfs3_max_threads = 16

as recommended elsewhere. The filer has "minra off", and the mount options on the Sun box are:

    netapp3:/ferret-data - /netapp/ferret-data nfs - yes soft,intr,actimeo=60,vers=3,proto=udp,rsize=32768,retrans=20,timeo=50
anyone have any ideas?
Chris Good - Muscat Ltd. The Westbrook Centre, Milton Rd, Cambridge UK Phone: 01223 715006 Mobile: 07801 788997 http://www.muscat.com
In article slrn7oukr6.f48.chris@cecil.muscat.com, Chris Good wrote:
I've just got a brand new U450, 4x400MHz, 4GB RAM. Still can't get more than 1.7MB/s off the filer.
OK, I've put a load more options into /etc/system on the Sun, as suggested by Andrew Bond and others. The network checks out, everything is 100Mb-FD, and I still only get around 1.5MB/s for sequential reads. While doing this the filer is at:

    CPU   NFS  CIFS  HTTP    Net kB/s     Disk kB/s    Tape kB/s   Cache
                             in    out    read  write  read  write   age
    21%   261     9     0    715   1111   1014    850     0      0     7
    24%   258     9     0    707   1223   1252    881     0      0     7
    28%   333     8     0    708   1765   1649    861     0      0     2
    25%   298     8     0    689   1483   1330    670     0      0     2
    28%   296     8     0    818   1355   1822   1019     0      0     2
Interestingly, if I do large numbers of random reads (32 streams of 8k random reads) I can max the filer out, at around 6MB/s of disk reads and 97% CPU usage. This looks to me like the network is OK but something subtly odd is going on somewhere. Any smart ideas?
Chris
I don't have the answer, but am asking (begging, pleading? :) that if you find anything, you _please_ share it. We've had a Sun E3000 with an ATL jukebox as our backup server for over a year now, using NFS to back up 6 NetApps (varying from F330 to F540) over a private 100Mb Ethernet. We've been getting similar performance and haven't been able to determine why. Anything you come up with would probably be very helpful. Thanks!
In article 377EAC7D.F9A866D1@netapp.com, Timothy Moore wrote:
I've seen sustained 14MB/s across a point-to-point connection using Alteon GbEs in this configuration:
Switched 100Mb network, with either Intel EEPro or Realtek 8139 network cards; kernels 2.2.9, 2.2.10 or any of the recent 2.2 kernels.
mount: rw,rsize=8192,wsize=8192,bg,hard,intr,udp,nfsvers=3
Defaults, or that same config, only yield around 1.5MB/s for large sequential reads. I've tried pretty much every option I can on the mount and nothing gets any better.
filer: 760, OnTap 5.3
540, 4.3. Typical sysstat output:

    CPU   NFS  CIFS  HTTP    Net kB/s     Disk kB/s    Tape kB/s   Cache
                             in    out    read  write  read  write   age
    26%   526     0     0    545   2175   2182    462     0      0     1
    24%   526     0     0    521   2194   2021    450     0      0     1
Of course, for the workload we actually want to run (lots of concurrent, random 8k reads) things are even worse. If I didn't know better I would say that the machines were actually on a switched 10Mb rather than 100Mb network. Ho hum.
On Tue, Jun 29, 1999 at 08:14:25AM -0700, Brian Pawlowski wrote:
Do any of our customers out there have some measurements of Sun client throughput?
It blows, basically... :( Nevertheless, I think you can get higher with a few little tweaks. Still, in my applications the crappy unlink() speed will take you out long before you run out of throughput. Or perhaps you've already made the tweaks and it still sucks...
I thought I could use a 2 x 400MHz Ultra Enterprise E250 with a new Sun GbE card in it as a killer client, and I find that with V3 32KB UDP packets, the sucker rolls over at
~30 MB/s
with 2 CPUs pinned at 99% in system time.
This *sucks*.
Well, I used a 4x400MHz E3000 with the Sun GbE 2.0 cards - note that the 2.0 cards are MUCH better than the 1.0 stuff! I can't remember if I used the E3000 PCI adapter thingy or the "straight-up" SBus version of the Gbit card..
I'm assuming you already have:
nfs.udp.xfersize 32768
on the filer.
I also added:
set nfs:nfs3_nra = 6
set nfs:nfs3_max_threads = 16
in /etc/system on the Sun box for a little more NFS oomph. It sort of "artificially" inflates the load values since more threads are blocking, but it does seem to help throughput, even on 100BaseT connections.
I also run:
/usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 32768
/usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 32768
/usr/sbin/ndd -set /dev/tcp tcp_cwnd_max 65534
/usr/sbin/ndd -set /dev/udp udp_xmit_hiwat 16384   # max. UDP PDU size for sending
/usr/sbin/ndd -set /dev/udp udp_recv_hiwat 49152   # queue for UDP PDUs (3 * ICP)
to get the buffer sizes up to more reasonable values.
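(One caveat: ndd settings do not survive a reboot. A common way to keep them, sketched here with a made-up script name, is a late-running rc script:)

    #!/bin/sh
    # /etc/rc2.d/S99nettune : hypothetical name, any late rc script will do
    /usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 32768
    /usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 32768
    /usr/sbin/ndd -set /dev/tcp tcp_cwnd_max 65534
    /usr/sbin/ndd -set /dev/udp udp_xmit_hiwat 16384
    /usr/sbin/ndd -set /dev/udp udp_recv_hiwat 49152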
Using this combo I was able to get ~40MB/s, if I remember correctly. I was only testing, so I don't have a Gbit system up and running right now. Also check your switch settings: I found quite a bit of difference on a Cat5500 depending on whether you allowed auto flow control and such things..
I am totally client bound.
A real let-down, isn't it.. :(
Has anyone sized Sun clients out there? Are the PCI clients dogs? Is the new GbE card from Sun a loser? Should I go grab a new Alteon card for the Sun?
What is your killer high performance client?
I'd love to hear people's experience as well. I really need to do more testing with various machines (SBus vs. PCI for ex) and Gbit card vendors to see where the real problems lie.. But so far Gbit performance is slower than I would have hoped. At least it doesn't max the CPU on the filer like trunked ethernet does. :-)
-Mark
It seems to be a function of the Sun E250 - 100BaseT seems to scale only to 30 MB/s before running out of CPU...
Broken-hearted in Amsterdam, beepy
On Tue, 29 Jun 1999, Brian Pawlowski wrote:
I thought I could use a 2 x 400MHz Ultra Enterprise E250 with a new Sun GbE card in it as a killer client, and I find that with V3 32KB UDP packets, the sucker rolls over at
~30 MB/s
with 2 CPUs pinned at 99% in system time.
This *sucks*.
I didn't see any satisfactory answers to this the last time around, but I'm doing a bit of benchmarking for a killer tape backup server (streaming ~60MB/sec to tape). I have an older Sun E450 with 2 x 250-MHz CPU's and a Quad Fast Ethernet NIC in one of the PCI slots.
qfe0 is directly attached to an idle F740, and likewise for qfe1 to another idle F740. Please tell me I should be able to see more than an aggregate 10 MB/s doing dumps over rsh with that setup? I'm shuffling the dump streams off to /dev/null so local disk speed is not an issue. On another system, a 2 x 300-MHz E450 with two single-port NIC's, I'm able to pull about 15 MB/s total. Does the Sun QFE just suck, and I should stick to individual single-port NIC's?
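(For reference, the sort of test being described looks roughly like the sketch below; the filer names and volume paths are made up, and the exact dump invocation depends on the ONTAP release:)

    # one dump stream per filer, discarded locally so local disk speed is out of the picture
    rsh filer1 dump 0f - /vol/vol0 | dd of=/dev/null bs=64k &
    rsh filer2 dump 0f - /vol/vol0 | dd of=/dev/null bs=64k &
    wait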
I want to lead up to a 4 x 450-MHz E420R with Gigabit Ethernet, and hope I can achieve 50 MB/s or more from eight F740's. Has anyone tried this configuration?
Does anybody have experience with how many packets can actually be pushed over specific Ether/Quad/Giga etc. interfaces?
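(Back-of-the-envelope wire rates, independent of any particular card: each Ethernet frame costs its size plus roughly 20 bytes of preamble and inter-frame gap on the wire, so:)

    100Mb/s, 64-byte frames:    100e6 / ((64+20)*8)   ~= 148,800 frames/s
    100Mb/s, 1518-byte frames:  100e6 / ((1518+20)*8) ~= 8,100 frames/s

Gigabit scales both by 10x; what a given NIC/driver/host combination can actually sustain is usually well below these theoretical figures.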
Eyal.
Brian Tao wrote:
On Tue, 29 Jun 1999, Brian Pawlowski wrote:
I thought I could use a 2 x 400MHz Ultra Enterprise E250 with a new Sun GbE card in it as a killer client, and I find that with V3 32KB UDP packets, the sucker rolls over at
~30 MB/s
with 2 CPUs pinned at 99% in system time.
This *sucks*.
I didn't see any satisfactory answers to this the last time
around, but I'm doing a bit of benchmarking for a killer tape backup server (streaming ~60MB/sec to tape). I have an older Sun E450 with 2 x 250-MHz CPU's and a Quad Fast Ethernet NIC in one of the PCI slots.
qfe0 is directly attached to an idle F740, and likewise for qfe1
to another idle F740. Please tell me I should be able to see more than an aggregate 10 MB/s doing dumps over rsh with that setup? I'm shuffling the dump streams off to /dev/null so local disk speed is not an issue. On another system, a 2 x 300-MHz E450 with two single-port NIC's, I'm able to pull about 15 MB/s total. Does the Sun QFE just suck, and I should stick to individual single-port NIC's?
I want to lead up to a 4 x 450-MHz E420R with Gigabit Ethernet,
and hope I can achieve 50 MB/s or more from eight F740's. Has anyone tried this configuration?
--
Brian Tao (BT300, taob@risc.org)
"Though this be madness, yet there is method in't"