One of my clients is complaining that performance between Solaris 2.5.1 systems (mainly Ultra 1s and 3000s) and an F210 is pretty damn awful. (The 210 is to be used for syslog, so the amount of disk writes isn't *that* high).
I got him to switch to NFS v2 and UDP to get around timeout issues, but performance still sucks.
I think someone mentioned that a Cisco switch was involved. I don't understand why a move to V2/UDP "gets around the timeout issues". I'd need more data before I could comment on the particular situation.
But I have a few loose comments on generally debugging a performance problem with a 100BaseT switch involved.
I believe the on-board 100BaseT (and the single port 100BaseT PCI card) does not support autonegotiation of full vs. half duplex. (The quad 10/100 card does.) I think it will autosense whether it is on 10 or 100 mb/s ethernet. It is not completely clear to me what Sun has supported in Solaris 2 regarding autonegotiation. The mildly amusing thing about half and full duplex ethernet is that if there is a mismatch it still sort of works. I would prefer it failed completely:-)
At this point I have seen different behaviour between our boxes and switches from various vendors and Sun clients. After recent experiences, if I walked into a network of Sun clients, NetApp filers and Ethernet switches that was experiencing performance problems, I would:
1. Run some very simple sequential read and write tests.
dd(1) and mkfile(1) could be used to read and write files.
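For instance, a crude pair of sequential tests might look like the following (TESTDIR is a stand-in -- point it at the directory where the filer is NFS-mounted):

```shell
# Crude sequential throughput test. TESTDIR is an assumption --
# substitute the directory where the filer is mounted.
TESTDIR=/tmp

# Write test: 100 MB of zeroes in 64 KB blocks
time dd if=/dev/zero of=$TESTDIR/ddtest bs=65536 count=1600

# Read test: note the client may satisfy this from its own cache
# unless the file is bigger than client RAM
time dd if=$TESTDIR/ddtest of=/dev/null bs=65536
```

100 MB divided by the elapsed wall-clock time gives you MB/s to compare against the rough numbers below.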
From an Ultra client, expect:
4+MB/s writes and 6+MB/s reads on F2xx/F3xx
7+MB/s writes and 7+MB/s reads on F5xx/F6xx
when the filers are fully configured with NVRAM and have 13 or more disks.
(these are very rough numbers!!!! and not maximums, but sort of expected values for a reasonable switched network).
If you are not max'ed on NVRAM and have 4 disks, then expect your write numbers to drop. If you have 4 disks or only one fast narrow SCSI chain, expect your read performance to drop.
If you are using TCP, expect the numbers to be lower than if you are using UDP.
If you are using a SPARCstation 20 (and not an Ultra 1) then expect your numbers to be much lower (I needed 3 SS20's to saturate a 100 mb/s link at some point in the past). If you're running SunOS and not Solaris, expect your numbers to be lower. (I don't have enough experience with other vendors' clients to tell you how they perform).
Now you have some baseline numbers. If your numbers are significantly lower than the rough guidelines above (like 500KB/s or less) then I would propose you have a speed or duplex mismatch somewhere.
2. Pick a path through the switch. One client (preferably Ultra class), one server interface, and get the switch port locations.
FORCE EVERYTHING TO NON-AUTONEGOTIATE 100 MB/S HALF DUPLEX.
On the filer, this means setting the ifconfig line in /etc/rc to something like:
ifconfig e10 `hostname`-e10 mediatype 100tx up
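On the Solaris client side, you force the on-board hme interface with ndd(1M); a sketch (assuming the first hme instance, run as root, and note these settings don't survive a reboot -- put them in an rc script if you want them to stick):

```shell
# Turn off autonegotiation and advertise only 100 mb/s half duplex
# on the first hme interface (instance 0).
ndd -set /dev/hme instance 0
ndd -set /dev/hme adv_autoneg_cap 0
ndd -set /dev/hme adv_100fdx_cap 0
ndd -set /dev/hme adv_100hdx_cap 1
ndd -set /dev/hme adv_10fdx_cap 0
ndd -set /dev/hme adv_10hdx_cap 0
```

And of course force the corresponding switch port to 100/half as well.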
Remeasure. If the numbers don't shoot up, recheck that you are actually forcing everything the same.
Be careful changing port and computer duplex settings:-) You may be kind of connection challenged if you did this through a telnet session :-) :-) Either set it in the configuration file and reboot (remembering to change the switch settings afterwards) or set it from the console ports.
Don't go full duplex, don't mix and match clients and servers regarding duplex. Just pick a path and set it to the least common denominator of 100 mb/s half-duplex. If that works well, then proceed to bring everything to full duplex a step at a time.
The cool thing about switches is that you can actually have an expectation of throughput numbers if the client and server are otherwise not loaded. So in a production network you have a fighting chance of getting reproducible numbers.
3. Once you get some data for a series of tests, you can then map out what works and what doesn't. Then you will likely have to talk to your switch vendor about upgrading firmware, or your computer vendor about your drivers.
There is a way to take two Solaris clients and use them to debug NFS throughputs. Export "/tmp" from one client (set it up in /etc/dfs/dfstab) and do thruput tests. If you keep your file size below the available memory on the "server" you have a good simulation of a fast server. Unfortunately I wouldn't extrapolate Sun file server performance from this experiment:-) Take a look at:
www.sun.com/software/connectathon/talksched97.html
The talk "Factors Governing Thruputs". It describes some Ultra 1 to Ultra 1 numbers I saw doing this memory-based NFS file server trick. It has proven useful in debugging performance problems on switches, and eliminating the filer from the equation if it gets confusing.
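The /tmp-export trick looks something like this (the hostnames client1 and client2 are made up):

```shell
# On client1, the "server": add this line to /etc/dfs/dfstab
# and run shareall(1M):
share -F nfs -o rw=client2 /tmp

# On client2: mount it and run the dd tests against the mount
mount -F nfs client1:/tmp /mnt
```

Since /tmp on Solaris is tmpfs (memory-backed), the "server" side never touches disk, which is what makes it a stand-in for an infinitely fast file server.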
beepy
P.S. I gave up on SunOS (as opposed to Solaris) 100BaseT work a long time ago.