One of my clients is complaining that performance between Solaris 2.5.1 systems (mainly Ultra 1s and 3000s) and an F210 is pretty damn awful. (The 210 is to be used for syslog, so the amount of disk writes isn't *that* high).
I got him to switch to NFS v2 and UDP to get around timeout issues, but performance still sucks.
I think someone mentioned that a Cisco switch was involved. I don't understand why a move to V2/UDP "gets around the timeout issues". I'd need more data before I could comment on the particular situation.
But I have a few loose comments on generally debugging a performance problem with a 100BaseT switch involved.
I believe the on-board 100BaseT (and the single port 100BaseT PCI card) does not support autonegotiation of full vs. half duplex. (The quad 10/100 card does.) I think it will autosense whether it is on 10 or 100 mb/s ethernet. It is not completely clear to me what Sun has supported in Solaris 2 regarding autonegotiation. The mildly amusing thing about half and full duplex ethernet is that if there is a mismatch it still sort of works. I would prefer it failed completely:-)
At this point I have seen different behaviour between our boxes and switches from various vendors and Sun clients. After recent experiences, if I walked into a network of Sun clients, NetApp filers and Ethernet switches that was experiencing performance problems, I would:
1. Run some very simple sequential read and write tests.
dd(1) and mkfile(1) could be used to read and write files.
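For instance, a crude pair of sequential tests might look like the following (TESTDIR is a stand-in -- point it at the directory where the filer is NFS-mounted):

```shell
# Crude sequential throughput test. TESTDIR is an assumption --
# substitute the directory where the filer is mounted.
TESTDIR=/tmp

# Write test: 100 MB of zeroes in 64 KB blocks
time dd if=/dev/zero of=$TESTDIR/ddtest bs=65536 count=1600

# Read test: note the client may satisfy this from its own cache
# unless the file is bigger than client RAM
time dd if=$TESTDIR/ddtest of=/dev/null bs=65536
```

100 MB divided by the elapsed wall-clock time gives you MB/s to compare against the rough numbers below.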
From an Ultra client, expect:
4+MB/s writes and 6+MB/s reads on F2xx/F3xx
7+MB/s writes and 7+MB/s reads on F5xx/F6xx
when the filers are fully configured with NVRAM and have 13 or more disks.
(these are very rough numbers!!!! and not maximums, but sort of expected values for a reasonable switched network).
If you are not max'ed on NVRAM and have 4 disks, then expect your write numbers to drop. If you have 4 disks or only one fast narrow SCSI chain, expect your read performance to drop.
If you are using TCP, expect the numbers to be lower than if you are using UDP.
If you are using a SPARCstation 20 (and not an Ultra 1) then expect your numbers to be much lower (I needed 3 SS20's to saturate a 100 mb/s link at some point in the past). If you're running SunOS and not Solaris, expect your numbers to be lower. (I don't have enough experience with other vendors' clients to tell you how they perform).
Now you have some baseline numbers. If your numbers are significantly lower than the rough guidelines above (like 500KB/s or less) then I would propose you have a speed or duplex mismatch somewhere.
2. Pick a path through the switch. One client (preferably Ultra class), one server interface, and get the switch port locations.
FORCE EVERYTHING TO NON-AUTONEGOTIATE 100 MB/S HALF DUPLEX.
On the filer, this means setting the ifconfig line in /etc/rc to something like:
ifconfig e10 `hostname`-e10 mediatype 100tx up
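On the Solaris client side, you force the on-board hme interface with ndd(1M); a sketch (assuming the first hme instance, run as root, and note these settings don't survive a reboot -- put them in an rc script if you want them to stick):

```shell
# Turn off autonegotiation and advertise only 100 mb/s half duplex
# on the first hme interface (instance 0).
ndd -set /dev/hme instance 0
ndd -set /dev/hme adv_autoneg_cap 0
ndd -set /dev/hme adv_100fdx_cap 0
ndd -set /dev/hme adv_100hdx_cap 1
ndd -set /dev/hme adv_10fdx_cap 0
ndd -set /dev/hme adv_10hdx_cap 0
```

And of course force the corresponding switch port to 100/half as well.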
Remeasure. If the numbers don't shoot up, recheck that you are actually forcing everything the same.
Be careful changing port and computer duplex settings:-) You may be kind of connection challenged if you did this through a telnet session :-) :-) Either set it in the configuration file and reboot (remembering to change the switch settings afterwards) or set it from the console ports.
Don't go full duplex, don't mix and match clients and servers regarding duplex. Just pick a path and set it to the least common denominator of 100 mb/s half-duplex. If that works well, then proceed to bring everything to full duplex a step at a time.
The cool thing about switches is that you can actually have an expectation of throughput numbers if the client and server are otherwise not loaded. So in a production network you have a fighting chance of getting reproducible numbers.
3. Once you get some data for a series of tests, you can then map out what works and what doesn't. Then you will likely have to talk to your switch vendor about upgrading firmware, or your computer vendor about your drivers.
There is a way to take two Solaris clients and use them to debug NFS throughputs. Export "/tmp" from one client (set it up in /etc/dfs/dfstab) and do thruput tests. If you keep your file size below the available memory on the "server" you have a good simulation of a fast server. Unfortunately I wouldn't extrapolate Sun file server performance from this experiment:-) Take a look at:
www.sun.com/software/connectathon/talksched97.html
The talk "Factors Governing Thruputs". It describes some Ultra 1 to Ultra 1 numbers I saw doing this memory-based NFS file server trick. It has proven useful in debugging performance problems on switches, and eliminating the filer from the equation if it gets confusing.
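The /tmp-export trick looks something like this (the hostnames client1 and client2 are made up):

```shell
# On client1, the "server": add this line to /etc/dfs/dfstab
# and run shareall(1M):
share -F nfs -o rw=client2 /tmp

# On client2: mount it and run the dd tests against the mount
mount -F nfs client1:/tmp /mnt
```

Since /tmp on Solaris is tmpfs (memory-backed), the "server" side never touches disk, which is what makes it a stand-in for an infinitely fast file server.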
beepy
P.S. I gave up on SunOS (as opposed to Solaris) 100BaseT work a long time ago.