I understand now. I would strongly suggest opening a case with the Global Support Centers so this gets tracked, but here are a few things you might want to consider.
You might want to run wafl_susp and find out if you're over-running your NVRAM. This is possible since you're pushing LOTS of writes. You can check this by going into advanced mode (priv set advanced) and running
filer*> wafl_susp -z
Then run your writes
filer*> wafl_susp -w
Then page through the couple of pages of output for a counter called cp_from_cp. If you have lots of these compared to all the CPs listed, you may be over-running your NVRAM, or you have some kind of delay in writing to your disk subsystem. Remember that your NVRAM is cut in half for clusters, so each of your F760s has 16MB of NVRAM to deal with. If this is the case, then you may have to consider a head upgrade to one with the new NVRAM-III card and its 128MB of memory. (No, they don't work on the F760; the card and, more importantly, the battery are very different and won't properly retrofit into the F700 chassis.)
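Putting it together, the whole sequence looks roughly like this (a sketch only -- the priv set toggling is the standard way in and out of advanced mode, and the prompts are just illustrative):

filer> priv set advanced
filer*> wafl_susp -z        (zero the counters)
   ... run your write workload ...
filer*> wafl_susp -w        (print the counters; compare cp_from_cp to the total CPs)
filer*> priv set admin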
If the delay is in your disk subsystem, this may be caused by a couple of things:
1. Your volume is getting really full (95%+).
2. Your volume was recently very full and you only added one disk, so you're getting a hot disk. statit can help determine that (see the sketch after this list).
3. You're bottlenecking on the ServerNet cables. You may want to talk to your sales rep about evaluating the new Troika interconnects. Remember that your writes must go over the interconnect to the partner's NVRAM before we can ACK the write, so that your write is on both machines in case of a sudden failure.
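If you want to chase the hot-disk theory, a minimal statit sketch (also from advanced mode; the flags here are from memory, so double-check them against your ONTAP release) would be:

filer*> statit -b           (begin collecting per-disk statistics)
   ... run your write workload for a few minutes ...
filer*> statit -e           (end collection and print the report)

Look for one disk whose utilization is way above the rest of its RAID group.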
There could certainly be other things, which is why working with the GSC is probably the best way to go. They can work with you on the performance commands and try to determine where the write bottleneck lives. Clearly it doesn't appear to be on the GigE, since your reads are good.
Anyway, I hope this helps. This is about all I can think of for this type of forum.
-- Adam Fox NetApp Professional Services, NC adamfox@netapp.com
-----Original Message-----
From: Allen Belletti [mailto:abelletti@dmotorworks.com]
Sent: Wednesday, August 01, 2001 5:44 PM
To: Fox, Adam; toasters@mathworks.com
Subject: RE: F760 cluster write performance
Ah, I'm sorry -- I used "interconnect" to refer to the gigabit ethernet link between the Filers and our clients. Our Cluster Interconnect is the older ServerNet style, and works just fine (complete with 4 meters of cable coiled up atop one filer.)
Allen Belletti
System Administrator
Digital Motorworks
Phone: 512-692-1024  Fax: 512-349-9366
abelletti@digitalmotorworks.com
www.digitalmotorworks.com
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Fox, Adam
Sent: Wednesday, August 01, 2001 10:00 AM
To: 'abelletti@digitalmotorworks.com'; toasters@mathworks.com
Subject: RE: F760 cluster write performance
Your interconnect is going through a switch? I really don't think that's supported. I don't even think it's Gbit Ethernet. The new Cluster Interconnect cards are fiber, but I don't think they are GigE and they are always hooked up directly, just like the ServerNet cables were.
Are you sure you are running your interconnects through a switch?
-- Adam Fox NetApp Professional Services, NC adamfox@netapp.com
-----Original Message-----
From: Allen Belletti [mailto:abelletti@dmotorworks.com]
Sent: Tuesday, July 31, 2001 7:39 PM
To: toasters@mathworks.com
Subject: F760 cluster write performance
Hello,
We've been running a pair of clustered F760's since about February. Among other things, we are using them for Oracle database storage, with the hosts being Suns running Solaris 7 or 8. The interconnect is via gigabit Ethernet through a Cisco 3508 switch. We're doing NFS service only -- no CIFS. The volumes in question are between 60 and 80% full, and consist of between 10 and 14 drives, either 18G or 36G depending on the volume. OS version is 6.1R1. NFS mounts are using UDP, version 3, rsize=32768, wsize=32768 (though Netapp has suggested reducing this to 8k).
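For what it's worth, on the Solaris side those mounts look something like the following (the filer name and mount point are made up; the second line is the 8k variant Netapp suggested):

mount -F nfs -o vers=3,proto=udp,rsize=32768,wsize=32768 filer1:/vol/oravol /u01
mount -F nfs -o vers=3,proto=udp,rsize=8192,wsize=8192 filer1:/vol/oravol /u01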
Recently, we have been running up against the limit of the F760's write performance, or so it seems. At best, a single (GigE connected) host is able to do 7-8MB/s of sustained sequential write traffic. These same hosts are able to read at greater than 30 MB/sec and sometimes as high as 50 MB/sec if the data is all in cache on the Netapp.
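In case anyone wants to compare numbers directly, a simple sequential test along these lines should give roughly comparable figures (the path is just an example):

time dd if=/dev/zero of=/u01/ddtest bs=32k count=32768     (writes ~1GB sequentially)
time dd if=/u01/ddtest of=/dev/null bs=32k                 (reads it back; remount first to defeat the host cache)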
What I'd really like to know is what kind of sustained write rates other folks are seeing in configurations similar to this one. At the very least, if you have any kind of F760 cluster, even without gigabit Ethernet, are you able to do more than 7-8MB/s sustained write?
Also, when the writes are occurring, the filer CPU load is generally very high, anywhere from 80 to 100%. Disk utilization is more reasonable, around 50% if the filer is not otherwise busy.
My first thought (and Netapp's as well) would be that this is some kind of network problem, perhaps relating to GigE flow control. However, if this were the case I would expect the Filer CPU load to be lower.
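One quick sanity check on the network theory: look at the interface counters on both ends for errors or drops. Something along these lines (exact options and output vary by release):

filer> ifstat -a            (per-interface counters on the filer)
host$ netstat -i            (errors/collisions on the Solaris GigE port)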
If anyone has seen (or fixed!) anything like this, I would appreciate any suggestions or advice.
Thanks in advance,

Allen Belletti
System Administrator
Digital Motorworks
Phone: 512-692-1024  Fax: 512-349-9366
abelletti@digitalmotorworks.com
www.digitalmotorworks.com
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Andrew Smith
Sent: Monday, April 23, 2001 12:58 PM
To: toasters@mathworks.com
Subject: dump to remote device via rmt
Hello,
I've been dumping my filer to a remote HP DDS-4 drive on a RedHat Linux machine. It's been working great. But now I have more data on my filer than one DDS-4 tape will hold for a level-0 dump.
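For anyone not familiar with the setup, the filer-side command for a remote rmt dump is generally something like the following (hostname and device here are placeholders, not my exact command):

filer> dump 0uf linuxhost:/dev/nst0 /vol/vol0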
Are there any issues with spanning more than one tape for a dump via rmt? Here is the output of my dump run:
DUMP: Dumping tape file 1 on /dev/nst0
DUMP: creating "/vol/vol0/../snapshot_for_backup.3" snapshot.
DUMP: Using Full Volume Dump
DUMP: Date of this level 0 dump: Mon Apr 23 10:40:26 2001.
DUMP: Date of last level 0 dump: the epoch.
DUMP: Dumping /vol/vol0/ to dumper
DUMP: mapping (Pass I)[regular files]
DUMP: mapping (Pass II)[directories]
DUMP: estimated 24452615 KB.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: Mon Apr 23 10:46:54 2001 : We have written 560012 KB.
[... lines removed ...]
DUMP: Mon Apr 23 12:37:04 2001 : We have written 21097444 KB.
DUMP: Remote write failed: RMT bad response from client
DUMP: DUMP IS ABORTED
DUMP: Deleting "/vol/vol0/../snapshot_for_backup.3" snapshot.
Is there a way I can have dump prompt me to change volumes?
I haven't been able to find much information on the subject.
Thanks!
-Andrew Smith DCANet