I understand now. I would strongly suggest opening a case with the Global Support Centers so this gets tracked, but here are a few things you might want to consider.
You might want to run wafl_susp and find out if you're over-running your NVRAM. This is possible since you're pushing LOTS of writes. You can check this by going into advanced mode (priv set advanced) and running
filer*> wafl_susp -z
Then run your writes
filer*> wafl_susp -w
Then page through the couple of pages of output for a counter called cp_from_cp. If you have lots of these compared to all the CPs listed, you may be over-running your NVRAM, or you have some kind of delay in writing to your disk subsystem. Remember that your NVRAM is cut in half for clusters, so each of your F760s has 16MB of NVRAM to deal with. If this is the case, then you may have to consider a head upgrade to one with the new NVRAM-III card and its 128MB of memory. (No, they don't work on the F760; the card and, more importantly, the battery are very different and won't properly retrofit into the F700 chassis.)
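Putting it together, the whole sequence looks roughly like this (a sketch only -- the priv set toggling is the standard way in and out of advanced mode, and the prompts are just illustrative):

filer> priv set advanced
filer*> wafl_susp -z        (zero the counters)
   ... run your write workload ...
filer*> wafl_susp -w        (print the counters; compare cp_from_cp to the total CPs)
filer*> priv set admin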
If the delay is in your disk subsystem, this may be caused by a couple of things:
1. Your volume is getting really full (95%+).
2. Your volume was recently very full and you only added one disk, so you're getting a hot disk. statit can help determine that (see the sketch after this list).
3. You're bottlenecking on the ServerNet cables. You may want to talk to your sales rep about evaluating the new Troika interconnects. Remember that your writes must go over the interconnect to the partner's NVRAM before we can ACK the write, so that your write is on both machines in case of a sudden failure.
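If you want to chase the hot-disk theory, a minimal statit sketch (also from advanced mode; the flags here are from memory, so double-check them against your ONTAP release) would be:

filer*> statit -b           (begin collecting per-disk statistics)
   ... run your write workload for a few minutes ...
filer*> statit -e           (end collection and print the report)

Look for one disk whose utilization is way above the rest of its RAID group.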
There could certainly be other things, which is why working with the GSC is probably the best way to go. They can work with you on the performance commands and try to determine where the write bottleneck lives. Clearly it doesn't appear to be on the GigE, since your reads are good.
Anyway, I hope this helps. This is about all I can think of for this type of forum.
-- Adam Fox NetApp Professional Services, NC adamfox@netapp.com
-----Original Message-----
From: Allen Belletti [mailto:abelletti@dmotorworks.com]
Sent: Wednesday, August 01, 2001 5:44 PM
To: Fox, Adam; toasters@mathworks.com
Subject: RE: F760 cluster write performance
Ah, I'm sorry -- I used "interconnect" to refer to the gigabit ethernet link between the Filers and our clients. Our Cluster Interconnect is the older ServerNet style, and works just fine (complete with 4 meters of cable coiled up atop one filer.)
Allen Belletti
System Administrator
Digital Motorworks
Phone: 512-692-1024  Fax: 512-349-9366
abelletti@digitalmotorworks.com
www.digitalmotorworks.com
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Fox, Adam
Sent: Wednesday, August 01, 2001 10:00 AM
To: 'abelletti@digitalmotorworks.com'; toasters@mathworks.com
Subject: RE: F760 cluster write performance
Your interconnect is going through a switch? I really don't think that's supported. I don't even think it's Gbit Ethernet. The new Cluster Interconnect cards are fiber, but I don't think they are GigE and they are always hooked up directly, just like the ServerNet cables were.
Are you sure you are running your interconnects through a switch?
-- Adam Fox NetApp Professional Services, NC adamfox@netapp.com
-----Original Message-----
From: Allen Belletti [mailto:abelletti@dmotorworks.com]
Sent: Tuesday, July 31, 2001 7:39 PM
To: toasters@mathworks.com
Subject: F760 cluster write performance
Hello,
We've been running a pair of clustered F760's since about February. Among other things, we are using them for Oracle database storage, with the hosts being Suns running Solaris 7 or 8. The interconnect is via gigabit Ethernet through a Cisco 3508 switch. We're doing NFS service only -- no CIFS. The volumes in question are between 60 and 80% full, and consist of between 10 and 14 drives, either 18G or 36G depending on the volume. OS version is 6.1R1. NFS mounts are using UDP, version 3, rsize=32768, wsize=32768 (though Netapp has suggested reducing this to 8k).
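For what it's worth, on the Solaris side those mounts look something like the following (the filer name and mount point are made up; the second line is the 8k variant Netapp suggested):

mount -F nfs -o vers=3,proto=udp,rsize=32768,wsize=32768 filer1:/vol/oravol /u01
mount -F nfs -o vers=3,proto=udp,rsize=8192,wsize=8192 filer1:/vol/oravol /u01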
Recently, we have been running up against the limit of the F760's write performance, or so it seems. At best, a single (GigE connected) host is able to do 7-8MB/s of sustained sequential write traffic. These same hosts are able to read at greater than 30 MB/sec and sometimes as high as 50 MB/sec if the data is all in cache on the Netapp.
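In case anyone wants to compare numbers directly, a simple sequential test along these lines should give roughly comparable figures (the path is just an example):

time dd if=/dev/zero of=/u01/ddtest bs=32k count=32768     (writes ~1GB sequentially)
time dd if=/u01/ddtest of=/dev/null bs=32k                 (reads it back; remount first to defeat the host cache)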
What I'd really like to know is what kind of sustained write rates other folks are seeing in configurations similar to this one. At the very least, if you have any kind of F760 cluster, even without gigabit Ethernet, are you able to do more than 7-8MB/s sustained write?
Also, when the writes are occurring, the filer CPU load is generally very high, anywhere from 80 to 100%. Disk utilization is more reasonable, around 50% if the filer is not otherwise busy.
My first thought (and Netapp's as well) would be that this is some kind of network problem, perhaps relating to GigE flow control. However, if this were the case I would expect the Filer CPU load to be lower.
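One quick sanity check on the network theory: look at the interface counters on both ends for errors or drops. Something along these lines (exact options and output vary by release):

filer> ifstat -a            (per-interface counters on the filer)
host$ netstat -i            (errors/collisions on the Solaris GigE port)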
If anyone has seen (or fixed!) anything like this, I would appreciate any suggestions or advice.
Thanks in advance,

Allen Belletti
System Administrator
Digital Motorworks
Phone: 512-692-1024  Fax: 512-349-9366
abelletti@digitalmotorworks.com
www.digitalmotorworks.com
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Andrew Smith
Sent: Monday, April 23, 2001 12:58 PM
To: toasters@mathworks.com
Subject: dump to remote device via rmt
Hello,
I've been dumping my filer to a remote HP DDS-4 drive on a RedHat Linux machine. It's been working great. But now I have more data on my filer than one DDS-4 tape will hold for a level-0 dump.
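For anyone not familiar with the setup, the filer-side command for a remote rmt dump is generally something like the following (hostname and device here are placeholders, not my exact command):

filer> dump 0uf linuxhost:/dev/nst0 /vol/vol0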
Are there any issues with spanning more than one tape for a dump via rmt? Here is the output of my dump run:
DUMP: Dumping tape file 1 on /dev/nst0
DUMP: creating "/vol/vol0/../snapshot_for_backup.3" snapshot.
DUMP: Using Full Volume Dump
DUMP: Date of this level 0 dump: Mon Apr 23 10:40:26 2001.
DUMP: Date of last level 0 dump: the epoch.
DUMP: Dumping /vol/vol0/ to dumper
DUMP: mapping (Pass I)[regular files]
DUMP: mapping (Pass II)[directories]
DUMP: estimated 24452615 KB.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: Mon Apr 23 10:46:54 2001 : We have written 560012 KB.
[... lines removed ...]
DUMP: Mon Apr 23 12:37:04 2001 : We have written 21097444 KB.
DUMP: Remote write failed: RMT bad response from client
DUMP: DUMP IS ABORTED
DUMP: Deleting "/vol/vol0/../snapshot_for_backup.3" snapshot.
Is there a way I can have dump prompt me to change volumes?
I haven't been able to find much information on the subject.
Thanks!
-Andrew Smith DCANet