Be aware that there are issues when using ICS with older versions of SnapDrive (5.0 works fine; 4.1 had problems).
-----Original Message-----
From: Dave Barr (鬼佬) [mailto:barr@google.com]
Sent: Wednesday, June 04, 2008 5:24 PM
To: John Stoffel
Cc: Glenn Walker; lists@up-south.com; Blake Golliher; Fox, Adam; toasters@mathworks.com
Subject: Re: snapshot copy
On Wed, Jun 4, 2008 at 2:00 PM, John Stoffel <john.stoffel@taec.toshiba.com> wrote:
Dave> Yes, NetApp's TCP implementation leaves much to be desired over
Dave> WAN links. We had many long conversations with NetApp
Dave> engineering, including tracking down a major bug with window
Dave> scaling.
Do you know which versions of OnTap this referred to, and did they get any fixes into more recent versions? We're running 7.0.x on our filers, with 7.2.x on our R200s. If I *know* TCP performance will be improved, I'd work to upgrade them all to something newer.
Tracking it down, I'm pretty sure it's BURT 217596, first fixed in 7.2.1. AFAIK it wasn't backported to 7.0.
Dave> We found a substantial improvement when switching to ICS
Dave> (i.e. use multi(src,dest)(src,dest) even with single end
Dave> points). Apparently when using ICS it uses a different internal
Dave> network stack/interface which is much better over the WAN.
This is interesting, I've never heard of ICS. Does this work with Snapvault? Maybe I'll need to check the latest OnTap docs to see how to do this. Or could you post an example of how you set it up?
Search the snapmirror.conf man page for "multi". Here's an example:
filer1_multi=multi(filer1,filer2)(filer1,filer2)
filer1_multi:vol1 filer100:vol1 48 * * *
Dave> It still wasn't as good as our interim hack of proxying
Dave> snapmirror connections via local Linux boxes (with their modern
Dave> TCP stacks), but it ended up being good enough.
Now this sounds neat too. I should look into this since we have tons of fast Opterons at our sites and it wouldn't be hard to set up a proxy for this. Do you have any pointers to how you did this, or what software you used?
Basically it worked by setting up virtual IPs on the proxy box, and with xinetd, setting up netcat (nc) to tunnel the connection to another box. The virtual IPs were there so we could set up a bunch of source/destination pairs, since snapmirror is hardwired to run over a single port. (You can't say to the filer, connect to "proxybox:50023", for example). Each IP on the proxy box was dedicated to serving a proxy through another proxy to a given remote filer. We did modify netcat in a small way, namely to use TCP keepalives. I may be able to send you the patch we used. It's a one liner.
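For anyone who wants to experiment without xinetd and a patched netcat, the same idea can be sketched in a few lines of Python: listen on one local (virtual) IP, forward every connection to a fixed target, and turn on TCP keepalives on both sockets. This is only an illustration, not the setup described above; the function names (enable_keepalive, pump, handle, start_proxy) are made up for this sketch, and the addresses and port you pass in are your own.

```python
import socket
import threading


def enable_keepalive(sock):
    """Turn on TCP keepalives so an idle tunnel is probed instead of silently dying."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def pump(src, dst):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client, target):
    """Connect to the target and relay traffic in both directions."""
    try:
        upstream = socket.create_connection(target)
    except OSError:
        client.close()
        return
    enable_keepalive(client)
    enable_keepalive(upstream)
    t = threading.Thread(target=pump, args=(client, upstream), daemon=True)
    t.start()
    pump(upstream, client)
    t.join()
    client.close()
    upstream.close()


def start_proxy(listen_addr, target):
    """Listen on listen_addr (e.g. one virtual IP per remote filer) and forward to target."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(listen_addr)
    server.listen()

    def accept_loop():
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                break  # listening socket was closed
            threading.Thread(target=handle, args=(client, target), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

You would call start_proxy once per virtual IP, each with the (host, port) of the next hop for one remote filer, and keep the main thread alive. Because the filer always connects to the same port, the listening IP is the only thing that distinguishes one tunnel from another.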
What's worse is that LAN performance is great. Over the WAN I can *see* the sawtooth bandwidth of snapvaults when the TCP window scaling just screws up. Totally frustrating.
Yep, that looks totally like what we saw with 217596. I suggest upgrading to 7.2.1+ or requesting a backport of 217596. If I recall, the code change was pretty small.
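To see why a window scaling problem hurts over the WAN but not on the LAN: a single TCP connection can never move more than one window per round trip, and without working window scaling the window tops out at 65535 bytes. A rough back-of-the-envelope sketch (the RTT values are made-up examples, not measurements from any of the sites discussed here):

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on one TCP connection's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


# Largest window that fits in the 16-bit TCP header field without scaling.
UNSCALED_WINDOW = 65535

# Hypothetical round-trip times: LAN, regional WAN, cross-country WAN.
for rtt_ms in (1, 20, 80):
    mbps = max_throughput_bps(UNSCALED_WINDOW, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:3d} ms -> at most {mbps:7.2f} Mbit/s per connection")
```

The ceiling drops in direct proportion to the RTT, which is why the same transfer can look fine locally and crawl between sites.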
--Dave