Hi all,
Does anyone else here run SnapVault across the WAN over a long, fat link (dual T3, 90+ msec delay) and get decent performance, especially for the initial copy of the volume(s) and/or qtree(s)?
We've gone the route where I did an lrep_reader dump of a 3.8 TB qtree (don't ask...) to local disks. Then I used a nice tool called 'bbcp' to push it all across the WAN so that I could actually *use* all my bandwidth.
Regular SnapVault sucks for performance; it just can't push more than 15 Mbit/s, which stinks when you have dual T3 (90 Mbit/s) of bandwidth. Using bbcp I can fill that pipe for days on end.
So then, once it's all across the WAN, I used lrep_writer to dump the data to its destination qtree. All fine and good, but then I have to start a regular SnapVault to catch up with all the data written during the 8+ days the first stage took. Sigh...
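In case it helps anyone else, here's roughly what the three stages looked like. Hostnames, paths, and especially the lrep_reader/lrep_writer arguments below are placeholders from memory, so check the README that ships with the lrep tools before trusting any of it; only bbcp's -s/-w flags and the snapvault start syntax are stock.

    # 1. Near the source filer: dump the baseline SnapVault stream to local disk.
    lrep_reader src-filer:/vol/vol1/bigqtree /stage/lrep        # placeholder args

    # 2. Push the dump files across the WAN with bbcp; many streams plus a big
    #    window so one 90 msec connection doesn't cap the rate.
    bbcp -s 16 -w 2M /stage/lrep* remote-host:/stage/

    # 3. Near the destination filer: replay the dump into the target qtree.
    lrep_writer /stage/lrep dst-filer:/vol/sv_vol1/bigqtree     # placeholder args

    # 4. On the secondary, start the normal relationship so updates pick up
    #    whatever changed after the baseline snapshot was taken:
    #      snapvault start -S src-filer:/vol/vol1/bigqtree /vol/sv_vol1/bigqtree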
Anyone know if I can lrep_reader/lrep_writer the next SnapVault snapshot across as well, so I can push the data faster? Because each time the link goes down between the sites, I have to start the catch-up SnapVault transfer all over again, and it's killing me.
Also, I'd love to be able to *know* how much data is left to copy from the source to the destination, but "snapvault status -l ..." doesn't give that sort of information. Or do I need to start using the 'snap delta' command and my own math to figure things out?
Thanks, John

John Stoffel - Senior Staff Systems Administrator - System LSI Group
Toshiba America Electronic Components, Inc. - http://www.toshiba.com/taec
john.stoffel@taec.toshiba.com - 508-486-1087
John, We have used lrep repeatedly to push out a SnapMirror copy of our tools repository (2 TB and growing) to 8 sites so far, with more on the way. Not using it for SV, but the premise is the same.
We break it up, then compress and write the files to 400 GB PC SATA drives. We ship the drives to the new location, copy and uncompress the files back up to a contiguous area, and then use the writer to rebuild it. We have not shipped those files across the WAN... it would take too long. We are generally out of sync for 3 weeks by the time it is all said and done, and the resync takes us several days. Still, it is better than the estimated months it was going to take us to do this across the WAN.
As I understand it, you cannot use the lrep reader/writer more than once, unless they have come up with new tools. Also, lrep cannot die in the middle or you have to start it over (unlike restarting the init of a snapmirror).
As to figuring out how much data you have to move, that is the manual snap delta calculation as far as I know. C-
Chris> We have used lrep repeatedly to push out a snapmirror copy of our tools repository (2T and growing) to 8 sites so far and more on the way. Not using it for SV but the premise is the same.
Sounds like you're running into the same issue as I am, which is that SV sucks over the WAN, but works just dandy over the LAN.
Chris> We break it up and then compress and write the files to 400G PC SATA drives. We ship the drives to the new location, copy and uncompress the files back up to a contiguous area and then use the writer to rebuild it. We have not shipped those files across the WAN.... would take too long. We are generally out of sync for 3 weeks by the time it is all said and done and the resync takes us several days. Still, it is better than the estimated months of time it was going to take us to do this across the WAN.
I'm close to three weeks out of sync now (672+ hours), and I'm just not going to be happy when the next update takes 24+ hours to bring it up to speed. It's just pathetic.
Chris> As I understand it you cannot use the lrep r/w more than once unless they have come up with new tools. Also the lrep cannot die in the middle or you have to start it over (unlike the restart of an init of a snapmirror).
I wish they would let me run lrep multiple times, from the baseline down through the subsequent snapshots. Then I could push the data across faster.
Chris> As to the figuring out how much data you have to move, that is the manual snap delta calculations as far as I know.
Unfortunately, there doesn't seem to be any correlation between the snapvault status -l report and the snap delta report, since the first is per qtree and the second is per volume. And the delta report seems to be off by a couple of orders of magnitude, unless it's really showing MB of difference, not KB.
Dunno... it's frustrating.
Also, someone else wrote me telling me to use a WAN accelerator, such as those from Riverbed. I messed up in my initial post: we already *are* using WAN accelerators. They help, but not a lot. I can still use 'bbcp' to burn all my WAN bandwidth, while SV won't come anywhere close, accelerator or not.
And I know that if I do the same SV across the LAN to the R200 next to my FAS960, I could do it much, much faster, which tells me it's all NetApp's fault due to lack of TCP tuning.
If they'd only let me up the TCP window size to something like 4 MB on each end, I'd hope to get better performance. Or something.
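Back-of-the-envelope, using my numbers for the link (90 Mbit/s usable, 90 msec RTT, both assumptions about our circuit):

    # Bandwidth-delay product, and the window implied by the throughput I see.
    #   BDP              = 90e6 bit/s * 0.090 s / 8  ~= 1 MB
    #   window @ 15 Mb/s = 15e6 bit/s / 8 * 0.090 s  ~= 165 KB
    # i.e. a single TCP stream needs about a 1 MB window just to fill the pipe,
    # and the ~15 Mbit/s I get out of SnapVault looks like an effective window
    # in the 165 KB range.
    awk 'BEGIN {
        bw = 90e6; rtt = 0.090
        printf "BDP              : %.2f MB\n", bw * rtt / 8 / 1048576
        printf "window @ 15 Mb/s : %.0f KB\n", 15e6 / 8 * rtt / 1024
    }'

Which is why a 4 MB window (or bbcp's multiple streams) makes such a difference at this RTT.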
Thanks, John
John Stoffel wrote:
John> Sounds like you're running into the same issue as I am, which is that SV sucks over the WAN, but works just dandy over the LAN.
Unfortunately, yes. We could throw more money at the WAN but we aren't Cisco.
John> I'm close to three weeks out of sync now (672+ hours), and I'm just not going to be happy when the next update takes 24+ hours to bring it up to speed. It's just pathetic.
After doing this so many times (the numbers I quoted are from just one of the repos we have had to initialize with lrep), I have found that the first resync takes some time, but after that I keep forcing new updates and eventually we get in sync and are able to keep things up to date with the scheduled runs. This all depends on the rate of data change. Since this is a tools repo we mostly just get new additions; rarely are we *allowed* to delete anything. Your mileage will vary.
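The brute-force catch-up is basically a loop like the one below, run from an admin host that can ssh to the secondary (hostname, path, and the grep for "Idle" are illustrative, so adjust for your setup):

    # Keep firing manual SnapVault updates until the deltas are small enough
    # for the normal schedule to take over; wait for each transfer to go idle
    # before starting the next one.
    while true; do
        ssh sv-secondary snapvault update /vol/sv_vol1/bigqtree
        until ssh sv-secondary snapvault status /vol/sv_vol1/bigqtree | grep -q Idle; do
            sleep 600
        done
    done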
John> I wish they would let me run lrep multiple times, from the baseline down through the subsequent snapshots. Then I could push the data across faster.
In our case, by the time we got the next set of disks loaded it wouldn't make a big difference doing it via a secondary lrep or by the WAN resync. Your needs are probably different.
John> Unfortunately, there doesn't seem to be any correlation between the snapvault status -l report and the snap delta report, since the first is per qtree and the second is per volume. And the delta report seems to be off by a couple of orders of magnitude, unless it's really showing MB of difference, not KB.
Yes... I have a BIG problem with this. I know I can kinda guess how much data has to be transferred and updated, but there is no real way to tell. Even if you take lrep out of the mix, snap delta doesn't give you a decent idea if you have more than one qtree in a volume. I sure wish it did.
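For what it's worth, the volume-level math we fall back on looks about like this, assuming ssh access to the source filer (the names and example numbers are made up, and it over-counts when the volume holds more than one qtree):

    # snap delta against a single snapshot reports KB changed between that
    # snapshot and the active file system; divide by the throughput you
    # actually sustain for a rough time-to-complete.
    ssh src-filer snap delta vol1 sv_baseline
    # e.g. a summary of 52428800 KB changed at a sustained 2 MB/s (2048 KB/s):
    #   52428800 / 2048 ~= 25600 s ~= 7.1 hours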
John> Dunno... it's frustrating.
Agreed.
John> Also, someone else wrote me telling me to use a WAN accelerator, such as those from Riverbed. I messed up in my initial post: we already *are* using WAN accelerators. They help, but not a lot. I can still use 'bbcp' to burn all my WAN bandwidth, while SV won't come anywhere close, accelerator or not.
We use them too, but there is only so much you can push through a pipe. You could get a bigger pipe, but it'll cost ya.
John> And I know that if I do the same SV across the LAN to the R200 next to my FAS960, I could do it much, much faster, which tells me it's all NetApp's fault due to lack of TCP tuning.
There is some tuning you can do with window sizing, but it can have negative effects on your local backups. Tread carefully.
John> If they'd only let me up the TCP window size to something like 4 MB on each end, I'd hope to get better performance. Or something.
Good luck! If you find some cool new tool or tuning parameters I am all ears. C-