I edited the KB articles to hopefully make the parameter easier to find, but it will depend on the search query. If I type "snapmirror slow", the third hit is "Top 10 SnapMirror issues and solutions". Unfortunately, that article doesn't mention this throttle. I'll follow up on that again.

 

I still find it really strange that so many customers have run into a problem that required changing this value, but the majority haven't. If it were truly broken, I would expect nonstop support calls about SnapMirror lag time. I know quite a few customers with extremely heavy SnapMirror traffic, such as high-IO databases with mirrors being updated hourly. They're not seeing problems.

 

I got a vibe that deduplication/compression might be a factor, but that's unscientific. If anyone has a support case open involving this parameter, send me the number so I can try to get someone to figure out what's different.

 

From: toasters-bounces@teaparty.net <toasters-bounces@teaparty.net> On Behalf Of Stephen Stocke
Sent: Thursday, March 08, 2018 2:10 PM
To: Chris Hague <Chris_Hague@ajg.com>
Cc: toasters@teaparty.net
Subject: Re: global snapmirror throttle wastes another 2 days

 

The original thread had the subject 'Super Secret Flags' and was active in late Jan / early Feb 2017. It was an interesting thread and worth reading. Below is the original message and one of the early replies from Jeffrey Steiner @ NetApp, which is probably an appropriate place to start if you're considering any tuning...

 

<quote>

 

>> I scanned the documentation on this flag, and it's not a universally applicable setting. It should only be set in conjunction with a support case to address an identified issue. In general, it should only be set as a temporary measure, but there are exceptions to that general rule.

>> 

>> On the whole, that issue appears to be related to transfer latency. That could be the latency of a slow network or the latency resulting from a network with a problem, such as packet loss. I'd imagine it could also be caused by latency imposed by an overloaded destination SATA aggregate, plus it's not out of the question that something newer like 40Gb Ethernet might create some kind of odd issue that warrants setting this flag.

>> 

>> In normal practice, you shouldn't need to touch this parameter. I've been around a long time, and I'd never heard of it before now, and I've never used it with any of my lab setups, and I rely on SnapMirror heavily.

>> 

>> The important thing is not to use this option unless directed by the support center. There's a risk of masking the underlying problem, or creating new problems.

>> 

>> You might consider continuing to follow up on the case to ensure that either (a) you're in an odd situation where this parameter really is warranted or (b) there is some kind of underlying problem that needs fixing. If you're otherwise happy with the way the system is performing and the parameter change worked, I'd probably call it good...

>> 

>> -----Original Message-----

>> From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Peter D. Gray

>> Sent: Monday, January 30, 2017 12:30 AM

>> Subject: super secret flags

>> 

>> Hi people

>> 

>> Just out of idle curiosity, am I the only netapp admin who does not know about the super secret flags to allow snapmirror to actually work at reasonable speed?

>> 

>> We were running 8.3.2 cluster mode, and spent weeks looking into why our snapmirrors to our remote site ran so slowly. We were often 2 days behind over 40G networks. Obviously, we focussed on network issues. And we wasted a lot of time. We could make no sense of the problem at all, since sometimes it appeared to work ok, then later the transfers slowed to a crawl.

>> 

>> We eventually opened a case and it did not take too long for a reply which basically said "why don't you just disable the global snapmirror throttle."

>> I had already looked into such a beast, but found nothing.

>> 

>> As you may or may not know, it turns out to be a per node setting. The name of the flag is repl_throttle_enable. Of course, you can only see such flags or change them on the node, in privileged mode.

>> 

>> Setting the flag to 0 immediately (and I do mean immediately) allowed our snapmirrors to run at the speed you might expect over 40G. Instead of taking 2 days, snapmirror updates now took 2 hours.

>> 

>> We have since upgraded to 9.1.  The flags reverted to on, but again can be set to off. I think there is a documented global snapmirror throttle option in 9.1, but I have not looked into that yet.

>> 

>> Are we the only site in the world to have seen this issue?

>> We use snapmirror DR for all our mirrors which may be a factor.

>> 

>> As I said, just idle curiosity and maybe helping someone avoid the time wasting we had.

>> 

>> Regards,

>> pdg

>> 

>> Peter Gray                                                                    Ph (direct): +61 2 4221 3770

>> Information Management & Technology Services        Ph (switch): +61 2 4221 3555

>> University of Wollongong                                       Fax: +61 2 4229 1958

>> Wollongong NSW 2522                                                             Email: pdg@uow.edu.au

>> Australia                                                                        URL: http://pdg.uow.edu.au

 

</quote>

 

 

On 8 March 2018 at 08:38, Chris Hague <Chris_Hague@ajg.com> wrote:

Can you give us details of commands you ran to check, change and confirm this?


-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Peter D. Gray
Sent: 08 March 2018 02:25
To: toasters@teaparty.net
Subject: global snapmirror throttle wastes another 2 days

Some time back I wasted 5 days trying to find out why my snapmirrors were not keeping up. Netapp eventually very kindly told me about the global snapmirror throttle which I immediately disabled on all nodes and snapmirror speeds went up by a factor of 10.

Life was sweet.

Till this week.

Again my snapmirrors could not keep up. But I knew it could not be the global snapmirror throttle because I had disabled that before.

It had to be network right? We had made some network changes.

After 2 days I decided the symptoms were sufficiently similar for me to revisit the global snapmirror throttle, and yes, sure enough the settings had reverted to enabled.

I suspect this is because we power cycled our netapp heads as part of a DR exercise. It looks like when the head comes up the setting to disable the global snapmirror throttle is lost.
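
A minimal sketch of a post-reboot check for this kind of silent revert. It assumes the per-node value of repl_throttle_enable has already been collected by some other means; the sketch does not talk to ONTAP at all, and the node names and the function name find_reverted_nodes are hypothetical.

```python
# Hypothetical drift check: compares the value each node is expected to have
# for a flag against the value actually observed after a reboot.
# The observed values must be gathered separately; nothing here queries ONTAP.

FLAG = "repl_throttle_enable"  # flag name as given in the thread


def find_reverted_nodes(expected, observed):
    """Return sorted node names whose observed value differs from expected.

    A node missing from `observed` is reported too, since an unchecked node
    may have silently reverted.
    """
    return sorted(
        node for node, want in expected.items()
        if observed.get(node) != want
    )


if __name__ == "__main__":
    # Illustrative values only: 0 means the throttle is disabled.
    expected = {"node-01": 0, "node-02": 0}
    observed = {"node-01": 1, "node-02": 0}  # node-01 reverted after a power cycle
    for node in find_reverted_nodes(expected, observed):
        print(f"{node}: {FLAG} is not {expected[node]}, re-check after reboot")
```

Running something like this after any head reboot or DR exercise would flag a reverted setting before the mirrors fall days behind.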

Great stuff.

So, it's up to a total of 7 days lost because of the global snapmirror throttle, which as I said before seems to exist solely for the purpose of making sure things do not work properly.

Sorry, I had to vent.

Regards,
pdg


_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
