OK so curiosity got the better of me :) I just disabled this internal throttle, and the lagged SnapMirrors went from about 150Mbit/s to 2.7Gbit/s according to our network monitoring tools. CPU utilization on the node that owns the disks has definitely increased, sometimes to 93% or higher, and latency across all volumes has ticked up by a small but measurable amount. Disk utilization % as measured by sysstat is still within a reasonable range. I do see a lot more CP activity, mostly :s, :n, and :f.
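In case anyone wants to watch the same counters while experimenting, this is roughly how I'm keeping an eye on it from the nodeshell (the node name is a placeholder, and the exact column layout varies a bit between releases):

    cluster::> system node run -node <node-that-owns-the-aggr>
    node> sysstat -x 1

The CPU, CP ty, and Disk util columns are where the extra load shows up for us.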
I understand why this throttle is enabled by default, and I wouldn't leave it disabled long term because of HA CPU headroom concerns, but the increase in SnapMirror throughput is unbelievable.
--
Ian Ehrenwald
Senior Infrastructure Engineer
Hachette Book Group, Inc.
1.617.263.1948 / ian.ehrenwald@hbgusa.com
On 1/30/17, 9:56 AM, "Ehrenwald, Ian" Ian.Ehrenwald@hbgusa.com wrote:
It's funny you mention this - I had a support case open not more than a couple weeks ago regarding the same exact thing.
I have a few fairly large volumes (20TB and 40TB) that are consistently lagged in their SM replication to our DR site by over a week. The primary and DR sites are connected via a 5Gb/s link, and we've been able to fill that pipe in the past. The aggregate that holds these two volumes is made of 4 x DS2246 on both the source and destination side. The destination aggregate is mostly idle; the source aggregate sees around 20K IOPS 24/7/365. I also have an aggregate made of 8 x DS2246 that's pretty busy all the time too, and volumes on that aggregate replicate to an identical aggregate at the DR site and are never lagged.
The support engineer I was working with did mention that we could disable this global throttle though it may have an impact on client latency, so I didn't do it.
The best idea we could come up with is that the source-side aggregate with the lagged SM volumes, and the node that owns it (a FAS8060), might be IOPS- and CPU-bound. We could consider adding more shelves to this aggregate, running a reallocate to spread the blocks around, and seeing if that helps.
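If we do end up going down that path, the sequence would look roughly like this from the cluster shell (vserver and volume names are placeholders, and I'm going from memory on the exact syntax, so check the man pages; a measure-only pass first gives you a baseline before committing to the full reallocate):

    cluster::> volume reallocation measure -vserver <vs> -path /vol/<laggedvol>
    cluster::> volume reallocation show
    cluster::> volume reallocation start -vserver <vs> -path /vol/<laggedvol>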
It's not really in my budget at the moment to purchase four more DS2246 shelves with 24 x 1.2TB drives each (two for primary, two for DR), so this has rekindled my interest in trying this global throttle flag on a weekend where, if IO bogs down, nobody will complain (too much) :)
--
Ian Ehrenwald
Senior Infrastructure Engineer
Hachette Book Group, Inc.
1.617.263.1948 / ian.ehrenwald@hbgusa.com
On 1/29/17, 6:29 PM, "Peter D. Gray" pdg@uow.edu.au wrote:
Hi people
Just out of idle curiosity, am I the only netapp admin who does not know about the super secret flags to allow snapmirror to actually work at reasonable speed?
We were running 8.3.2 cluster mode and spent weeks looking into why our snapmirrors to our remote site ran so slowly. We were often 2 days behind over 40G networks. Obviously, we focussed on network issues, and we wasted a lot of time. We could make no sense of the problem at all, since sometimes it appeared to work OK and then later the transfers slowed to a crawl.
We eventually opened a case, and it did not take too long for a reply which basically said "why don't you just disable the global snapmirror throttle?" I had already looked into such a beast, but found nothing.
As you may or may not know, it turns out to be a per node setting. The name of the flag is repl_throttle_enable. Of course, you can only see such flags or change them on the node, in privileged mode.
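In case it saves someone the hunt, the sequence on each node looks roughly like this (the node name is a placeholder, and this is the diag-level nodeshell, so the usual caveats about diag commands apply):

    cluster::> system node run -node <nodename>
    node> priv set diag
    node*> printflag repl_throttle_enable
    node*> setflag repl_throttle_enable 0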
Setting the flag to 0 immediately (and I do mean immediately) allowed our snapmirrors to run at the speed you might expect over 40G. Instead of taking 2 days, snapmirror updates now took 2 hours.
We have since upgraded to 9.1. The flags reverted to on, but they can again be set to off. I think there is a documented global snapmirror throttle option in 9.1, but I have not looked into that yet.
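If anyone goes looking before I do, my guess is that it's the old replication.throttle.enable option from the nodeshell, but I have not verified that on 9.1, so treat it purely as an assumption:

    node> options replication.throttle.enable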
Are we the only site in the world to have seen this issue? We use snapmirror DR for all our mirrors which may be a factor.
As I said, just idle curiosity, and maybe it helps someone avoid the time wasting we had.
Regards, pdg
Peter Gray
Information Management & Technology Services
University of Wollongong
Wollongong NSW 2522
Australia
Ph (direct): +61 2 4221 3770
Ph (switch): +61 2 4221 3555
Fax: +61 2 4229 1958
Email: pdg@uow.edu.au
URL: http://pdg.uow.edu.au