global snapmirror throttle wastes another 2 days


Peter D. Gray

8 Mar 2018

2:24 a.m.

Some time back I wasted 5 days trying to find out why my snapmirrors were not keeping up. NetApp eventually very kindly told me about the global snapmirror throttle, which I immediately disabled on all nodes, and snapmirror speeds went up by a factor of 10.

Life was sweet.

Till this week.

Again my snapmirrors could not keep up. But I knew it could not be the global snapmirror throttle because I had disabled that before.

It had to be the network, right? We had made some network changes.

After 2 days I decided the symptoms were sufficiently similar for me to revisit the global snapmirror throttle, and yes, sure enough, the settings had reverted to enabled.

I suspect this is because we power cycled our NetApp heads as part of a DR exercise. It looks like the setting that disables the global snapmirror throttle is lost when the head comes back up.

Great stuff.

So, it's up to a total of 7 days lost because of the global snapmirror throttle, which, as I said before, seems to exist solely for the purpose of making sure things do not work properly.

Sorry, I had to vent.

Regards, pdg


Chris Hague

8 Mar 2018

8:38 a.m.

Can you give us details of the commands you ran to check, change and confirm this?


_______________________________________________ Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters

Stephen Stocke

1:09 p.m.

The original thread had the subject 'Super Secret Flags' and was active in late Jan / early Feb 2017. It was an interesting thread and worth reading. Below are the original message and one of the early replies from Jeffrey Steiner @ NetApp, which is probably an appropriate place to start if you're considering any tuning...

<quote>

I scanned the documentation on this flag, and it's not a universally applicable setting. It should only be set in conjunction with a support case to address an identified issue. In general, it should only be set as a temporary measure, but there are exceptions to that general rule.

On the whole, that issue appears to be related to transfer latency. That could be the latency of a slow network or the latency resulting from a network with a problem, such as packet loss. I'd imagine it could be also caused by latency imposed by an overloaded destination SATA aggregate as well, plus it's not out of the question that something newer like 40Gb Ethernet might create some kind of odd issue that warrants setting this flag.

In normal practice, you shouldn't need to touch this parameter. I've been around a long time, and I'd never heard of it before now, and I've never used it with any of my lab setups, and I rely on SnapMirror heavily.

The important thing is not to use this option unless directed by the support center. There's a risk of masking the underlying problem, or creating new problems.

You might consider continuing to follow up on the case to ensure that either (a) you're in an odd situation where this parameter really is warranted or (b) there is some kind of underlying problem that needs fixing. If you're otherwise happy with the way the system is performing and the parameter change worked, I'd probably call it good...
-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Peter D. Gray
Sent: Monday, January 30, 2017 12:30 AM
To: toasters@teaparty.net
Subject: super secret flags

Hi people

Just out of idle curiosity, am I the only NetApp admin who does not know about the super secret flags to allow snapmirror to actually work at reasonable speed?

We were running 8.3.2 cluster mode, and spent weeks looking into why our snapmirrors to our remote site ran so slowly. We were often 2 days behind over 40G networks. Obviously, we focussed on network issues. And we wasted a lot of time. We could make no sense of the problem at all, since sometimes it appeared to work OK, then later the transfers slowed to a crawl.

We eventually opened a case and it did not take too long for a reply which basically said "why don't you just disable the global snapmirror throttle."

I had already looked into such a beast, but found nothing.

As you may or may not know, it turns out to be a per-node setting. The name of the flag is repl_throttle_enable. Of course, you can only see such flags or change them on the node, in privileged mode.

Setting the flag to 0 immediately (and I do mean immediately) allowed our snapmirrors to run at the speed you might expect over 40G. Instead of taking 2 days, snapmirror updates now took 2 hours.

We have since upgraded to 9.1. The flags reverted to on, but again can be set to off. I think there is a documented global snapmirror throttle option in 9.1, but I have not looked into that yet.

Are we the only site in the world to have seen this issue? We use snapmirror DR for all our mirrors, which may be a factor.

As I said, just idle curiosity and maybe helping someone avoid the time wasting we had.

Regards, pdg

Peter Gray                                    Ph (direct): +61 2 4221 3770
Information Management & Technology Services  Ph (switch): +61 2 4221 3555
University of Wollongong                      Fax:         +61 2 4229 1958
Wollongong NSW 2522                           Email:       pdg@uow.edu.au
Australia                                     URL:         http://pdg.uow.edu.au

</quote>


Steiner, Jeffrey

1:23 p.m.

I did some editing of the KB articles to hopefully make the parameter easier to find, but it will depend on the search query. If I type "snapmirror slow", the 3rd hit is "Top 10 SnapMirror issues and solutions". This throttling isn't mentioned there, unfortunately. I'll follow up on that again.

I still find it really strange that so many customers have run into a problem that required changing this value, but the majority don't. If it was truly broken I would expect nonstop support calls about snapmirror lag time. I know quite a few customers with extremely heavy SnapMirror traffic, such as high-IO databases with mirrors being updated hourly. They're not seeing problems.

I got a vibe that deduplication/compression might be a factor, but that's unscientific. If anyone has a support case open involving this parameter, send me the number so I can try to get someone to figure out what's different.


Steiner, Jeffrey

1:29 p.m.

Interesting, this seems to be a formally documented setting in ONTAP 9. There's a section called "Using SnapMirror global throttling".

The flags like repl_throttle_enable are never persistent across reboots as far as I know. Options would be, but not flags. I'm double checking with engineering that this flag functions exactly the same as options -option-name replication.throttle.enable. If there's anything additional to say, I'll report back.
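Since the flag apparently does not survive a reboot, one way to avoid losing days again is to compare each node's current value against the intended one after every reboot, upgrade or power cycle. Below is a minimal Python sketch of that comparison only. It assumes the per-node values have already been collected by hand or by some other tool (the thread does not give the exact commands, so none are shown); the node names, the function name and the "off" convention are all hypothetical.

```python
# Hypothetical drift check. How the observed values are gathered from each
# node is deliberately left out, because the thread does not show the commands.
from typing import Dict, List

DESIRED = "off"  # assumption: this site wants the throttle disabled


def find_reverted_nodes(observed: Dict[str, str], desired: str = DESIRED) -> List[str]:
    """Return, sorted by name, the nodes whose observed value differs from the desired one."""
    wanted = desired.strip().lower()
    return sorted(
        node
        for node, value in observed.items()
        if value.strip().lower() != wanted
    )


if __name__ == "__main__":
    # Values typed in by hand after a reboot; the node names are made up.
    observed = {"node-01": "off", "node-02": "on", "node-03": "Off "}
    for node in find_reverted_nodes(observed):
        print(f"{node}: throttle setting is not '{DESIRED}', check it again")
```

Running something like this as part of a post-reboot checklist would at least turn a silent revert into a visible one, whichever of the two settings turns out to be the right one to change.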


Steiner, Jeffrey

2:50 p.m.

No, they are *not* the same thing, despite the nearly identical names.

Peter - can you email me the case number privately? I'd like to use that to get a better fix for this.

