Hi Jeffrey and others,
I don't want to hijack this thread, since it is specifically about the repl_throttle_enable flag, but are you guys aware of the performance impact on SnapMirror when transfers run over etherchannels with port-based hashing on the sender side?
I have come across this a couple of times (the first time I encountered it, I logged a case for it: 2005111796). Unfortunately I have never had the time to troubleshoot it properly. In case 2005111796, support observed packet loss in the setup with port-based hashing, but we had to destroy our (test/troubleshooting) setup before we could get to the bottom of it. Since then I have come across this on several occasions. More often than not it was not a real issue, since those SnapMirrors ran across WAN links, or SnapMirror runs at night and can take all the time it wants; but on 1Gbps/10Gbps LANs where SnapMirror updates need to be fast, it is an issue. However, I found out there is a TR that mentions SnapMirror performance can be impacted by port-based ifgrps, so I've never bothered to open any additional cases for this.
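For anyone who wants to check their own setup: on clustered ONTAP the hashing method of an ifgrp shows up as the distribution function, so something along these lines (just a sketch, exact field names may differ per release) will tell you what the sender side is actually doing:

    cluster::> network port ifgrp show -fields distr-func

As far as I know you cannot change the distribution function on an existing ifgrp, so if it says "port" you would have to recreate the ifgrp with ip- or mac-based hashing to test the difference.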
Can anyone else confirm this behavior?
(To put in my two cents on the repl_throttle_enable flag: a customer today reported this SnapMirror progress with and without throttling: 300 GB in 2 hours versus 100 GB in 15 minutes after we disabled the flag. Also, earlier this week I had to wait for 160 TB of vol move operations on a second-line system. After disabling the repl_throttle_enable flag, I saw little or no impact for volumes with "dead"/unmodified data on them, but a big impact for (NFS) VMware datastores with live VMs sitting on them: the cutover estimation from "vol move show" dropped by 24 hours almost immediately. I am quite sure those VMs will have been impacted, as CPU and disk load was pegged at 90+%.)
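(And for reference, since I mentioned disabling the flag: it is a node-level diag flag, so this is roughly how we toggled it - a sketch only, assuming printflag/setflag on the nodeshell behave the same on your release, and only with a support case open as Jeffrey asks below:

    cluster::> node run -node <nodename>
    nodename> priv set diag
    nodename*> printflag repl_throttle_enable
    nodename*> setflag repl_throttle_enable 0

As far as I know a flag set this way does not survive a reboot, so treat it as a temporary measure in any case.)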
Best regards, Filip
On Thu, Feb 2, 2017 at 9:13 AM, Steiner, Jeffrey <Jeffrey.Steiner@netapp.com> wrote:
If anyone on this distribution list runs into unexplained slow SnapMirror transfers, please open a support case and cite BURT 1030457. It sounds like, under some circumstances we don't fully understand, the throttle is too aggressive. Post-processing deduplication jobs seem to be connected, but there's probably more to it than just that.
I've tagged the BURT with the support cases mentioned so far in this thread, and requested a better KB article explaining when this flag might need to be updated.
-----Original Message-----
From: Tim Parkinson [mailto:t.r.parkinson@sheffield.ac.uk]
Sent: Wednesday, February 01, 2017 6:37 AM
To: Steiner, Jeffrey <Jeffrey.Steiner@netapp.com>
Cc: toasters@teaparty.net
Subject: Re: super secret flags
Hi Jeffrey,
Just adding another voice to the "we've experienced abysmal snapmirror performance in cmode" crowd. We've never really had a satisfactory answer as to why from our third-party support people or NetApp, and have spent a tremendous amount of time trying to track down the cause of SnapMirror issues (including buying larger controllers). This is the first we've heard of this throttle setting, and we will certainly test it over a weekend to see if it helps us out, since we still see lagging mirrors and can't work out why.
We have a large number of post-process deduped volumes, no compression, to answer your question.
Regards,
Tim
On 31 January 2017 at 07:30, Steiner, Jeffrey <Jeffrey.Steiner@netapp.com> wrote:
Thanks for all the feedback; this definitely appears to be a gap. This parameter wasn't intended to be required outside edge cases, but it seems that "edge cases" is way too narrow.
I have a question: what is your use of post-processing compression or deduplication?
There seem to be a few other cases where a lot of post-processing work was creating contention with SnapMirror operations. Without going into too much detail, they both run as lower-priority tasks to ensure they don't interfere with "real" work like host IO operations.
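For anyone trying to correlate this on their own systems, something like the following (a sketch, field names may vary between releases) should show which volumes have post-process efficiency work configured:

    cluster::> volume efficiency show -fields state, policy, schedule

Volumes with a policy or schedule attached are the ones whose background scans would be running in that same lower-priority class alongside SnapMirror.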
If that's really the context, then we need to update the KB article so nobody else ends up chasing a network or disk latency problem that doesn't exist. I'd imagine there could be other lower-priority tasks that could disproportionately mess with SnapMirror transfer rates too.
-----Original Message-----
From: Peter D. Gray [mailto:pdg@uow.edu.au]
Sent: Monday, January 30, 2017 11:52 PM
To: Steiner, Jeffrey <Jeffrey.Steiner@netapp.com>
Cc: NGC-pdg-uow.edu.au <pdg@uow.edu.au>; toasters@teaparty.net
Subject: Re: super secret flags
On Mon, Jan 30, 2017 at 06:13:22AM +0000, Steiner, Jeffrey wrote:
I scanned the documentation on this flag, and it's not a universally applicable setting. It should only be set in conjunction with a support case to address an identified issue. In general, it should only be set as a temporary measure, but there are exceptions to that general rule.
I am not entirely convinced that every customer should need to raise a support case to get their snapmirrors working properly.
On the whole, that issue appears to be related to transfer latency. That could be the latency of a slow network or the latency resulting from a network with a problem, such as packet loss. I'd imagine it could also be caused by latency imposed by an overloaded destination SATA aggregate, and it's not out of the question that something newer like 40Gb Ethernet might create some kind of odd issue that warrants setting this flag.
Hmmm.... we have a pretty good network, and it's hard to believe our disk latency at 1 AM is a problem. As I said, we got a factor of 10 improvement in snapmirror performance, and no noticeable drop in filer performance at either end.
But as I said elsewhere, it should be my choice how I prioritize performance over data protection. Give me the tools and the documentation.
In normal practice, you shouldn't need to touch this parameter. I've been around a long time and had never heard of it before now; I've never used it with any of my lab setups, and I rely on SnapMirror heavily.
Did not work here.
The important thing is not to use this option unless directed by the support center. There's a risk of masking the underlying problem or creating new problems.
Hmmmm...... you could be right. But on the other hand, we spent 3 weeks of our time looking at this problem, only to be told about a really simple fix that seems to work a treat.
You can see that does not make us happy.
You might consider continuing to follow up on the case to ensure that either (a) you're in an odd situation where this parameter really is warranted, or (b) there is some kind of underlying problem that needs fixing. If you're otherwise happy with the way the system is performing and the parameter change worked, I'd probably call it good...
Not after 3 weeks of my time and other people's time spent chasing a non-existent network problem.
The thing that made me most angry is that there is a completely undocumented setting that has an absolutely massive impact on the performance of a major feature in ONTAP.
Basically, I posted this to see if any other people have seen the problem.
It appears at least some have.
Regards, pdg
--
Tim Parkinson
Server & Storage Administrator
University of Sheffield
0114 222 3039