Thanks for all the replies. I'm going to bring this up with engineering and ask for clearer guidance. I suspect we really need an updated KB at a minimum. If this particular problem arises because there's legitimately a lot of "extra" work going on, then this is just another tunable that needs to be documented. On the other hand, if something like post-processing dedupe is abnormally outcompeting SnapMirror, that's a bug that ought to be fixed.
It might take a week, but I'll report back on what I find.
-----Original Message-----
From: Tim Parkinson [mailto:t.r.parkinson@sheffield.ac.uk]
Sent: Wednesday, February 01, 2017 6:37 AM
To: Steiner, Jeffrey Jeffrey.Steiner@netapp.com
Cc: toasters@teaparty.net
Subject: Re: super secret flags
Hi Jeffrey,
Just adding another voice to the "We've experienced abysmal snapmirror performance in cmode" crowd. We've never really had a satisfactory explanation from our third-party support people or NetApp, and we have spent a tremendous amount of time trying to track down the cause of snapmirror issues (including buying larger controllers). This is the first we've heard of this throttle setting, and we will certainly test it over a weekend to see if it helps us out, since we still see lagging mirrors and can't work out why.
We have a large number of post-process deduped volumes, no compression, to answer your question.
Regards,
Tim
On 31 January 2017 at 07:30, Steiner, Jeffrey Jeffrey.Steiner@netapp.com wrote:
Thanks for all the feedback. This definitely appears to be a gap. This parameter wasn't intended to be required outside edge cases, but it seems that "edge cases" is way too narrow.
I have a question - what is your use of post-processing compression or deduplication?
There seem to be a few other cases where a lot of post-processing work was creating contention with snapmirror operations. Without going into too much detail, both run as lower-priority tasks to ensure they don't interfere with "real" work like host IO operations.
If that's really the context then we need to update the KB article so nobody else ends up chasing a network or disk latency problem that doesn't exist. I'd imagine there could be other lower-priority tasks that could disproportionately mess with snapmirror transfer rates too.
-----Original Message-----
From: Peter D. Gray [mailto:pdg@uow.edu.au]
Sent: Monday, January 30, 2017 11:52 PM
To: Steiner, Jeffrey Jeffrey.Steiner@netapp.com
Cc: NGC-pdg-uow.edu.au pdg@uow.edu.au; toasters@teaparty.net
Subject: Re: super secret flags
On Mon, Jan 30, 2017 at 06:13:22AM +0000, Steiner, Jeffrey wrote:
I scanned the documentation on this flag, and it's not a universally applicable setting. It should only be set in conjunction with a support case to address an identified issue. In general, it should only be set as a temporary measure, but there are exceptions to that general rule.
I am not entirely convinced that every customer should need to raise a support case to get their snapmirrors working properly.
On the whole, that issue appears to be related to transfer latency. That could be the latency of a slow network or the latency resulting from a network with a problem, such as packet loss. I'd imagine it could also be caused by latency imposed by an overloaded destination SATA aggregate, plus it's not out of the question that something newer like 40Gb Ethernet might create some kind of odd issue that warrants setting this flag.
Hmmm.... we have a pretty good network. And it's hard to believe our disk latency at 1AM is a problem. As I said, we got a factor of 10 in terms of snapmirror performance, and no noticeable drop in filer performance at either end.
But as I said elsewhere, it should be my choice how I prioritize performance over data protection. Give me the tools and the documentation.
In normal practice, you shouldn't need to touch this parameter. I've been around a long time, and I'd never heard of it before now, and I've never used it with any of my lab setups, and I rely on SnapMirror heavily.
Did not work here.
The important thing is not to use this option unless directed by the support center. There's a risk of masking the underlying problem, or creating new problems.
Hmmmm...... you could be right. But on the other hand we spent 3 weeks of our time looking at this problem only to be told about a really simple fix that seems to work a treat.
You can see that does not make us happy.
You might consider continuing to follow up on the case to ensure that either (a) you're in an odd situation where this parameter really is warranted or (b) there is some kind of underlying problem that needs fixing. If you're otherwise happy with the way the system is performing and the parameter change worked, I'd probably call it good...
Not after 3 weeks of my time and other people's time spent chasing a non-existent network problem. The thing that made me the most angry is that there is a completely undocumented setting that has an absolutely massive impact on performance of a major feature in ONTAP.
Basically, I posted this to see if any other people have seen the problem. It appears at least some have.
Regards, pdg
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
-- Tim Parkinson Server & Storage Administrator University of Sheffield 0114 222 3039