Thanks for all the feedback. This definitely appears to be a gap. This parameter wasn't intended to be required outside edge cases, but it seems that "edge cases" is way too narrow.
I have a question: how are you using post-processing compression or deduplication?
There seem to be a few other cases where a lot of post-processing work was creating contention with SnapMirror operations. Without going into too much detail, post-processing and SnapMirror both run as lower-priority tasks to ensure they don't interfere with "real" work like host IO operations.
If that's really the context then we need to update the KB article so nobody else ends up chasing a network or disk latency problem that doesn't exist. I'd imagine there could be other lower-priority tasks that could disproportionately mess with snapmirror transfer rates too.
-----Original Message-----
From: Peter D. Gray [mailto:pdg@uow.edu.au]
Sent: Monday, January 30, 2017 11:52 PM
To: Steiner, Jeffrey Jeffrey.Steiner@netapp.com
Cc: NGC-pdg-uow.edu.au pdg@uow.edu.au; toasters@teaparty.net
Subject: Re: super secret flags
On Mon, Jan 30, 2017 at 06:13:22AM +0000, Steiner, Jeffrey wrote:
I scanned the documentation on this flag, and it's not a universally applicable setting. It should only be set in conjunction with a support case to address an identified issue. In general, it should only be set as a temporary measure, but there are exceptions to that general rule.
I am not entirely convinced that every customer should need to raise a support case to get their snapmirrors working properly.
On the whole, that issue appears to be related to transfer latency. That could be the latency of a slow network or the latency resulting from a network with a problem, such as packet loss. I'd imagine it could also be caused by latency imposed by an overloaded destination SATA aggregate, plus it's not out of the question that something newer like 40Gb Ethernet might create some kind of odd issue that warrants setting this flag.
Hmmm.... we have a pretty good network. And it's hard to believe our disk latency at 1AM is a problem. As I said, we got a factor of 10 in terms of snapmirror performance, and no noticeable drop in filer performance at either end.
But as I said elsewhere, it should be my choice how I prioritize performance over data protection. Give me the tools and the documentation.
In normal practice, you shouldn't need to touch this parameter. I've been around a long time, and I'd never heard of it before now, and I've never used it with any of my lab setups, and I rely on SnapMirror heavily.
Did not work here.
The important thing is not to use this option unless directed by the support center. There's a risk of masking the underlying problem, or creating new problems.
Hmmmm...... you could be right. But on the other hand we spent 3 weeks of our time looking at this problem only to be told about a really simple fix that seems to work a treat.
You can see that does not make us happy.
You might consider continuing to follow up on the case to ensure that either (a) you're in an odd situation where this parameter really is warranted or (b) there is some kind of underlying problem that needs fixing. If you're otherwise happy with the way the system is performing and the parameter change worked, I'd probably call it good...
Not after 3 weeks of my time and other people's time spent chasing a non-existent network problem. The thing that made me the most angry is that there is a completely undocumented setting that has an absolutely massive impact on performance of a major feature in ONTAP.
Basically, I posted this to see if any other people have seen the problem. It appears at least some have.
Regards, pdg