That's something we're definitely keeping in mind as we put together
our own internal RCA. This particular box *was* quite busy, with the
SATA disks in question at times oversaturated. Perhaps our snaprestore
issue would not have reared its head absent some of that
oversaturation? It certainly could have contributed to creating
conditions where snaprestore could cause the side effects we observed.
With that said, it did not appear that snaprestore running was
introducing new "load" -- at least from a metrics standpoint.
OnCommand graphs didn't show anything different than what I'd quantify
as typical load. We couldn't even tell visually where snaprestore
kicked in from the graphs... based on this we initially discounted that
snaprestore could be causing the problems...
Fletcher, did your issue occur on a potentially oversaturated
environment?
Thanks for all the replies.
Ray
On Thu, Sep 18, 2014 at 01:24:26AM +0000, Parisi, Justin wrote:
If you are rebooting the controller, you might as well core the box. That
may help in analysis of the issue.
Keep in mind that if you're hammering disks in a system with something
external (like NDMP) you can affect other protocols, such as CIFS and NFS.
The system has limited resources available to it, and pegging out disks,
CPU, RAM, etc. can impact everyone. Perfstat would be able to verify if
you're pegging resources. If it's not a resource issue with hardware and
is a software bug, a core file would help verify that.
On 9/17/14, 8:28 PM, "Ray Van Dolson" <rvandolson@esri.com> wrote:
Hmm. And you're on a version fairly close to ours. For us, NFS
service actually recovered on its own -- after 30 minutes or so of
"impact". Then it would be stable for a while and the issue would
return. Rinse & repeat. Rebooting the controller did expedite
recovery (though it didn't prevent recurrence).
We don't have a bug #, but did manage to capture a perfstat during one
of the outages. We'll keep pushing on this...
Ray
On Wed, Sep 17, 2014 at 05:20:53PM -0700, Fletcher Cocquyt wrote:
We experienced the same NFS outage on a 2240 SATA aggr running 8.1.2.
We ended up having to reboot the filer to recover NFS service.
Is there a bug number for this issue?
We opened a case but were told without a perfstat from the incident
there was not much diagnostic info to go on.
thanks
On Sep 17, 2014, at 4:48 PM, Ray Van Dolson <rvandolson@esri.com> wrote:
I'll add that this issue seems very similar:
https://communities.netapp.com/thread/12180
Though on a much older version of ONTAP (well, presumably -- the OP
doesn't exactly state what they're running, but it is from 2010).
Ray
On Wed, Sep 17, 2014 at 04:04:23PM -0700, Ray Van Dolson wrote:
Thanks for the reply. ndmpcopy is probably faster, though we've used
single-file snaprestore in the past with no issues (but hadn't used it
since upgrading to 8.1.2P4).
It's interesting to me that no other functionality on the filer (at
least as far as we're aware) was impacted other than NFS.
We'll work with IBM to see if this is a known issue or something new.
Support tells us the behavior we observed is absolutely not expected.
Ray
On Wed, Sep 17, 2014 at 08:50:44PM +0000, Jordan Slingerland wrote:
I have heard of some issues with single-file snaprestore in 'older'
versions... maybe fixed in 8.2? I am not sure. I always use ndmpcopy
over snaprestore when possible. I would suggest that as an alternative,
though I know that does not exactly answer your question.
--Jordan
-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Ray Van Dolson
Sent: Wednesday, September 17, 2014 4:35 PM
To: toasters@teaparty.net
Subject: Single-file Snaprestore Causing Performance Impact?
Hi all;
Running 8.1.2P4 in 7-Mode on an IBM N6240. We initiated a couple of
single-file snaprestores, which ran for 15+ hours on some busy
SATA-based aggregates. During that time, we experienced
intermittent issues connecting to the NFS services on this filer.
Issues would clear up after a while (minutes or tens of minutes) and
then return an hour or so later.
We killed the snaprestores during one of the outages and observed a
full recovery of the NFS service. It may have been coincidental.
Anyone aware of snaprestore (specifically, single-file restores)
causing cascading impacts?
OnCommand doesn't show any additional spike in CPU, disk activity,
etc....
Thanks,
Ray