Re: Single-file Snaprestore Causing Performance Impact?

19 Sep 2014


Yes, a colleague started a large snaprestore (1 TB on a SATA aggr) and it ended up coinciding with the full backups late on the weekend.
The datastore became unavailable via NFS. The 3rd-shift support engineer had me on the line waiting for an hour before I suggested we just reboot.
It was another hour before I insisted we just reboot the head, and service was restored on NFS. Then I recovered several VMs.
I never use snaprestore personally; it is very slow. I recommend a 10G-attached rsync host to recover directly from the .snapshot dir: rsync provides throughput and progress stats and can be restarted if interrupted.
This is likely a snaprestore/NFS-related bug in ONTAP. Please let me know if you get any RCA from your perfstats!
Cheers,
Fletcher.
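
[Editor's sketch] The rsync approach described above could look roughly like the following. This is a minimal illustration, not Fletcher's actual procedure: the mount point, snapshot name, and file paths are hypothetical placeholders, and the helper function is invented for this example. It only builds and prints an rsync command line that copies one file out of the .snapshot directory, using standard rsync options for progress reporting and for keeping partial files so an interrupted copy can be rerun.

```python
import shlex
from pathlib import PurePosixPath


def build_rsync_restore(mount_root, snapshot_name, relative_path, dest_dir):
    """Return an rsync argv that copies one path out of a snapshot directory.

    All arguments are caller-supplied; nothing here is specific to any filer.
    """
    source = PurePosixPath(mount_root) / ".snapshot" / snapshot_name / relative_path
    return [
        "rsync",
        "--archive",   # preserve permissions, ownership and timestamps
        "--partial",   # keep partially copied files so a rerun can pick them up
        "--progress",  # show per-file progress and transfer rate
        str(source),
        str(PurePosixPath(dest_dir)) + "/",
    ]


# Hypothetical values: NFS mount of the datastore, a snapshot name, one VM disk.
cmd = build_rsync_restore(
    "/mnt/datastore1",
    "nightly.0",
    "vm01/vm01-flat.vmdk",
    "/mnt/datastore1/vm01",
)
print(shlex.join(cmd))
```

Run from the rsync host with the volume mounted over NFS, the printed command can be reviewed before it is executed. Whether this is gentler on a busy SATA aggregate than snaprestore is not something the thread establishes.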
On Sep 17, 2014, at 6:30 PM, Ray Van Dolson rvandolson@esri.com wrote:
...
That's something we're definitely keeping in mind as we put together
our own internal RCA.  This particular box *was* quite busy with the
SATA disks in question at times oversaturated.  Perhaps our snaprestore
issue would not have reared its head absent some of that
oversaturation?  It certainly could have contributed to creating
conditions where snaprestore could cause the side effects we observed.
With that said, it did not appear that snaprestore running was
introducing new "load" -- at least from a metrics standpoint.
OnCommand graphs didn't show anything different than what I'd quantify
as typical load.  We couldn't even tell visually where snaprestore
kicked in from the graphs... based on this we initially discounted that
snaprestore could be causing the problems...
Fletcher, did your issue occur on a potentially oversaturated
environment?
Thanks for all the replies.
Ray
On Thu, Sep 18, 2014 at 01:24:26AM +0000, Parisi, Justin wrote:
...
If you are rebooting the controller, you might as well core the box. That
may help in analysis of the issue.
Keep in mind that if you're hammering disks in a system with something
external (like NDMP) you can affect other protocols, such as CIFS and NFS.
The system has limited resources available to it, and pegging out disks,
CPU, RAM, etc. can impact everyone. Perfstat would be able to verify if
you're pegging resources. If it's not a resource issue with hardware and
is a software bug, a core file would help verify that.
On 9/17/14, 8:28 PM, "Ray Van Dolson" rvandolson@esri.com wrote:
...
Hmm.  And you're on a version fairly close to ours.  For us, NFS
service actually recovered on its own -- after 30 minutes or so of
"impact".  Then it would be stable for a while and the issue would
return.  Rinse & repeat.  Rebooting the controller did expedite
recovery (though didn't prevent recurrence).
We don't have a bug #, but did manage to capture a perfstat during one
of the outages.  We'll keep pushing on this...
Ray
On Wed, Sep 17, 2014 at 05:20:53PM -0700, Fletcher Cocquyt wrote:
...
We experienced the same NFS outage on a 2240 SATA aggr running 8.1.2.
We ended up having to reboot the filer to recover NFS service.
Is there a bug number for this issue?
We opened a case but were told without a perfstat from the incident
there was not much diagnostic info to go on.
thanks
...
On Sep 17, 2014, at 4:48 PM, Ray Van Dolson rvandolson@esri.com
wrote:
...
I'll add that this issue seems very similar:
https://communities.netapp.com/thread/12180
Though on a much older version of ONTAP (well, presumably -- the OP
doesn't exactly state what they're running, but it is from 2010).
Ray
...
On Wed, Sep 17, 2014 at 04:04:23PM -0700, Ray Van Dolson wrote:
Thanks for the reply.  ndmpcopy is probably faster, though we've used
single-file snaprestore in the past with no issues (but hadn't used it
since upgrading to 8.1.2P4).
It's interesting to me that no other functionality on the filer (at
least as far as we're aware) was impacted other than NFS.
We'll work with IBM to see if this is a known issue or something new.
Support tells us the behavior we observed is absolutely not expected.
Ray
> On Wed, Sep 17, 2014 at 08:50:44PM +0000, Jordan Slingerland wrote:
> I have heard of some issues with single-file snaprestore in 'older'
> versions...maybe fixed in 8.2? I am not sure. I always use ndmpcopy
> over snaprestore when possible. I would suggest that as an alternative,
> though I know that does not exactly answer your question.
> 
> 
> --Jordan
> 
> -----Original Message-----
> From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Ray Van Dolson
> Sent: Wednesday, September 17, 2014 4:35 PM
> To: toasters@teaparty.net
> Subject: Single-file Snaprestore Causing Performance Impact?
> 
> Hi all;
> 
> Running 8.1.2P4 in 7-Mode on an IBM N6240.  We initiated a couple of
> single-file snaprestores which ran for 15+ hours (on some busy
> SATA-based aggregates).  During that time, we experienced
> intermittent issues connecting to the NFS services on this filer.
> Issues would clear up after a while (minutes or tens of minutes) and
> then return an hour or so later.
> 
> We killed the snaprestores during one of the outages and observed a
> full recovery of the NFS service.  It may have been coincidental.
> 
> Anyone aware of snaprestore (specifically, single-file restores)
> causing cascading impacts?
> 
> OnCommand doesn't show any additional spike in CPU, disk activity,
> etc....
> 
> Thanks,
> Ray
