We turned snaps off for the aggrs. I think this may be going away in some of the upcoming versions of DOT. They're mostly a waste of space, and if you did want to use aggr restore, a full restore of all the vols on the aggregate would occur.
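For anyone who wants to do the same, the 7-mode commands are roughly as follows; the aggregate name is just a placeholder and the syntax is worth double-checking against your release:

snap sched -A aggr1 0 0 0      stop taking scheduled aggregate snapshots
snap reserve -A aggr1 0        give the aggregate snapshot reserve back to the volumes
snap list -A aggr1             see what aggregate snapshots are still around
snap delete -A aggr1 <name>    delete any leftovers rather than waiting for them to age out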
________________________________
From: toasters-bounces@teaparty.net [toasters-bounces@teaparty.net] On Behalf Of Fletcher Cocquyt [fcocquyt@stanford.edu]
Sent: Tuesday, May 15, 2012 5:06 PM
To: Peter D. Gray
Cc: toasters@teaparty.net
Subject: Re: snapshot cleanup and performance
What version of ONTAP? It sounds like the bug we encountered; I wrote it up here:
http://www.vmadmin.info/2010/11/vfiler-migrate-netapp-lockup.html
Bug ID 90314: "Heavy file deletion loads can make a filer less responsive"
Basically the fix was setting these hidden options to de-prioritize volume-deletion-related operations (those ops had swamped the netapp during an aborted vfiler migrate, which is another, related issue):
options wafl.trunc.throttle.system.max 30
options wafl.trunc.throttle.vol.max 30
options wafl.trunc.throttle.min.size 1530
options wafl.trunc.throttle.hipri.enable off
options wafl.trunc.throttle.slice.enable off
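If you want to see what those throttle options are currently set to before touching them, listing by prefix should work; on the systems I have seen, the wafl.trunc options only show up at elevated privilege, but that may vary by release:

priv set advanced
options wafl.trunc.throttle
priv set admin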
So far, we have not seen the issue again.
Internal snapshot, SnapMirror, and dedup operations can account for a large percentage of load and IO. We had to scale our SnapMirror schedule back after determining that 50% of the IOPS were SnapMirror related (a sketch of that kind of schedule change follows the link below):
http://www.vmadmin.info/2010/07/vmware-and-netapp-deconstructing.html
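Purely for illustration (the filer and volume names and the before/after intervals here are made up), the change amounts to editing the schedule field in /etc/snapmirror.conf, which is minute, hour, day-of-month, day-of-week:

# before: transfer every 15 minutes
filerA:vol_vm   filerB:vol_vm_mirror   -   0,15,30,45 * * *
# after: transfer once an hour
filerA:vol_vm   filerB:vol_vm_mirror   -   0 * * *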
Good luck,
Fletcher.
On May 15, 2012, at 4:30 PM, Peter D. Gray wrote:
OK, just wondering if anybody can shed light on this.
We just had a massive performance problem on one of our netapps. One aggregate was amazingly busy, with disk drives 100% busy all the time. IO latency went through the roof for all the volumes on the aggregate.
We spent a bit of time on this, and we are not novice netapp users. In the end we could not identify the problem: there appeared to be no relationship between the amount of I/O coming from clients and the amount of I/O on the aggregate.
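For anyone chasing something similar, the usual first look on 7-mode is something like the commands below (statit needs advanced privilege); this is a generic sketch, not a record of exactly what we ran:

sysstat -x 1          per-second ops, disk throughput, CP types and cache age
priv set advanced
statit -b             start collecting per-disk statistics
statit -e             a minute or two later, print per-disk utilization for the interval
priv set admin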
So, we called netapp support. It took them a while (a few days), but eventually they suggested changing the snapshot schedules, removing snapshots from the aggregates, and also removing hourly snapshots from the volumes on the aggregate. We complied.
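On the volume side that presumably boils down to something like the following (the volume name and retention counts are placeholders; the aggregate-side equivalents are in the reply further up the thread):

snap sched vol_data              show the current schedule
snap sched vol_data 0 2 0        keep 2 nightlies, stop taking hourlies
snap list vol_data               see which hourly.N copies still exist
snap delete vol_data hourly.0    delete stragglers instead of waiting for them to age out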
It took a while, but after 5 hours or so, relatively suddenly the problem went away and seems to have stayed away for a day now. I am assuming the snapshot cleanup was the problem and the problem stopped when it finally caught up.
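If you want to watch that kind of background cleanup in progress, "wafl scan status" at advanced privilege lists the WAFL scanners that are currently running; I am going from memory here, so treat it as a pointer rather than gospel:

priv set advanced
wafl scan status
priv set admin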
So, I guess my question is "why is snapshot cleanup so expensive?"
Do blocks freed from snapshots need to be written, if so why?
It seems snapshot creation is cheap but deletion is expensive, which makes the entire snapshot management cycle rather more trouble than you might hope.
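One way to get a feel for how much freeing work a deletion implies is to check how much space the snapshots are uniquely holding before you remove them; the volume and snapshot names below are placeholders:

snap delta vol_data                          rate of change between snapshots and the active file system
snap reclaimable vol_data hourly.3 hourly.2  space that deleting those copies would free (this can take a while to compute)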
Regards, pdg
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters