OK, just wondering if anybody can shed light on this.
We just had a massive performance problem on one of our NetApps. One aggregate was extraordinarily busy, with its disk drives 100% busy all the time, and IO latency went through the roof for every volume on the aggregate.
We spent quite a bit of time on this, and we are not novice NetApp users. In the end we could not identify the cause: there appeared to be no relationship between the amount of I/O coming from clients and the amount of I/O hitting the aggregate.
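(For anyone who wants to make the same comparison, client ops and per-disk utilization are both visible from the 7-mode console; statit needs advanced privilege. A rough sketch:)

    sysstat -x 1          # per-second client ops (NFS/CIFS) next to disk reads/writes and utilization
    priv set advanced     # statit is an advanced-privilege command
    statit -b             # begin collecting per-disk statistics
    statit -e             # end and print; shows utilization for each disk in the hot aggregate
    priv set admin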
So, we called NetApp support. It took them a while (a few days), but eventually they suggested changing the snapshot schedules, removing the snapshots from the aggregate, and removing the hourly snapshots from the volumes on the aggregate. We complied.
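(Concretely, the change was along these lines; the aggregate and volume names, and the schedule counts, are placeholders:)

    snap sched -A aggr1 0 0 0          # stop scheduled aggregate snapshots
    snap list -A aggr1                 # list the aggregate snapshots still present
    snap delete -A aggr1 nightly.0     # delete the existing ones individually
    snap sched vol1 0 2 0              # weekly/nightly/hourly counts: drop the hourlies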
It took a while, but after five hours or so the problem went away fairly suddenly, and it has stayed away for a day now. I am assuming the snapshot cleanup was the problem and that it stopped once the cleanup finally caught up.
So, I guess my question is: why is snapshot cleanup so expensive?
Do blocks freed from snapshots need to be written, and if so, why?
It seems snapshot creation is cheap but deletion is expensive, which makes the whole snapshot management cycle rather more trouble than you might hope.
Regards, pdg
What version of ONTAP? This sounds like the bug we encountered; I wrote it up here:
http://www.vmadmin.info/2010/11/vfiler-migrate-netapp-lockup.html
Bug ID 90314: "Heavy file deletion loads can make a filer less responsive"
Basically, the fix was setting these hidden options to de-prioritize deletion-related operations (those ops had swamped the NetApp during an aborted vfiler migrate, which is another, related issue):
    options wafl.trunc.throttle.system.max 30
    options wafl.trunc.throttle.vol.max 30
    options wafl.trunc.throttle.min.size 1530
    options wafl.trunc.throttle.hipri.enable off
    options wafl.trunc.throttle.slice.enable off
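(One caveat from memory: these are hidden options, so you will likely need advanced privilege before they are visible or settable:)

    priv set advanced                            # hidden options are not shown at normal admin privilege
    options wafl.trunc.throttle.system.max 30    # then set the rest as listed above
    priv set admin                               # drop back to normal privilege when done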
So far, we have not seen the issue again.
Internal snapshot, SnapMirror, and dedup operations can account for a large percentage of load and IO. We had to scale our SnapMirror schedule back after determining that 50% of the IOPS were SnapMirror-related:
http://www.vmadmin.info/2010/07/vmware-and-netapp-deconstructing.html
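(If you want to check the same thing: transfer activity is visible from the console, and in 7-mode a relationship can be throttled in /etc/snapmirror.conf. The 5000 KB/s figure and the names below are only examples:)

    snapmirror status      # which transfers are active and how far along they are

    # /etc/snapmirror.conf: throttle this relationship to 5000 KB/s, run daily at 01:00
    filerA:vol1 filerB:vol1_mirror kbs=5000 0 1 * *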
Good luck,
Fletcher.
On May 15, 2012, at 4:30 PM, Peter D. Gray wrote:
We turned snapshots off for the aggregates. I think this may be going away in some of the upcoming versions of DOT. They are mostly a waste of space, and if you ever did want to use an aggregate restore, it would force a full restore of all the volumes in the aggregate.
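(For the record, turning them off amounts to something like this in 7-mode; aggr0 is a placeholder:)

    snap sched -A aggr0 0 0 0       # disable the aggregate snapshot schedule
    snap list -A aggr0              # see which aggregate snapshots remain
    snap delete -A aggr0 nightly.0  # remove any leftovers
    snap reserve -A aggr0 0         # reclaim the aggregate snap reserve (default is 5%)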
SA clearing is not 'file' deletion. Close, yes, but not part of that Burt.
Sent from my iPhone
On May 15, 2012, at 5:06 PM, Fletcher Cocquyt fcocquyt@stanford.edu wrote: