Hi all,
Over the last 36 hours or so, one of our aggregates (19 TB) has grown above 18 TB used. Usually the aggregate's used level only grows if we grow its volumes. This is different: I was forced to delete snapshots and shrink volumes to get it back under 90%. In the last 3 hours it has climbed back above 91%, and the used level is rising by 5-10 GB/minute.
So far I cannot see where the growth is coming from. Aggregate snapshots are OFF.
ONTAP 8.1.2
na02> aggr status
           Aggr State           Status            Options
          aggr0 online          raid_dp, aggr     root, nosnap=on, raidsize=19
                                64-bit
thanks for any tips!
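To put the growth rate above in perspective, here is a back-of-the-envelope sketch of how long the aggregate would take to fill. It assumes 19 TB of usable space, 1 TB = 1024 GB, and a constant growth rate; the function name `minutes_until_full` is made up for this illustration.

```python
def minutes_until_full(total_tb, used_fraction, growth_gb_per_min, gb_per_tb=1024):
    """Minutes until 100% used, assuming the growth rate stays constant."""
    if growth_gb_per_min <= 0:
        raise ValueError("growth rate must be positive")
    free_gb = total_tb * gb_per_tb * (1.0 - used_fraction)
    return free_gb / growth_gb_per_min


# Figures from the post: 19 TB aggregate, 91% used, growing 5-10 GB/minute.
for rate in (5, 10):
    hours = minutes_until_full(19, 0.91, rate) / 60
    print(f"{rate} GB/min: about {hours:.1f} hours until full")
```

Under those assumptions the aggregate would fill in roughly three to six hours, which is why finding the source quickly matters here.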
Hi Fletcher,
can you run `aggr show_space -h`, then wait 10 minutes and run it again? You should at least immediately see which volume is causing the growth if the growth is still happening.
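The before/after comparison suggested above can be automated. The following is a minimal sketch, not a tested tool: it assumes the per-volume section of `aggr show_space -h` prints rows of the form `<volume> <allocated> <used> <guarantee>` with sizes such as `8220GB`, and the sample figures are invented for illustration. Check the row format against real output from your ONTAP release before relying on it.

```python
import re

# Size suffixes as printed with -h, converted to KB (binary multiples assumed).
UNITS_KB = {"KB": 1, "MB": 1024, "GB": 1024**2, "TB": 1024**3}

# Assumed row shape: <volume> <allocated> <used> <guarantee>
ROW = re.compile(
    r"^(\S+)\s+(\d+(?:\.\d+)?)(KB|MB|GB|TB)"
    r"\s+(\d+(?:\.\d+)?)(KB|MB|GB|TB)\s+\S+$"
)


def parse_used_kb(text):
    """Map volume name to used space in KB for every row that matches ROW."""
    used = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            used[match.group(1)] = float(match.group(4)) * UNITS_KB[match.group(5)]
    return used


def growth_by_volume(before, after):
    """Return (volume, delta_kb) pairs, largest growth first."""
    old, new = parse_used_kb(before), parse_used_kb(after)
    deltas = [(vol, new[vol] - old.get(vol, 0.0)) for vol in new]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Hypothetical captures taken ten minutes apart.
BEFORE = """\
Volume                          Allocated            Used       Guarantee
vm65net                            9000GB          8220GB            none
vol0                                300GB            12GB          volume
"""
AFTER = """\
Volume                          Allocated            Used       Guarantee
vm65net                            9100GB          8300GB            none
vol0                                300GB            12GB          volume
"""

for volume, delta_kb in growth_by_volume(BEFORE, AFTER):
    print(f"{volume}: {delta_kb / 1024**2:+.1f} GB")
```

Because `-h` rounds the figures, small changes may not show up between two captures; a longer interval makes the growing volume easier to spot.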
Best,
Alexander Griesser System-Administrator
ANEXIA Internetdienstleistungs GmbH
Phone: +43-5-0556-320 Fax: +43-5-0556-500
E-Mail: ag@anexia.at Web: http://www.anexia.at/
Address (Klagenfurt headquarters): Feldkirchnerstraße 140, 9020 Klagenfurt Managing Director: Alexander Windbichler Company register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT ID: AT U63216601
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Fletcher Cocquyt Sent: Wednesday, April 02, 2014 18:24 To: toasters@teaparty.net Lists Subject: Determining what's contributing to fast aggregate growth
Hi Fletcher,
SIS running?
BUG 657692: Stale metadata not automatically removed during deduplication operations on volume http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=657692
https://forums.netapp.com/thread/42487
regards, Tim
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
Yes sis is active
this may be it
na02> sis status -l
Path: /vol/vm65net
State: Enabled
Compression: Disabled
Inline Compression: Disabled
Status: Active
Progress: 0 KB (0%) Done
Type: Regular
Schedule: tue-thu@23
Minimum Blocks Shared: 1
Blocks Skipped Sharing: 0
Last Operation State: Success
Last Successful Operation Begin: Thu Mar 27 23:00:00 PDT 2014
Last Successful Operation End: Fri Mar 28 07:01:44 PDT 2014
Last Operation Begin: Thu Mar 27 23:00:00 PDT 2014
Last Operation End: Fri Mar 28 07:01:44 PDT 2014
Last Operation Size: 354 GB
Last Operation Error: -
Change Log Usage: 2%
Logical Data: 8220 GB/69 TB (11%)
Queued Job: -
Stale Fingerprints: 1%

na02> sis stop /vol/vm65net
The operation on "/vol/vm65net" is being stopped.
irt-na02> Wed Apr 2 09:35:45 PDT [irt-na02:sis.op.stopped:error]: SIS operation for /vol/vm65net has stopped
Stopped - will see if that stops the growth
thanks!
The aggregate growth has stopped since I stopped sis on the vm65net volume. Thanks Tim, and all who replied - this bug was going to fill up our aggregate with 100's of VMs running otherwise.
Now I get to read up more on it and try to reclaim the space. Strange that this would happen after running de-dup for so long without issue.
thanks again, Fletcher
I think if you rerun dedup from the beginning, it should ditch all the old metadata and rebuild the fingerprint database. Hopefully you won't run into the bug again.
sis start -s /vol/vm_volume
Can confirm:
na02> sis start -s /vol/vm65net
The file system will be scanned to process existing data in /vol/vm65net.
This operation may initialize related existing metafiles.
Are you sure you want to proceed (y/n)? y
This has brought the aggregate from 90% (18/19 TB) to 82% (16/19 TB) in the last 90 minutes (and space is still being freed).
Pretty ironic a feature (de-dup) designed to save space almost ate all of it! (we have been on 8.1.2 for over a year, and had the same sis config running fine up until this week)
Thanks again
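As a rough cross-check of the reclaim rate reported above, the arithmetic can be sketched like this. It works from the percentages (90% to 82% over 90 minutes) rather than the rounded TB figures, and assumes 19 TB usable with 1 TB = 1024 GB; the function name is hypothetical.

```python
def freed_gb_per_min(total_tb, pct_before, pct_after, minutes, gb_per_tb=1024):
    """Average space freed per minute between two used-percentage readings."""
    freed_gb = total_tb * gb_per_tb * (pct_before - pct_after) / 100.0
    return freed_gb / minutes


rate = freed_gb_per_min(19, 90, 82, 90)
print(f"about {rate:.1f} GB/min freed")
```

With these inputs the average works out to about 17 GB per minute, somewhat faster than the 5-10 GB/minute at which the space was being consumed.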
Just wanted to follow up: if you are running <=8.1.2 with dedup, your aggregates may be growing due to metadata from sis.
Run sis status -l /vol/<volname> and check the status:
Last Operation State: Failure
Last Successful Operation Begin: Sun Jun 16 23:00:00 PDT 2013
Last Successful Operation End: Sun Jun 23 01:35:09 PDT 2013
Last Operation Begin: Thu Apr 3 10:52:07 PDT 2014
Last Operation End: Thu Apr 3 11:13:54 PDT 2014
Then run sis start -s /vol/<volname> to reclaim the space.
You get the aggregate space back, and as a side effect you might notice that the dedup jobs that had been failing are now returning 20% savings again.
Better monitoring/alerting on failed dedup jobs would also prevent this.
thanks
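The monitoring idea above could be sketched as follows. This is an illustrative check, not a NetApp tool: it assumes `sis status -l` prints one `Key: Value` field per line with the field names shown in this thread, it ignores the timezone token in timestamps, it relies on English day and month names, and the 7-day threshold and function names are arbitrary choices.

```python
from datetime import datetime


def parse_fields(text):
    """Split 'Key: Value' lines into a dict; values may contain colons."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def parse_filer_time(value):
    """Parse e.g. 'Sun Jun 23 01:35:09 PDT 2013', dropping the timezone token."""
    parts = value.split()
    if len(parts) != 6:
        raise ValueError(f"unexpected timestamp: {value!r}")
    without_tz = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_tz, "%a %b %d %H:%M:%S %Y")


def dedup_alerts(text, now, max_age_days=7):
    """Return human-readable warnings for one volume's status output."""
    fields = parse_fields(text)
    alerts = []
    state = fields.get("Last Operation State", "")
    if state != "Success":
        alerts.append(f"last operation state is {state or 'unknown'}")
    last_ok = fields.get("Last Successful Operation End")
    if last_ok:
        age = now - parse_filer_time(last_ok)
        if age.days > max_age_days:
            alerts.append(f"no successful run for {age.days} days")
    return alerts


# Field values taken from the status output quoted in this thread.
SAMPLE = """\
Last Operation State: Failure
Last Successful Operation End: Sun Jun 23 01:35:09 PDT 2013
Last Operation End: Thu Apr 3 11:13:54 PDT 2014
"""

for alert in dedup_alerts(SAMPLE, now=datetime(2014, 4, 3, 12, 0, 0)):
    print("WARNING:", alert)
```

Run against each deduplicated volume from a scheduler, a check like this would have flagged the failing jobs months before the aggregate started to fill.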
First, let's get a little information.
Could you give us the output of the following commands?
(I am assuming your problem aggregate is aggr0)
aggr status -v aggr0
df -hA aggr0
snap list -A aggr0
aggr show_space aggr0
Next, can we narrow down the growth to a single volume or group of volumes?
How about providing a df -h for each volume and also a snap list <volume> for any volumes using excessive snapshot space.
--JMS
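Collecting the four outputs requested above can be scripted so they are captured at the same moment. This is a minimal sketch that assumes ssh access to the filer is enabled; the host name `ntap1` and the helper names are placeholders, and only the command strings come from the message.

```python
import subprocess

# Commands quoted from the message, with the aggregate name as a parameter.
DIAG_COMMANDS = [
    "aggr status -v {aggr}",
    "df -hA {aggr}",
    "snap list -A {aggr}",
    "aggr show_space {aggr}",
]


def build_ssh_commands(host, aggr):
    """Return one ssh argv list per diagnostic command."""
    return [["ssh", host, template.format(aggr=aggr)] for template in DIAG_COMMANDS]


def collect(host, aggr):
    """Run each command over ssh and return {command: output}."""
    results = {}
    for argv in build_ssh_commands(host, aggr):
        done = subprocess.run(argv, capture_output=True, text=True, check=False)
        results[argv[2]] = done.stdout
    return results


# Print the commands without running them (placeholder host name).
for argv in build_ssh_commands("ntap1", "aggr0"):
    print(" ".join(argv))
```

Calling `collect()` twice a few minutes apart and saving both results gives a consistent pair of snapshots to compare.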
It looks like Tim and Alexander also gave you a couple of good suggestions; both are good places to look.
What protocol does the filer serve? We may be able to get an idea of where the writes are coming from with per-client stats.
For NFS:
Enable per-client stats:
options nfs.per_client_stats.enable on
First zero the counters:
ssh ntap1 vfiler run vfiler0 nfsstat -z
Then list the per-client stats, and repeat a few minutes later to see which client is sending all the NFS writes:
ssh ntap1 vfiler run vfiler0 nfsstat -l
For CIFS:
Enable per-client stats:
options cifs.per_client_stats.enable on
cifs top -n 20
--JMS
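Comparing two `nfsstat -l` listings by hand gets tedious with many clients, so here is a rough sketch of the comparison. The line format is an assumption from memory (client address, host name, then `NFSOPS = <count>`), the sample addresses and counts are invented, and the counter appears to cover all NFS operations rather than writes alone, so treat the ranking as a hint about where to look.

```python
import re

# Assumed line shape: <address> <hostname> NFSOPS = <count> (<percent>)
LINE = re.compile(r"^(\S+)\s.*?NFSOPS\s*=\s*(\d+)")


def parse_ops(text):
    """Map client address to its NFSOPS counter."""
    ops = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            ops[match.group(1)] = int(match.group(2))
    return ops


def busiest_clients(first, second):
    """Return (client, ops_delta) pairs between two listings, busiest first."""
    old, new = parse_ops(first), parse_ops(second)
    deltas = {client: count - old.get(client, 0) for client, count in new.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical listings captured a few minutes apart.
FIRST = """\
10.0.0.11   esx01   NFSOPS =       1200 (40%)
10.0.0.12   esx02   NFSOPS =       1800 (60%)
"""
SECOND = """\
10.0.0.11   esx01   NFSOPS =       1500 (17%)
10.0.0.12   esx02   NFSOPS =       7500 (83%)
"""

for client, delta in busiest_clients(FIRST, SECOND):
    print(f"{client}: {delta} ops")
```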