Well, the only time your snapshot usage will grow is when blocks in the active filesystem are overwritten or deleted. Whenever I see a big snapshot usage increase I can usually pinpoint the reason (DBAs either overwrote or deleted a database). Simply adding a cron job like:

0 * * * * rsh filer_ip df >> df-history

can give you some good historical data you can use for trending.
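If you want to turn that df-history file into actual numbers, something along these lines will do it. This is just a rough sketch -- it assumes stock ONTAP df output (in kbytes), with each run starting at the "Filesystem" header line and snapshot lines ending in ".snapshot"; adjust for your version:

#!/usr/bin/env python
# Rough sketch: summarize snapshot usage per df run in the df-history file
# built by the cron job above.  Assumes classic ONTAP df output (kbytes)
# where each run starts with a "Filesystem" header and snapshot lines end
# in ".snapshot".
import sys

def runs(path):
    run = []
    for line in open(path):
        if line.startswith('Filesystem'):   # a new df run begins here
            if run:
                yield run
            run = []
        else:
            run.append(line.split())
    if run:
        yield run

for n, run in enumerate(runs(sys.argv[1])):
    snap_used = 0
    for f in run:
        if len(f) > 2 and f[0].endswith('.snapshot') and f[2].isdigit():
            snap_used += int(f[2])
    print("run %d: %d KB used in snapshots" % (n, snap_used))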
You might be able to learn something from comparing find runs between different snapshots and/or your active filesystem.
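For example, something like this could walk two snapshot trees over NFS and show which files changed or disappeared between them, i.e. the ones still pinning blocks in the older snapshot. Again just a rough sketch -- the mount paths are made up, and it assumes the .snapshot directories are visible on the mount:

#!/usr/bin/env python
# Rough sketch: compare two snapshot trees visible over NFS under .snapshot
# (so nosnapdir has to be off) and list files that changed or disappeared
# between them -- those are the files holding blocks in the older snapshot.
# The example paths in the comments below are hypothetical.
import os, sys

def index(root):
    files = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            files[path[len(root):]] = (st.st_size, int(st.st_mtime))
    return files

old = index(sys.argv[1])   # e.g. /mnt/vol0/.snapshot/nightly.1
new = index(sys.argv[2])   # e.g. /mnt/vol0/.snapshot/nightly.0 or the live tree
for path, (size, mtime) in old.items():
    if path not in new:
        print("deleted: %s (%d bytes)" % (path, size))
    elif new[path] != (size, mtime):
        print("changed: %s (%d bytes held by old snapshot)" % (path, size))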
Attached is a python script I wrote to give me an overall view into qtree/volume usage, etc. It's a little specific to my environment, but you might be able to get some use out of it. I log the output of this to a file every hour and later pass over it w/ gnuplot to display pretty qtree trending info. If I felt like doing it the right way, I'd shove all this data into a mysql database...
Sample output (anonymized a bit):
generating report at Wed Dec 10 13:33:06 2003
---] qtree usage report [------------------------------------------------------
filer   vol    qtree        Disk Usage(gb)       Inodes
-------------------------------------------------------------------------------
nas1    vol0   abc          25/35    (73.89%)    37720/-
nas1    vol0   aardvarks    0/3      (24.62%)    65191/-
nas1    vol0   night        1/3      (57.67%)    1426/-
nas1    vol0   qwe1         55/100   (55.83%)    114/-
nas1    vol0   antelope     12/100   (12.53%)    828/-
nas1    vol0   cheeta       0/10     (0.38%)     2441/-
nas1    vol0   nonuseful    0/3      (3.53%)     4814/-
nas1    vol0   qua-log      22/200   (11.25%)    91/-
nas2    vol0   test         0/20     (0.74%)     5012/-
nas2    dw2    vb1          312/400  (78.10%)    198/-
nas2    dw2    sdg-log      24/40    (60.15%)    425/-
nas2    vol1   sdfs         330/400  (82.73%)    281/-
nas2    vol1   fghfd        1/10     (16.09%)    30/-
nas2    vol1   linux        10/100   (10.79%)    31/-
nas2    vol1   windows      106/300  (35.45%)    5108823/-
nas2    vol1   test         0/20     (0.74%)     5012/-
nas2    vol1   testing      5/50     (10.66%)    47/-
nas2    vol1   blahblah     0/5      (8.01%)     41/-

---] volume usage - usable (gb) [----------------------------------------------
filer   volume         usage
nas1    /vol/vol0/     617/2036 (%30.34)
nas2    /vol/vol1/     464/1147 (%40.52)
nas2    /vol/vol0/     852/1147 (%74.33)
nas2    /vol/dw2/      336/1147 (%29.37)

---] volume usage - raw (gb) [-------------------------------------------------
filer   volume         usage
nas1    /vol/vol0/     715/2868 (%24.95)
nas2    /vol/vol1/     523/1434 (%36.50)
nas2    /vol/vol0/     956/1434 (%66.72)
nas2    /vol/dw2/      563/1434 (%39.31)

---] volume allocation totals (gb) [-------------------------------------------
nas1:vol0   1304/2036 (%64.03)
nas2:dw2    440/1147  (%38.35)
nas2:vol0   1180/1147 (%102.85)
nas2:vol1   1000/1147 (%87.16)

---] filer usage (all volumes) - usable (gb) [---------------------------------
filer   usage
nas1    617/2036  (%30.34)
nas2    1654/3441 (%48.07)

---] filer usage (all volumes) - raw (gb) [------------------------------------
filer   usage
nas1    715/2868  (%24.95)
nas2    2044/4302 (%47.51)

---] warnings [----------------------------------------------------------------
WARNING: nas2:vol0 is over-allocated by around 147 GB!
You'll need to modify the
FILERS = { 'nas1': '10.1.30.218', 'nas2': '10.1.30.219' }
line at the top of the file to reflect your filer(s). The machine it runs on needs rsh access to the filers. You'll also need to make sure the nosnapdir option is off on all volumes.
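If you'd rather roll your own, the guts of it is just shelling out rsh commands to the filer and parsing columns. Here's a bare-bones sketch of that approach (this is not the attached script -- the FILERS entries, the rsh call and the "df -g" column layout are assumptions you'd need to adjust for your filers and ONTAP version):

#!/usr/bin/env python
# Bare-bones sketch of the rsh-and-parse approach (NOT the attached script).
# The FILERS dict, the rsh invocation and the "df -g" column layout are
# assumptions -- adjust them for your filers and ONTAP version.
import os

FILERS = {'nas1': '10.1.30.218', 'nas2': '10.1.30.219'}

def filer_cmd(ip, cmd):
    # run a command on the filer over rsh, return its output lines
    return os.popen('rsh %s %s' % (ip, cmd)).readlines()

for name, ip in FILERS.items():
    for line in filer_cmd(ip, 'df -g'):
        fields = line.split()
        # expected columns: filesystem, total, used, avail, capacity, mounted-on
        if len(fields) >= 5 and fields[0].startswith('/vol/') \
           and not fields[0].endswith('.snapshot'):
            print("%s %s: %s used of %s" % (name, fields[0], fields[2], fields[1]))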
On Tue, 9 Dec 2003, Brian Tao wrote:
I think most any Netapp admin has been in this situation: you set aside a chunk of disk space for your snapshot reserve. After a week goes by, you see that the reserve is at 150% of allocation. You manually delete some snapshots until it falls back under 100%, and adjust the snap schedule. A few months go by, new applications are rolled in and old ones retire. Snapshot usage has also increased, but you are at a loss to pinpoint the exact cause of the higher data turnover rate.
What do people do to shed more light on this kind of situation?
I'd love to be able to conclude "It is the files in /vol/vol0/myapp/data that are chewing up the most snapshot space" or "It is the write activity coming from NFS client myhost1 that is causing the most block turnover". I think I asked this question about five years ago and did not discover an adequate solution back then. I'm hoping someone might be able to share their expertise on this problem now. ;-)

--
Brian Tao (BT300, taob@risc.org)
"Though this be madness, yet there is method in't"
-- Antonio Varni