Well, the only time your snapshot usage will grow is when blocks in the active filesystem are overwritten or deleted. Whenever I see a big snapshot usage increase I can usually pinpoint the reason (DBAs either overwrote or deleted a database). Simply adding a cron job like:

0 * * * * rsh filer_ip df >> df-history

can give you some good historical data you can use for trending.
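If you want to turn that df-history file into actual numbers, something along these lines will do it. This is just a rough sketch -- it assumes stock ONTAP df output (in kbytes), with each run starting at the "Filesystem" header line and snapshot lines ending in ".snapshot"; adjust for your version:

#!/usr/bin/env python
# Rough sketch: summarize snapshot usage per df run in the df-history file
# built by the cron job above.  Assumes classic ONTAP df output (kbytes)
# where each run starts with a "Filesystem" header and snapshot lines end
# in ".snapshot".
import sys

def runs(path):
    run = []
    for line in open(path):
        if line.startswith('Filesystem'):   # a new df run begins here
            if run:
                yield run
            run = []
        else:
            run.append(line.split())
    if run:
        yield run

for n, run in enumerate(runs(sys.argv[1])):
    snap_used = 0
    for f in run:
        if len(f) > 2 and f[0].endswith('.snapshot') and f[2].isdigit():
            snap_used += int(f[2])
    print("run %d: %d KB used in snapshots" % (n, snap_used))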
You might be able to learn something from comparing find runs between different snapshots and/or your active filesystem.
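For example, something like this could walk two snapshot trees over NFS and show which files changed or disappeared between them, i.e. the ones still pinning blocks in the older snapshot. Again just a rough sketch -- the mount paths are made up, and it assumes the .snapshot directories are visible on the mount:

#!/usr/bin/env python
# Rough sketch: compare two snapshot trees visible over NFS under .snapshot
# (so nosnapdir has to be off) and list files that changed or disappeared
# between them -- those are the files holding blocks in the older snapshot.
# The example paths in the comments below are hypothetical.
import os, sys

def index(root):
    files = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            files[path[len(root):]] = (st.st_size, int(st.st_mtime))
    return files

old = index(sys.argv[1])   # e.g. /mnt/vol0/.snapshot/nightly.1
new = index(sys.argv[2])   # e.g. /mnt/vol0/.snapshot/nightly.0 or the live tree
for path, (size, mtime) in old.items():
    if path not in new:
        print("deleted: %s (%d bytes)" % (path, size))
    elif new[path] != (size, mtime):
        print("changed: %s (%d bytes held by old snapshot)" % (path, size))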
Attached is a python script I wrote to give me an overall view into qtree/volume usage, etc. It's a little specific to my environment, but you might be able to get some use out of it. I log the output of this to a file every hour and later pass over it w/ gnuplot to display pretty qtree trending info. If I felt like doing it the right way, I'd shove all this data into a mysql database...
Sample output (anonymized a bit):
generating report at Wed Dec 10 13:33:06 2003
---] qtree usage report [------------------------------------------------------
filer   vol    qtree        Disk Usage(gb)       Inodes
-------------------------------------------------------------------------------
nas1    vol0   abc          25/35    (73.89%)    37720/-
nas1    vol0   aardvarks    0/3      (24.62%)    65191/-
nas1    vol0   night        1/3      (57.67%)    1426/-
nas1    vol0   qwe1         55/100   (55.83%)    114/-
nas1    vol0   antelope     12/100   (12.53%)    828/-
nas1    vol0   cheeta       0/10     (0.38%)     2441/-
nas1    vol0   nonuseful    0/3      (3.53%)     4814/-
nas1    vol0   qua-log      22/200   (11.25%)    91/-
nas2    vol0   test         0/20     (0.74%)     5012/-
nas2    dw2    vb1          312/400  (78.10%)    198/-
nas2    dw2    sdg-log      24/40    (60.15%)    425/-
nas2    vol1   sdfs         330/400  (82.73%)    281/-
nas2    vol1   fghfd        1/10     (16.09%)    30/-
nas2    vol1   linux        10/100   (10.79%)    31/-
nas2    vol1   windows      106/300  (35.45%)    5108823/-
nas2    vol1   test         0/20     (0.74%)     5012/-
nas2    vol1   testing      5/50     (10.66%)    47/-
nas2    vol1   blahblah     0/5      (8.01%)     41/-

---] volume usage - usable (gb) [----------------------------------------------
filer   volume         usage
nas1    /vol/vol0/     617/2036 (%30.34)
nas2    /vol/vol1/     464/1147 (%40.52)
nas2    /vol/vol0/     852/1147 (%74.33)
nas2    /vol/dw2/      336/1147 (%29.37)

---] volume usage - raw (gb) [-------------------------------------------------
filer   volume         usage
nas1    /vol/vol0/     715/2868 (%24.95)
nas2    /vol/vol1/     523/1434 (%36.50)
nas2    /vol/vol0/     956/1434 (%66.72)
nas2    /vol/dw2/      563/1434 (%39.31)

---] volume allocation totals (gb) [-------------------------------------------
nas1:vol0   1304/2036 (%64.03)
nas2:dw2    440/1147  (%38.35)
nas2:vol0   1180/1147 (%102.85)
nas2:vol1   1000/1147 (%87.16)

---] filer usage (all volumes) - usable (gb) [---------------------------------
filer   usage
nas1    617/2036  (%30.34)
nas2    1654/3441 (%48.07)

---] filer usage (all volumes) - raw (gb) [------------------------------------
filer   usage
nas1    715/2868  (%24.95)
nas2    2044/4302 (%47.51)

---] warnings [----------------------------------------------------------------
WARNING: nas2:vol0 is over-allocated by around 147 GB!
You'll need to modify the
FILERS = { 'nas1': '10.1.30.218', 'nas2': '10.1.30.219' }
line at the top of the file to reflect your filer(s). The machine it runs on needs rsh access to the filers. You'll also need to make sure the nosnapdir option is off on all volumes.
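If you'd rather roll your own, the guts of it is just shelling out rsh commands to the filer and parsing columns. Here's a bare-bones sketch of that approach (this is not the attached script -- the FILERS entries, the rsh call and the "df -g" column layout are assumptions you'd need to adjust for your filers and ONTAP version):

#!/usr/bin/env python
# Bare-bones sketch of the rsh-and-parse approach (NOT the attached script).
# The FILERS dict, the rsh invocation and the "df -g" column layout are
# assumptions -- adjust them for your filers and ONTAP version.
import os

FILERS = {'nas1': '10.1.30.218', 'nas2': '10.1.30.219'}

def filer_cmd(ip, cmd):
    # run a command on the filer over rsh, return its output lines
    return os.popen('rsh %s %s' % (ip, cmd)).readlines()

for name, ip in FILERS.items():
    for line in filer_cmd(ip, 'df -g'):
        fields = line.split()
        # expected columns: filesystem, total, used, avail, capacity, mounted-on
        if len(fields) >= 5 and fields[0].startswith('/vol/') \
           and not fields[0].endswith('.snapshot'):
            print("%s %s: %s used of %s" % (name, fields[0], fields[2], fields[1]))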
On Tue, 9 Dec 2003, Brian Tao wrote:
I think most any Netapp admin has been in this situation: you set aside a chunk of disk space for your snapshot reserve. After a week goes by, you see that the reserve is at 150% of allocation. You manually delete some snapshots until it falls back under 100%, and adjust the snap schedule. A few months go by, new applications are rolled in and old ones retire. Snapshot usage has also increased, but you are at a loss to pinpoint the exact cause of the higher data turnover rate.
What do people do to shed more light on this kind of situation?
I'd love to be able to conclude "It is the files in /vol/vol0/myapp/data that are chewing up the most snapshot space" or "It is the write activity coming from NFS client myhost1 that is causing the most block turnover". I think I asked this question about five years ago and did not discover an adequate solution back then. I'm hoping someone might be able to share their expertise on this problem now. ;-)

--
Brian Tao (BT300, taob@risc.org)
"Though this be madness, yet there is method in't"
-- Antonio Varni