I finally got fed up with deleting snapshots by hand whenever they started using too much disk space, so I wrote a simple snapshot monitor that automatically deletes snapshots when they use more than 100% of the space reserved for them.
Now I configure each filer to schedule more snapshots than I really want and rely on "snap reserve" alone to limit how much disk space the snapshots consume. If a filer's data turnover rises, the monitor automatically scales the number of snapshots back.
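For reference, the filer-side setup uses the "snap sched" and "snap reserve" console commands. The exact syntax varies between Data ONTAP releases, and the numbers below are just an illustration, not a recommendation, but via rsh it looks roughly like this:

```shell
# Schedule more snapshots than strictly needed: 2 weekly, 6 nightly,
# and 8 hourly (taken at 8:00, 12:00, 16:00, and 20:00).
# Illustrative values -- tune to your own turnover.
rsh uranium snap sched 2 6 8@8,12,16,20

# Cap snapshot usage at 20% of the disk.  na_snapmon then trims
# snapshots whenever "df" reports the reserve at 100% or more.
rsh uranium snap reserve 20
```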
For example:
Jun 26 10:30:36 sodium na_snapmon: rsh uranium snap delete hourly.3
Jun 26 10:33:03 sodium na_snapmon: rsh neptunium snap delete nightly.1
Jun 26 10:34:49 sodium na_snapmon: rsh thorium snap delete hourly.2
Jun 26 10:40:04 sodium na_snapmon: rsh neptunium snap delete hourly.3
It also tries to delete dump snapshots that are more than two days old, and it ignores snapshots that have been specially created or renamed.
The best way to run this is as a root cron job, every ten minutes or so.
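For example, a root crontab entry along these lines runs the monitor every ten minutes (the install path and snapshot mount points are site-specific, of course):

```shell
# /etc/crontab entry: check snapshot usage every ten minutes.
# Older crons without step syntax need 0,10,20,30,40,50 instead of */10.
*/10 * * * * root /usr/local/sbin/na_snapmon /na/uranium/.snapshot /na/neptunium/.snapshot
```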
This runs on Linux now, but it should also work on Solaris, and on other Unix systems with minimal modification.
------- start of cut text --------------
#!/usr/bin/perl
# na_snapmon - monitor snapshot usage and keep it under control
# Copyright (C) 1998 Daniel Quinlan
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
use Getopt::Std;
require "find.pl";

sub usage;
sub trim_snapshots;
sub snap_delete;

my $limit;
my $prog;

$prog = $0;
$prog =~ s@.*/@@;

getopts("hd");

if ($opt_h) {
    &usage;
    exit 0;
}
foreach $dir (@ARGV) {
    my $df_worked = 0;
    if (! -d $dir) {
        warn("$prog: $dir: no such snapshot mount point");
        next;
    }
    open (DF, "df -k $dir |");
    while (<DF>) {
        if (/[0-9]%/) {
            $df_worked = 1;
            ($fs, $blocks, $used, $avail, $capacity, $mount) = split;
            # trim only when usage is at 100% or more of the reserve
            if ($capacity !~ /^[0-9]?[0-9]%/) {
                $fs =~ s/:.*//;
                trim_snapshots($fs, $dir);
            }
        }
    }
    close (DF);
}
sub trim_snapshots {
    my ($filer, $dir) = @_;
    my (@list, @snapshots, %atime);
    # read snapshot list, skipping "." and ".."
    opendir(DIR, $dir) || die "can't opendir $dir: $!";
    @snapshots = grep { ! /^\.\.?$/ && -d "$dir/$_" } readdir(DIR);
    closedir DIR;
    # sort by atime, most recently accessed first (-A is age in days)
    foreach $entry (@snapshots) {
        $atime{$entry} = -A "$dir/$entry";
    }
    @snapshots = sort {$atime{$a} <=> $atime{$b}} @snapshots;
    # maybe delete something, starting with the oldest snapshot
    while ($s = (pop @snapshots)) {
        # try to delete old dump snapshots (-A is in days, so older than
        # two days), but since this might fail, keep trying to delete
        # after this one
        if ($s =~ /^snapshot_for_dump\.[0-9]+$/ && $atime{$s} > 2) {
            snap_delete($filer, $s);
            next;
        }
        # skip manual snapshots
        next if ($s !~ /^(hourly|nightly|weekly)\.[0-9]+$/);
        # leave one snapshot at all times
        next if ($s =~ /^hourly\.0$/);
        snap_delete($filer, $s);
        last;
    }
}
sub snap_delete {
    my ($filer, $snapshot) = @_;

    if ($opt_d || ($> != 0)) {
        print "debug: rsh $filer snap delete $snapshot\n";
    } else {
        system("logger -t na_snapmon \"rsh $filer snap delete $snapshot\"");
        system("rsh $filer snap delete $snapshot");
    }
}
sub usage {
    print <<EOF;
usage: $prog [-hd] [list of snapshot mount points]

  -h   print this help
  -d   debugging mode: don't do, just show
EOF
}
------- end ----------------------------