Success using Bacula to back up an iSCSI-only NetApp - toasters

23 Dec 2008

We have a two-controller NetApp FAS2020 with 1.5TB of storage (1TB on one
controller, 500GB on the other), licensed for iSCSI only, and an old desktop
system connected to a 12-tape LTO-4 library (800GB per tape, uncompressed).
I have gotten Bacula to back up files stored on ext3 (Linux) and NTFS
(Windows) partitions on iSCSI LUNs on the NetApp.

The Bacula instance runs under Ubuntu 7.10 ("Gutsy Gibbon"), using the
Gutsy package of Bacula 2.0.3. We're using Postgres 8.2.5 as the catalog
database, because that's the available Ubuntu package and we use Postgres
elsewhere. The iSCSI initiator is "open-iscsi", also from an Ubuntu
package. LVM and device-mapper are not installed on the backup host, and it
has only a single Fast Ethernet connection. (The hosts being backed up
generally have dual Gigabit Ethernet connections to the filers.)

I wrote several scripts that let Bacula take snapshots on the filer and
mount them in a known place before the backup job runs, then clean them up
afterward. I can't publish them because they contain a lot of details specific
to our network, but the general outline is as follows:

"easy_snap" takes three arguments: an NTFS or ext3 volume label, an
indication of which filer the volume is on, and the complete path to the
LUN on the filer (such as "/vol/vol5/testdatav2.lun").
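
The post doesn't say how the scripts are hooked into Bacula, so the sketch
below is only one plausible wiring. It takes the three "easy_snap"
arguments, derives a snapshot name, a clone path and the /mnt mount point,
and renders a Bacula FileSet and Job fragment that calls the scripts through
the "Run Before Job", "Run After Job" and "Run After Failed Job" directives.
The "bacula-<label>" snapshot name, the "bacula_" clone-file prefix, the
/usr/local/sbin script location, the filer name and the "snap-<label>"
resource names are all hypothetical; a real Job resource also needs Client,
Storage, Pool and Messages directives (and usually a Schedule), which are
omitted here.

```python
import posixpath
from dataclasses import dataclass

# Hypothetical naming scheme: the post only says that old snapshots are
# recognized by a "bacula-" prefix and that the mount point is
# /mnt/<lower-case label>.
SNAP_PREFIX = "bacula-"


@dataclass(frozen=True)
class SnapTarget:
    label: str     # NTFS or ext3 volume label, e.g. "TESTDATA"
    filer: str     # which filer holds the LUN (placeholder hostname)
    lun_path: str  # full LUN path on the filer, e.g. "/vol/vol5/testdatav2.lun"

    def __post_init__(self) -> None:
        parts = self.lun_path.split("/")
        # Expect "/vol/<volume>/.../<name>": a leading "", "vol", a volume, a name.
        if len(parts) < 4 or parts[0] != "" or parts[1] != "vol" or not parts[-1]:
            raise ValueError(f"not a /vol/<volume>/... LUN path: {self.lun_path!r}")

    @property
    def volume(self) -> str:
        return self.lun_path.split("/")[2]

    @property
    def snap_name(self) -> str:
        return SNAP_PREFIX + self.label.lower()

    @property
    def clone_path(self) -> str:
        # Hypothetical: put the clone next to its parent LUN, in the same volume.
        head, tail = posixpath.split(self.lun_path)
        return posixpath.join(head, "bacula_" + tail)

    @property
    def mount_point(self) -> str:
        return "/mnt/" + self.label.lower()

    def bacula_config(self, script_dir: str = "/usr/local/sbin") -> str:
        """Render FileSet and Job fragments (Client, Storage, Pool etc. omitted)."""
        name = f"snap-{self.label.lower()}"
        args = f"{self.label} {self.filer} {self.lun_path}"
        return "\n".join([
            "FileSet {",
            f'  Name = "{name}"',
            "  Include {",
            "    Options { signature = MD5 }",
            f"    File = {self.mount_point}",
            "  }",
            "}",
            "Job {",
            f'  Name = "{name}"',
            "  Type = Backup",
            f'  FileSet = "{name}"',
            f'  Run Before Job = "{script_dir}/easy_snap {args}"',
            # "Run After Job" only runs when the job succeeds; the failed-job
            # directive lets the cleanup script run when the backup fails too.
            f'  Run After Job = "{script_dir}/easy_unsnap {args}"',
            f'  Run After Failed Job = "{script_dir}/easy_unsnap {args}"',
            "}",
        ])
```

For example, `SnapTarget("TESTDATA", "filer1", "/vol/vol5/testdatav2.lun")`
yields the snapshot name "bacula-testdata" and the mount point
/mnt/testdata. Pairing "Run After Failed Job" with "Run After Job" is the
main design point: if cleanup only ran after successful jobs, a failed
backup would leave the clone mapped on the filer.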

It generates a name for the snapshot, then issues "snap create", "lun clone
create" and "lun map" commands to the filer via RSH. (It also checks whether
another snapshot of the same volume is still mounted by mistake; if so, it
unmounts and unmaps it.) If a snapshot with the same name already exists,
the script tries to delete it, and renames it if that doesn't work. It also
cleans up any old "bacula-" snapshots. Then it issues a local
"iscsiadm -m session --rescan" command and, if it still can't see the new
LUN, logs the iSCSI session out and in again. Finally, it mounts the
partition using "LABEL=" syntax on a mount point under /mnt whose last
component is the
lower-case form of the volume label.

"easy_unsnap" takes the same three arguments as "easy_snap". It unmounts
the partition, logs out the iSCSI session, unmaps the LUN, and tries to
delete the snapshot. We noticed that the snapshots weren't always being
deleted, which is one reason "easy_snap" tries several ways to clean up old
snapshots. If a backup spans an hour boundary at which an hourly snapshot
is created (or midnight, when the nightly snapshot is created), the mapped
LUN becomes part of that scheduled snapshot, and the snapshot we took can't
be deleted until the hourly or nightly snapshot goes away.
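
The two command sequences and the hour-boundary problem can be sketched as
a dry-run model. This is not the author's code: the functions only build
argument lists (Data ONTAP 7-mode commands sent with rsh, plus local
iscsiadm, mount and umount), and `scheduled_snapshots_during` lists the
scheduled snapshot times that fall inside a backup window, given a
hypothetical set of hours at which hourly snapshots are taken and a nightly
snapshot at midnight. The igroup name, iSCSI target IQN and portal are
placeholders, and the "lun destroy" step is an assumption not mentioned in
the post (as far as I know, a snapshot that backs a LUN clone stays busy
until the clone is destroyed or split).

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Cmd = List[str]


def on_filer(filer: str, *args: str) -> Cmd:
    # Run a 7-mode command on the filer over rsh, as the post describes.
    return ["rsh", filer, *args]


def easy_snap_plan(filer: str, volume: str, lun: str, clone: str, snap: str,
                   igroup: str, label: str, mount_point: str) -> List[Cmd]:
    return [
        on_filer(filer, "snap", "create", volume, snap),
        on_filer(filer, "lun", "clone", "create", clone, "-b", lun, snap),
        on_filer(filer, "lun", "map", clone, igroup),
        ["iscsiadm", "-m", "session", "--rescan"],
        # The real script logs the session out and in again at this point
        # if the new LUN is still not visible; that branch is omitted here.
        ["mount", f"LABEL={label}", mount_point],
    ]


def easy_unsnap_plan(filer: str, volume: str, clone: str, snap: str,
                     igroup: str, mount_point: str,
                     target: str, portal: str) -> List[Cmd]:
    return [
        ["umount", mount_point],
        ["iscsiadm", "-m", "node", "-T", target, "-p", portal, "--logout"],
        on_filer(filer, "lun", "unmap", clone, igroup),
        # Assumption (not in the post): destroy the clone so that the
        # snapshot backing it is no longer busy.
        on_filer(filer, "lun", "destroy", clone),
        on_filer(filer, "snap", "delete", volume, snap),
    ]


def scheduled_snapshots_during(start: datetime, end: datetime,
                               hourly_hours: Iterable[int]
                               ) -> List[Tuple[datetime, str]]:
    """Scheduled snapshot times in the window (start, end].

    Midnight counts as the nightly snapshot; any other hour listed in
    hourly_hours counts as an hourly snapshot.
    """
    hours = set(hourly_hours)
    # First whole hour strictly after start.
    t = start.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    hits = []
    while t <= end:
        if t.hour == 0:
            hits.append((t, "nightly"))
        elif t.hour in hours:
            hits.append((t, "hourly"))
        t += timedelta(hours=1)
    return hits
```

Printing `" ".join(cmd)` for each entry gives a dry run; passing each list
to `subprocess.run(cmd, check=True)` would execute it. A non-empty result
from `scheduled_snapshots_during` for a job's start and end times flags the
runs whose snapshot deletion is likely to fail, e.g. a job running from
23:30 to 01:15 crosses the nightly snapshot at midnight.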

I tried using these scripts to back up Linux hosts that have a filesystem
directly on a raw (unpartitioned) LUN, but it didn't work: when the LUN is
discovered, the /dev/disk/by-label/LABEL symlink does not get created. We
couldn't back up LVM volumes with this configuration either, because LVM
depends on device-mapper, and device-mapper would try to grab the disks
itself.
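
No script can make udev create the by-label link for an unpartitioned
LUN, but a pre-backup script can at least check for the link before going
further and stop with a clear error if it never appears. A minimal sketch,
assuming the standard udev directory /dev/disk/by-label and a label without
spaces or other characters that udev escapes in link names; the function
name and the timeout values are hypothetical:

```python
import os
import time
from typing import Optional


def wait_for_label(label: str, base: str = "/dev/disk/by-label",
                   timeout: float = 10.0, interval: float = 0.5) -> Optional[str]:
    """Return the device behind base/label, or None if the link never appears.

    Polls because udev creates the link asynchronously after an iSCSI rescan.
    """
    link = os.path.join(base, label)
    deadline = time.monotonic() + timeout
    while True:
        # os.path.exists follows the symlink, so a dangling link counts as missing.
        if os.path.exists(link):
            return os.path.realpath(link)
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Returning the resolved device (for example /dev/sdb1) rather than the link
also gives the script something useful to log.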

Right now our performance is limited by the network interface for LUNs
containing a few large files, and by Postgres INSERT performance for
filesystems with lots of small files. We get 9MByte/sec on the big-file
LUNs; for example, a 46GB SQL Server database takes an hour and 18 minutes
to back up. We have a test system with over 300,000 files (including over
100,000 zero-length ones) in just a few directories, but only 580MB of data
in total; that took two and a half hours to back up last night. However,
our entire backup (excluding test systems) ran within 6 hours, and more
than half of our production data is already being backed up with Bacula.
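
A quick back-of-the-envelope check of these figures, assuming decimal units
(1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and treating the test system as
exactly 300,000 files, so the per-file time is an upper bound:

```python
# Raw Fast Ethernet line rate in bytes/s (100 Mbit/s / 8 = 12.5 MB/s).
FAST_ETHERNET = 100e6 / 8


def mb_per_s(nbytes: float, seconds: float) -> float:
    return nbytes / seconds / 1e6


# 46 GB SQL Server database backed up in 1 h 18 min.
sql_rate = mb_per_s(46e9, 78 * 60)              # about 9.83 MB/s
sql_share = sql_rate * 1e6 / FAST_ETHERNET      # about 0.79 of line rate

# Test system: 300,000 files, 580 MB, 2.5 hours.
small_seconds = 2.5 * 3600
files_per_s = 300_000 / small_seconds           # about 33 files/s
ms_per_file = 1000 / files_per_s                # about 30 ms per file
small_rate = mb_per_s(580e6, small_seconds)     # about 0.064 MB/s

print(f"{sql_rate:.2f} MB/s ({sql_share:.0%} of Fast Ethernet)")
print(f"{files_per_s:.1f} files/s, {ms_per_file:.0f} ms/file, {small_rate:.3f} MB/s")
```

The big-file rate comes out close to the quoted 9MByte/sec and at roughly
79% of Fast Ethernet's raw 12.5MB/s, which fits the network interface being
the limit. The small-file job moved only about 64kB/s, so bytes on the wire
are not what holds it back; a cost of up to about 30 ms per file is
consistent with per-file overhead such as the catalog INSERTs.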

For the most part, Bacula does what we want in this case, and it allows
much more flexibility in restoring files than we were able to achieve with
the "dump" command on the NetApp filers.
-- 

David L. Lambert
Yahoo! IM:   davidleelambert  **  MSN IM:  lamber45@cse.msu.edu