We have a two-controller FAS2020 NetApp with 1.5TB of storage (1TB on one,
500GB on the other), licensed for iSCSI only, and an old desktop system
connected to a 12-tape library of LTO-4 (800GB uncompressed) tapes. I have
gotten Bacula to back up files stored on ext3fs (Linux) and NTFS (Windows)
partitions on iSCSI LUNs on the NetApp.
The Bacula instance is running under Ubuntu Gutsy (a "long-term supported"
release), using the Gutsy package, 2.0.3. We're using Postgres version 8.2.5
as the database, because that's the available Ubuntu package and we use
Postgres elsewhere. We're using the "open-iscsi" initiator, from an Ubuntu
package. LVM and device-mapper are not installed on the backup-host, and it
only has a single Fast Ethernet connection. (The hosts that are being backed
up generally have dual Gigabit Ethernet connections to the filers.)
I wrote several scripts to allow Bacula to take snapshots on the filer and
mount them in a known place before the backup-job runs, then clean them up
afterward. I can't publish them because they have a lot of details specific
to our network, but the general outline is as follows:
"easy_snap" takes 3 arguments: an NTFS or ext3fs volume-label, an indication
of which filer the volume is on, and the complete path to the LUN on the
filer (such as "/vol/vol5/testdatav2.lun").
It makes up a name to use for the snapshot. It issues "snap create", "lun
clone create" and "lun map" commands via RSH to the filer. (It also checks
to see if another snapshot of the same volume is still mounted by mistake;
if so, it unmounts and unmaps it.) If another snapshot with the same name
exists, the script tries to delete it, and renames it if that doesn't work.
It cleans up any old "bacula-" snapshots. Then it issues a
local "iscsiadm -m session --rescan" command, and if it still can't see the
new LUN, logs the iSCSI session out and in again. It mounts the partition
using "LABEL=" syntax, to a mount-point under /mnt where the last part is the
lower-case form of the volume-label.
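
For illustration only, the core of that sequence could be sketched in Python
as below. This is not the actual script: the snapshot naming scheme, the
clone path, and the "backup-host" igroup name are hypothetical, and the filer
command forms ("snap create VOL SNAP", "lun clone create CLONE -b PARENT
SNAP", "lun map LUN IGROUP") are assumed from Data ONTAP 7-mode usage. The
stale-snapshot cleanup and the logout/login fallback are left out. The
function only builds the command list; it does not run anything.

```python
from datetime import datetime


def plan_easy_snap(label, filer, lun_path, igroup="backup-host", now=None):
    """Return the ordered commands a snapshot-and-mount step might run.

    igroup and the naming scheme below are hypothetical placeholders.
    """
    now = now or datetime.now()
    # "/vol/vol5/testdatav2.lun" -> volume "vol5"
    volume = lun_path.split("/")[2]
    # Made-up name; only the "bacula-" prefix comes from the description above.
    snap = "bacula-%s-%s" % (label.lower(), now.strftime("%Y%m%d%H%M"))
    clone = "/vol/%s/%s.lun" % (volume, snap)
    # Last path component is the lower-case form of the volume label.
    mount_point = "/mnt/" + label.lower()
    rsh = ["rsh", filer]
    return [
        rsh + ["snap", "create", volume, snap],
        rsh + ["lun", "clone", "create", clone, "-b", lun_path, snap],
        rsh + ["lun", "map", clone, igroup],
        ["iscsiadm", "-m", "session", "--rescan"],
        ["mount", "LABEL=" + label, mount_point],
    ]
```

Each entry can be handed to subprocess.run() in order, or just printed for a
dry run before pointing it at a real filer.
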
"easy_unsnap" takes the same 3 arguments as "easy_snap". It unmounts the
partition, logs out the iSCSI session, unmaps the LUN, and tries to delete
the snapshot. We noticed that the snapshots weren't always being deleted,
which is one reason why "easy_snap" tries to clean up old snapshots several
ways. If a backup spans an hour-boundary where hourly snapshots are being
created, or midnight for the nightly snapshot, the mapped LUN becomes part of
the snapshot and the snapshot we took can't be deleted until the hourly or
nightly snapshot goes away.
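
The cleanup side can be sketched the same way, again as a hypothetical
outline rather than the real script. The snapshot name and the iSCSI target
IQN are passed in explicitly here, which is an assumption (the real script
takes only the three arguments above); the igroup name and clone path are
placeholders; and the filer command forms ("lun unmap LUN IGROUP", "snap
delete VOL SNAP") are assumed from Data ONTAP 7-mode usage. Each step is
paired with a flag saying whether a failure should abort, since the snapshot
delete is expected to fail sometimes.

```python
def plan_easy_unsnap(label, filer, lun_path, snap, target_iqn,
                     igroup="backup-host"):
    """Return (command, must_succeed) pairs for the cleanup step.

    snap, target_iqn and igroup are hypothetical inputs for this sketch.
    """
    # "/vol/vol5/testdatav2.lun" -> volume "vol5"
    volume = lun_path.split("/")[2]
    clone = "/vol/%s/%s.lun" % (volume, snap)
    rsh = ["rsh", filer]
    return [
        (["umount", "/mnt/" + label.lower()], True),
        (["iscsiadm", "-m", "node", "-T", target_iqn, "--logout"], True),
        (rsh + ["lun", "unmap", clone, igroup], True),
        # Best effort: fails while an hourly or nightly snapshot holds the clone.
        (rsh + ["snap", "delete", volume, snap], False),
    ]
```

A caller would stop at the first failing step marked True and merely log a
failure of the final delete, leaving it for the next cleanup pass.
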
I tried backing up Linux hosts with a filesystem on a raw (unpartitioned) LUN
using this script, but it didn't work: when the LUN is discovered,
the /dev/disk/by-label/LABEL symlink does not get created. We wouldn't be
able to back up LVM volumes with this configuration either, because LVM
depends on device-mapper, and device-mapper would try to grab the disks
itself.
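
The "can it see the new LUN" test comes down to whether the by-label symlink
shows up. A minimal Python sketch of such a check follows; the function name,
timeout, and polling interval are made up for illustration, and the directory
is a parameter only so the function can be tried against a scratch directory.

```python
import os
import time


def wait_for_label(label, by_label_dir="/dev/disk/by-label",
                   timeout=10.0, interval=0.5):
    """Poll for the by-label symlink; return the device path or None."""
    path = os.path.join(by_label_dir, label)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            # Resolve the symlink to the underlying device node.
            return os.path.realpath(path)
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A None result after a rescan is the point at which one would fall back to
logging the iSCSI session out and in again, and it is also what a check like
this would report for the raw-LUN case described above.
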
Right now our performance is limited by the network interface for LUNs
containing a few large files, and by Postgres INSERT performance for
filesystems with lots of small files. We get 9MByte/sec for the big-file
LUNs, so for example a 46-GB SQL Server database takes an hour and 18 minutes
to back up. We have a test-system with over 300,000 files (including over
100,000 zero-length ones) in just a few directories, but only 580 MB of total
data; that took two and a half hours to back up last night. However, our
entire backup (except test-systems) ran within 6 hours, and we already have
more than half of our production data being backed up with Bacula.
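
As a sanity check on those figures, the arithmetic can be redone in a few
lines of Python. The assumptions are that 46 GB means 46,000 MB, that "over
300,000 files" is taken as exactly 300,000 (so the file rate is a lower
bound), and that Fast Ethernet is the nominal 100 Mbit/s with no protocol
overhead counted.

```python
def rate(amount, hours=0, minutes=0):
    """Average rate per second over the given elapsed time."""
    seconds = hours * 3600 + minutes * 60
    return amount / seconds


# Big-file case: 46 GB (taken as 46,000 MB) in 1 hour 18 minutes.
big_file_mb_per_s = rate(46 * 1000, hours=1, minutes=18)

# Small-file case: 300,000 files and 580 MB in 2 hours 30 minutes.
small_file_files_per_s = rate(300_000, hours=2, minutes=30)
small_file_mb_per_s = rate(580, hours=2, minutes=30)

# Nominal Fast Ethernet line rate: 100 Mbit/s expressed in MB/s.
fast_ethernet_mb_per_s = 100 / 8

print("big files:   %.1f MB/s of %.1f MB/s nominal"
      % (big_file_mb_per_s, fast_ethernet_mb_per_s))
print("small files: %.1f files/s, %.3f MB/s"
      % (small_file_files_per_s, small_file_mb_per_s))
```

The big-file job works out to just under 10 MB/s against a nominal 12.5 MB/s,
which fits a network-bound transfer. The small-file job manages about 33
files per second and well under 0.1 MB/s, so its cost is per file rather than
per byte.
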
For the most part, Bacula does what we want it to do in this case, and it
allows a lot more flexibility with restoring files than we were able to
achieve trying to use the "dump" command on the NetApp filers.
--
David L. Lambert
Yahoo! IM: davidleelambert ** MSN IM: lamber45(a)cse.msu.edu