This is my plan, after having debated the merits of distributed tape libraries on each filer vs. a centralized tape library with network backup. I've posted separately to both the toasters and bigbackup mailing lists (even though I figure most people on the second list are also on the first).
- backup clients are 12 filers (mostly F740s), each with multiple 100 Mbps Ethernet interfaces
- backup servers are two Sun E420Rs with enough CPU, memory, U2SCSI and Gigabit interfaces to keep things humming
- each filer has one or two 100 Mbps interfaces plugged into a switch, with the backup servers on Gigabit (probably something like a Catalyst 3524XL: 24 10/100 ports plus 2 Gigabit)
- each backup server will have four U2SCSI channels or two FC-AL loops, initially with half a terabyte of local disk and an Exabyte X80 library with 4 Mammoth2 drives (expandable to 8)
- stage 1 backup: filesystems on all the filers will be replicated to the tape servers' local drives (probably rsync over NFS)
- stage 2 backup: local filesystems are streamed to tape
This seems to work around most of the "problems" associated with backing up directly to tape, with a few extra side benefits thrown in. I can only realistically expect a peak of 8 to 10 MB/sec from our filers (for some of them, there are only "busy" hours and "really busy" hours). That's not enough to keep the tape drives streaming and happy. To do that, I'd have to multiplex backup streams to a single tape, and I always thought that was a bad idea.
Hard drives, of course, have no "streaming" issues. They'll take the data however fast or slow the Netapps can send it. Once the Netapp filesystems have been replicated to local disk, you blast them out to tape. With compression turned on, I figure I'll need about 20 MB/sec per tape drive to keep them chugging along. Less shoeshining, less wear and tear on the media, longer tape drive MTBF.
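For instance, stage 2 could be as simple as something along these lines on the backup server (the staging directory name and the Solaris tape device are just placeholders, and the blocking would need tuning):

# hypothetical stage 2: stream a staged filer copy from local disk to tape
# /backup/filer01 is a made-up staging directory; /dev/rmt/0cbn is the usual
# Solaris compressed, no-rewind tape device name
cd /backup/filer01
tar cbf 256 - . | dd of=/dev/rmt/0cbn obs=128k

Local disk can feed that pipeline far faster than 20 MB/sec, so the drive should stay streaming.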
Since all the filer filesystems are consolidated on local storage, you can slice-n-dice your backup sets to fit whatever drive/tape/time constraints you may have. This also gives you a nearline copy of all your data. Combined with the Netapp's snapshots, I should never ever have to go to tape to retrieve a current generation copy of a file that was accidentally deleted or corrupted. Disaster recovery of a downed filesystem can also come off local disk instead of tape.
If you use commercial tape backup software, you don't have to worry about buying and maintaining licenses for all the Netapps: all the software sees is one server backing up its own drives to a tape stacker. This may result in savings greater than the cost of the local drive storage.
I haven't had an opportunity to really test how fast rsync works over NFS with the particular hardware setup described above, so that's the weak link. If the results from trial runs on a non-dedicated Ultra 2 can be scaled up to a quad-CPU E420R, I don't think there will be a problem. Multiple rsyncs can be fired up concurrently to keep the filers busy. For the amount of data we have (300 GB at present), I expect the tape drives will only be busy for about an hour doing a weekly full backup, and only a few minutes each day for differentials.
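Roughly, stage 1 would look something like this on the backup server (the filer names, mount points and destination paths are placeholders, not the final layout):

# hypothetical stage 1: mirror each filer's volumes onto local disk over NFS,
# several filers in parallel
for filer in filer01 filer02 filer03
do
    mkdir -p /mnt/$filer /backup/$filer
    mount -F nfs -o ro $filer:/vol/vol0 /mnt/$filer
    rsync -a --delete /mnt/$filer/ /backup/$filer/ &
done
wait

The idea is that each rsync gets its own 100 Mbps pipe from its filer, and the backup server's Gigabit interface and SCSI channels should have headroom to run several of them at once.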
Anyone else doing it like this?
----- Original Message -----
From: Brian Tao <taob@risc.org>
To: toasters@mathworks.com
Sent: Tuesday, February 15, 2000 10:50 PM
Subject: Centralized backup of multiple filers
This is my plan, after having debated the merits of distributed tape libraries on each filer vs. a centralized tape library with network backup.
Have you considered Netapp's recently announced ability to share backup devices between filers through an FC SAN using Legato and a Vixel switch? But I guess this doesn't address your problem, which seems to be streaming speed to the tape to minimize backup times. Personally I think you're spending a lot of money just to make backups faster, and I'm not even sure how much time that saves, since you have to spend time to do the rsync over the net first, and then do the local backup from the UNIX box.
Bruce
Do any of the available backup solutions support a "dump to disk" mode like Amanda? Amanda uses the disk on the local backup machine as the cache: after the backup to disk completes, Amanda dumps that file to tape. This takes care of the streaming issues. This would, of course, be a problem with large NetApp filesystems. But imagine if the packages (Veritas, Legato, Workstation Solutions, etc.) took the incoming data stream and wrote it to disk as configurable chunks (100 MB, 1 GB, etc.). They could then flush those chunks to tape.
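Something along these lines, purely as a sketch of the idea (the filer name, staging path and tape device are made up):

# hypothetical chunked dump-to-disk: stage the incoming dump as 1 GB chunks
# on local disk, then flush the chunks to tape afterwards (a real package
# would overlap the staging and the flushing)
cd /stage/netapp1
rsh netapp1 dump 0f - /vol/vol0 | split -b 1024m - chunk.
for f in chunk.*
do
    dd if=$f of=/dev/rmt/0cbn obs=128k && rm $f
done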
barry
We're doing something similar to this. We are backing up to an ADSM server with a HSM filesystem. This filesystem is stored on a tape robot with a disk cache. As files are created on disk they are migrated to the tape robot and replaced on the disk with a "stub" file. To the user it just looks like a very large (but slow) filesystem. If you try to access a file that is migrated to tape, your process suspends while the robot stages the file back to disk.
So we are running our dumps from the ADSM server more or less like this:
cd /dumps/netapp/`date +%Y.%m.%d`
rsh netapp dump 0uf - /vol/vol0 | split -b 1024m - dump.
split breaks the dump into 1G files named dump.aa dump.ab ... and as the disk cache fills, the HSM system migrates these files to the tape robot. We picked 1G because it seems like a safe and manageable file size.
To restore, you just do this:
cat dump.?? | rsh netapp restore -r -f -
Admittedly, it's clunky to restore individual files, but we use snapshots for that. These dumps are primarily for disaster recovery.
We do full dumps every month, level 3 dumps every week, and level 5 dumps every day. To recover space on the tape robot, we simply rm old dumps from the filesystem. A nightly cron script on the ADSM server handles everything.
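In outline, the nightly script does something like this (just a sketch rather than the actual script; the 90-day retention and the calendar checks are approximations of the schedule described above):

#!/bin/sh
# hypothetical nightly dump driver run from cron on the ADSM server
DUMPDIR=/dumps/netapp/`date +%Y.%m.%d`

# level 0 on the 1st of the month, level 3 on Sundays, level 5 otherwise
if [ "`date +%d`" = "01" ]; then
    LEVEL=0
elif [ "`date +%a`" = "Sun" ]; then
    LEVEL=3
else
    LEVEL=5
fi

mkdir -p $DUMPDIR && cd $DUMPDIR || exit 1

# stream the dump over rsh and break it into 1 GB pieces so the HSM
# filesystem can migrate manageable files to the tape robot
rsh netapp dump ${LEVEL}uf - /vol/vol0 | split -b 1024m - dump.

# recover space by removing dump directories older than about 90 days
find /dumps/netapp/* -type d -prune -mtime +90 -exec rm -rf {} \;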
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support
ADSM does that and has for years.
I don't know why NetApp and IBM haven't gotten together (or how much they have discussed it) to deliver a native ADSM client that can run inside a NetApp filer. For all that some people complain about ADSM, it just works. It makes excellent use of a tape pool and doesn't require directly-attached tape.
A native client that runs under ONTAP would avoid the overhead of NFS.
Barry Lustig wrote:
Do any of the available backup solutions support a "dump to disk" mode like Amanda?
On Wed, 16 Feb 2000, Bruce Sterling Woodcock wrote:
Have you considered Netapp's recently announced ability to share backup devices between filers through an FC SAN using Legato and a Vixel switch?
Yes, but folks on my team have had suboptimal experiences with Legato software and Vixel hardware in the past, so I'm not too excited about it.
Personally I think you're spending a lot of money just to make backups faster, and I'm not even sure how much time that saves, since you have to spend time to do the rsync over the net first, and then do the local backup from the UNIX box.
My thinking is that with slow clients, throughput to a disk device will be faster than to a tape device, because of tape start/stop overhead. Therefore, the backup window (as far as the Netapps are concerned) is shortened. Data from local disk can then be sent to tape outside of the normal backup window. You do end up with two useful copies of your data, so the time spent moving data around isn't wasted.
----- Original Message -----
From: Brian Tao <taob@risc.org>
To: Bruce Sterling Woodcock <sirbruce@ix.netcom.com>
Cc: toasters@mathworks.com
Sent: Wednesday, February 16, 2000 5:07 PM
Subject: Re: Centralized backup of multiple filers
Yes, but folks on my team have had suboptimal experiences with Legato software and Vixel hardware in the past, so I'm not too excited about it.
That's too bad. I'm hoping it catches on. Maybe Netapp needs to support more than just Vixel?
My thinking is that with slow clients, throughput to a disk device will be faster than to a tape device, because of tape start/stop overhead. Therefore, the backup window (as far as the Netapps are concerned) is shortened.
I would agree that individual filers could be available sooner, sure. Personally I don't think the load hit is so great that your backup window has to be very small, but you say you run them pretty hot, so YMMV.
Data from local disk can then be sent to tape outside of the normal backup window. You do end up with two useful copies of your data, so the time spent moving data around isn't wasted.
Right; my point, though, was that once you add this time into the equation, the overall time until the data is "safe" on tape may not be much less than backing up from the filer directly.
Bruce