Brian Tao <taob@risc.org> writes:
> I can only realistically expect a peak of 8 to 10 MB/sec from our filers (for some of them, there is only "busy" hours and "really busy"
Is that from experience or theory?
> out to tape. With compression turned on, I figure I'll need about 20MB/sec per tape drive to keep them chugging along. Less
Does M2 have the streaming issues of DLT? You'll certainly never push that much data into a DLT7000 or AIT. ~10MB/sec with bursts to 12 is the best you'll get. I think M2 is probably too new to know how it really behaves in the real world.
> constraints you may have. This also gives you a nearline copy of all your data. Combined with the Netapp's snapshots, I should never ever
If you're doing that, it would seem to me it would make more sense to use a second (perhaps in a cluster) toaster and mirror the volumes using snapmirror.
> I haven't had an opportunity to really test out how fast rsync
> works over NFS with the particular hardware setup described above, so that's the weak link. If the results from trial runs on a non-
It's not all that fast (somewhere I have some numbers I ran). If you're syncing lots of smaller files, you'll burn through memory like mad, too.
> there will be a problem. Multiple rsyncs can be fired up concurrently to keep the filers busy. For the amount of data we have (300GB at
I thought they were already quite busy. If they're so busy that you don't feel you can attach a local tape drive, I'm not really understanding how they're unbusy enough to hammer them with a bunch of rsyncs.
Frankly, I think you've made the solution much more complicated than it needs to be. Backups should be done as simply as possible -- you've added lots of opportunities for things to break that really don't need to be there.
> Anyone else doing it like this?
I suspect not.
Darrell
On 15 Feb 2000, Darrell Fuhriman wrote:
> > I can only realistically expect a peak of 8 to 10 MB/sec from our filers (for some of them, there is only "busy" hours and "really busy"
> Is that from experience or theory?
This is from experience, from a production F740 to an idle Ultra2 with Fast Ethernet connecting the two. An idle F740 to an idle Ultra2 gets me about 15MB/s of aggregate throughput if I have two 100 Mbps connections. I can't seem to make the filer go any faster than that (one shelf of 18GB drives). It's the same whether I'm running a dump over rsh to /dev/null, or copying large files over NFS.
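If anyone wants to reproduce the NFS half of that test, it's nothing fancier than timing a big sequential read off the mount. A minimal Python sketch, with a made-up mount point and file name:

    import time

    # Made-up path: a large file on an NFS-mounted filer volume.
    TEST_FILE = "/mnt/filer1/vol0/bigfile.dat"
    CHUNK = 1024 * 1024        # read 1 MB at a time

    start = time.time()
    total_bytes = 0
    with open(TEST_FILE, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            total_bytes += len(buf)
    elapsed = time.time() - start

    mb = total_bytes / (1024.0 * 1024.0)
    print("%.0f MB in %.1f sec = %.1f MB/sec" % (mb, elapsed, mb / elapsed))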
> Does M2 have the streaming issues of DLT? You'll certainly never push that much data into a DLT7000 or AIT. ~10MB/sec with bursts to 12 is the best you'll get. I think M2 is probably too new to know how it really behaves in the real world.
No idea yet, but I asked about that in a new thread. You are right about the M2 track record (in that it doesn't really have one yet). Our current Mammoth drives perform to spec though, so I'm crossing my fingers that Exabyte is equally accurate with their Mammoth2 specs.
> If you're doing that, it would seem to me it would make more sense to use a second (perhaps in a cluster) toaster and mirror the volumes using snapmirror.
I considered that, but the additional cost of a filer big enough to handle our projected storage requirements plus SnapMirror licenses for all the Netapps was prohibitive. Attaching a library directly to a filer also limits the choice of tape hardware I can use, and I'm pretty much stuck with DOT's tape dump format.
> It's not all that fast (somewhere I have some numbers I ran). If you're syncing lots of smaller files, you'll burn through memory like mad, too.
The tests I did run were against filers with millions of little files (a mail store with Maildir-style mailboxes) and a rather deep directory structure. I found that the Sun Ultra 2 running rsync ran out of juice long before the filer did. The worst case was rsyncing a fresh filesystem, which would have been better accomplished with a straight dump|restore anyway.
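One workaround for the memory blowup would be to run one rsync per top-level directory instead of one giant rsync per volume, so the in-memory file list stays small. Something along these lines -- the mount point and staging path are invented:

    import os
    import subprocess

    SRC = "/mnt/filer1/vol0/mail"        # invented NFS mount of the mail store
    DST = "/staging/filer1/vol0/mail"    # invented local spool directory

    os.makedirs(DST, exist_ok=True)

    # One rsync per top-level directory: each run only holds that directory's
    # file list in memory, not the whole multi-million-file tree.
    for name in sorted(os.listdir(SRC)):
        src = os.path.join(SRC, name)
        dst = os.path.join(DST, name)
        if os.path.isdir(src):
            subprocess.run(["rsync", "-a", "--delete", src + "/", dst + "/"],
                           check=True)
        else:
            subprocess.run(["rsync", "-a", src, DST + "/"], check=True)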
> I thought they were already quite busy. If they're so busy that you don't feel you can attach a local tape drive, I'm not really understanding how they're unbusy enough to hammer them with a bunch of rsyncs.
I don't want to have a local tape drive on each Netapp regardless. I prefer to have a smaller number of bigger libraries that have a certain flexibility in the number of filers they can accommodate. I did not find that rsync "hammered" the filer more than doing a dump over the network. Overall, dump ranges from about the same throughput as rsync to roughly 20% faster, once you're actually into the phase where data is being copied. However, the throughput I'm seeing in either case is not enough to keep an M2 streaming (let alone eight of them). Given that I'd rather not interleave multiple backup streams to one drive, my alternative is to spool to disk first, and then back that up in contiguous chunks to tape.
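For the spooling stage itself, the "multiple concurrent rsyncs" part is easy to drive from a small wrapper. A rough Python sketch, where the volume list, mount points and concurrency cap are all invented for the example (and the staging directories are assumed to exist):

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # Invented list of (NFS mount of a filer volume, local staging directory).
    VOLUMES = [
        ("/mnt/filer1/vol0", "/staging/filer1/vol0"),
        ("/mnt/filer2/vol0", "/staging/filer2/vol0"),
        ("/mnt/filer3/vol1", "/staging/filer3/vol1"),
    ]
    MAX_CONCURRENT = 4    # cap so no single filer (or the spool disk) gets buried

    def sync(volume):
        src, dst = volume
        # -a preserves permissions/times; --delete keeps the spool an exact mirror.
        return subprocess.run(["rsync", "-a", "--delete",
                               src + "/", dst + "/"]).returncode

    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        codes = list(pool.map(sync, VOLUMES))

    failed = [v[0] for v, rc in zip(VOLUMES, codes) if rc != 0]
    if failed:
        print("rsync failed for:", ", ".join(failed))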
> Frankly, I think you've made the solution much more complicated than it needs to be. Backups should be done as simply as possible -- you've added lots of opportunities for things to break that really don't need to be there.
I don't see it that way... I've simply inserted a large buffer between the Netapps and the tape drives. As long as the filesystem replication stage doesn't collide with the tape backup stage, I should be in the clear. The ability to quickly recover files and filesystems from fast, random-access media rather than slogging through sequential-access tapes really appeals to me too.
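Keeping the two stages from colliding doesn't need anything fancier than a lock file on the spool disk that both the rsync jobs and the tape job grab before touching it. A minimal sketch -- the lock path and the stage functions are placeholders:

    import fcntl

    LOCK_PATH = "/staging/.backup.lock"   # invented path on the spool disk

    def run_stage(stage):
        """Run one stage (spooling or tape dump) under an exclusive lock."""
        with open(LOCK_PATH, "w") as lock:
            # Blocks until whichever stage currently holds the lock finishes.
            fcntl.flock(lock, fcntl.LOCK_EX)
            try:
                stage()
            finally:
                fcntl.flock(lock, fcntl.LOCK_UN)

    # Cron would call run_stage(spool_from_filers) overnight and
    # run_stage(dump_spool_to_tape) during the day; neither can start
    # while the other is still running.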
> > Anyone else doing it like this?
> I suspect not.
Well, I suppose someone has to go first... ;-)
> I thought they were already quite busy. If they're so busy that you don't feel you can attach a local tape drive, I'm not really understanding how they're unbusy enough to hammer them with a bunch of rsyncs.
Another point: Is buying and attaching all that disk to the SUN systems really cheaper than simply attaching a local tape drive? And setting up a cluster with SnapMirror would probably be more expensive, but not prohibitively so.
Bruce
On Wed, 16 Feb 2000, Bruce Sterling Woodcock wrote:
> Another point: Is buying and attaching all that disk to the SUN systems really cheaper than simply attaching a local tape drive?
You mean a local tape drive to each Netapp? A terabyte of local disk (24x50GB Barracudas, Kingston Datasilo enclosures, hot-swap canisters, cabling) only costs about $40,000. 12 tape libraries (one per Netapp) would cost much more than that, and I'd rather deal with a couple of Exabyte X80's than, say, a dozen EXB-220's.
> And setting up a cluster with SnapMirror would probably be more expensive, but not prohibitively so.
A loaded E420R tape server with local disk storage comes out to around $75,000. An F760 with a terabyte of disk and enough SnapMirror licenses for all the filers adds up to around $450K. Maybe I'm missing something... I'm assuming I need to buy a SnapMirror license for each of the dozen filers mirroring to the tape server filer.
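The back-of-the-envelope comparison I'm working from, using only the rough numbers above -- corrections welcome if I've misunderstood the SnapMirror licensing:

    # Rough quote-level numbers from the discussion above, not list prices.
    FILERS = 12

    e420r_tape_server = 75000    # loaded E420R with ~1TB of local staging disk
    f760_snapmirror   = 450000   # F760 + 1TB of disk + SnapMirror licenses for all filers

    print("E420R staging server: $%7d ($%d per filer)"
          % (e420r_tape_server, e420r_tape_server // FILERS))
    print("F760 + SnapMirror:    $%7d ($%d per filer)"
          % (f760_snapmirror, f760_snapmirror // FILERS))
    print("difference:           $%7d" % (f760_snapmirror - e420r_tape_server))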
----- Original Message -----
From: Brian Tao <taob@risc.org>
To: Bruce Sterling Woodcock <sirbruce@ix.netcom.com>
Cc: toasters@mathworks.com
Sent: Wednesday, February 16, 2000 3:43 PM
Subject: Re: Centralized backup of multiple filers
> On Wed, 16 Feb 2000, Bruce Sterling Woodcock wrote:
> > Another point: Is buying and attaching all that disk to the SUN systems really cheaper than simply attaching a local tape drive?
> You mean a local tape drive to each Netapp? A terabyte of local
> disk (24x50GB Barracudas, Kingston Datasilo enclosures, hot-swap canisters, cabling) only costs about $40,000. 12 tape libraries (one per Netapp) would cost much more than that, and I'd rather deal with a couple of Exabyte X80's than, say, a dozen EXB-220's.
Okay, first I was assuming fewer filers than that. After all, 1 terabyte can fit on just one or two filers. Secondly, I was assuming more storage than that; depending on your backup schedule, you need enough storage to cover more than one filer at once. You also need to include the price of your SUN hardware (even if you already have it, you're tying up that money in this scheme rather than elsewhere) and the associated network infrastructure. Finally, if you really have some filers with that small amount of storage, they don't all need a library, just a simple stacker.
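To put that sizing point in rough numbers: how much spool disk you need depends on how many filers' worth of data have to sit on it at once. A toy check, where every figure is made up purely for illustration:

    # The spool disk has to hold every filer whose copy is kept online at the
    # same time, plus some headroom for growth. All numbers are invented.
    gb_per_filer     = 80     # average data per filer
    staged_at_once   = 4      # filers whose copies live on the spool disk together
    growth_headroom  = 1.5    # fudge factor for growth

    needed_gb = gb_per_filer * staged_at_once * growth_headroom
    print("spool disk needed: %d GB (vs. the ~1000 GB proposed)" % needed_gb)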
> > And setting up a cluster with SnapMirror would probably be more expensive, but not prohibitively so.
> A loaded E420R tape server with local disk storage comes out to
> around $75,000. An F760 with a terabyte of disk and enough SnapMirror licenses for all the filers adds up to around $450K. Maybe I'm missing something... I'm assuming I need to buy a SnapMirror license for each of the dozen filers mirroring to the tape server filer.
Good question on the last part; I don't know.
Bruce