Never underestimate the power of rsync. Here is what I did to move about 70 million files (lots of tiny files).
With lots of tiny files, ndmpcopy spends a lot of time before it really transmits anything. In some cases, with 10+ million files on a busy filer, I waited 4+ hours for files to begin copying (actual data, not just writing out sparse files for inodes). I found that if you run the level 0 ndmpcopy on the weekend or during your off hours, you can let it beat up the filer a little bit (and it will beat up both CPU and disk utilization).
Then, during the week, run rsyncs with --delete to keep things nicely in sync. This is something ndmpcopy can't do (I think): if you do a level 0 + level 1 you won't get deletes. Doing the "catch-up rsyncs" also gives you a good estimate of how long the final sync will take when you schedule your real outage. I recommend using rsync 2.6.6 or higher, as it is significantly more memory-efficient than older versions. I used a 4-CPU Sun V440 to move a lot of data with rsync. Running a single process with a 16 million file list consumes as much as 8GB of RAM. I almost never found myself CPU bound, and CPU load on the filer was much lighter than when I was doing an ndmpcopy.
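For anyone who wants to script the catch-up runs, here is a minimal sketch of what I mean, not the exact commands I used. It assumes both filers are NFS-mounted on the host doing the copy; the function names and the paths in the comments are hypothetical. The flags are standard rsync options: -a (archive), --delete (remove files on the destination that are gone from the source), and --dry-run with --itemize-changes to list what would change without copying anything.

```python
import subprocess
import time


def build_catchup_cmd(src, dst, dry_run=False):
    """Build an rsync command that mirrors src into dst, deleting stale files."""
    # A trailing slash on the source makes rsync copy the directory's
    # contents into dst rather than creating dst/<basename of src>.
    src = src.rstrip("/") + "/"
    dst = dst.rstrip("/") + "/"
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        # Report what would be transferred or deleted, but change nothing.
        cmd += ["--dry-run", "--itemize-changes"]
    cmd += [src, dst]
    return cmd


def run_timed(cmd):
    """Run the command and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    exit_code = subprocess.call(cmd)
    return exit_code, time.monotonic() - start


# Hypothetical usage (mount points are assumptions):
#   cmd = build_catchup_cmd("/mnt/old_filer/vol1", "/mnt/new_filer/vol1")
#   code, seconds = run_timed(cmd)
```

The elapsed time from each weekday run is the number to watch when sizing the outage window.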
All of this depends on your rate of change for files, but overall I think rsync was faster for me after that initial copy. Since rsync wasn't using all my CPU, I broke it into one rsync for each top-level directory, with as many as 70 rsync processes running at the same time in some of my migrations. This allowed me to sync up 16 million files in < 20 minutes in some cases. I suspect the level 1 ndmpcopy would be slower than that, but I'm not 100% sure.
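The per-directory split can be sketched roughly like this. It is an illustration under assumptions, not my actual script: the helper names are made up, the source and destination are assumed to be locally mounted paths, and max_workers is something you would tune to your own host and filer.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def top_level_dirs(src_root):
    """Return the sorted names of the directories directly under src_root."""
    with os.scandir(src_root) as entries:
        return sorted(e.name for e in entries if e.is_dir(follow_symlinks=False))


def build_jobs(src_root, dst_root):
    """Build one rsync command per top-level directory."""
    jobs = []
    for name in top_level_dirs(src_root):
        src = os.path.join(src_root, name) + "/"
        dst = os.path.join(dst_root, name) + "/"
        jobs.append(["rsync", "-a", "--delete", src, dst])
    return jobs


def run_jobs(jobs, max_workers=8, runner=subprocess.call):
    """Run the commands concurrently; return their exit codes in job order."""
    # Threads are enough here: each one just waits on its rsync child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, jobs))
```

Note that this only covers subdirectories. Files sitting directly in the top-level directory need their own pass, and each rsync still builds its own file list, so memory use adds up across processes.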
At the time of cutover, change the export to read-only to block people from making changes. This doesn't work if you have a subdirectory that is hitting a different export line. In that case I would do a second rsync to see which files were still changing, and hunt down the apps still writing to that volume. I'd be curious to know which method you end up going with and how it wo

---- Original Message ----
From: Darren Dunham ddunham@taos.com
To: toasters@mathworks.com
Sent: Friday, November 10, 2006 12:37:19 PM
Subject: Re: ndmpcopy, jndmpcopy and multiple level 1 transfers
I am getting ready to do a big data migration. I am planning on doing an initial level 0, and then multiple level 1's leading up to the big day.
It doesn't make sense to me why you would do multiple level 1 copies because a level 1 does not copy files that have changed since the previous level 1. To do this you must use a level 2 copy.
I agree with the spirit, but I would rephrase it. :-)
A level 1 *does* copy all the files that have changed since the previous level 1. It's just that they are a subset of the files modified since the level 0, all of which are copied as well.
So running a level 1 on the last day should take about the same amount of time whether any other level 1 copies had been run previously or not.
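A toy model may make the level arithmetic easier to see. It assumes the traditional dump rule that a level N pass copies everything modified since the most recent pass at a lower level, and that ndmpcopy follows that rule. The function names, timestamps, and file names are invented for illustration.

```python
def baseline_for(level, history):
    """Return the time of the most recent earlier pass with a lower level.

    history is a list of (time, level) tuples for passes already taken.
    Returns None when no lower-level pass exists, meaning copy everything.
    """
    lower = [t for t, lvl in history if lvl < level]
    return max(lower) if lower else None


def files_copied(level, history, mtimes):
    """Return the sorted names of files a pass at this level would copy."""
    base = baseline_for(level, history)
    return sorted(name for name, mtime in mtimes.items()
                  if base is None or mtime > base)


# Level 0 ran at t=0 and an interim level 1 ran at t=10.
# "a" changed at t=5, "b" changed at t=15, "c" has not changed since t=-1.
mtimes = {"a": 5, "b": 15, "c": -1}
with_interim = files_copied(1, [(0, 0), (10, 1)], mtimes)
without_interim = files_copied(1, [(0, 0)], mtimes)
level_two = files_copied(2, [(0, 0), (10, 1)], mtimes)
```

Under that assumption, the final level 1 copies both "a" and "b" whether or not the interim level 1 ran, while a level 2 after the interim level 1 would copy only "b".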
Agreed on the rest about 0/1/2 ndmpcopy, but for flexibility I do like rsync as an alternative. It's just how much power you can toss at it and how long you can wait for the file list when you've got millions of files.
Remember you don't have to move everything in one rsync command. Especially if you have multiple clients, it may make sense to split it up into a few separate subtrees.