The easiest solution, which also demands minimal downtime, is to use ndmpcopy and rsync together...
This will work only if you have free volumes on the new machine...
- Make a baseline copy of the separate parts of your current volume to the target volumes, using ndmpcopy/rsync/tar/whatever (ndmpcopy will probably be the fastest). This can be done while users are working, a few hours or days before the planned migration downtime.
- A few hours before the downtime, use rsync and/or ndmpcopy to incrementally copy the changes made since the baseline.
- DOWNTIME - during the downtime, do a final sync, again using incremental ndmpcopy or rsync, and change the relevant settings: NIS maps, the filer's /etc/exports, /etc/quotas...
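The three passes above can be sketched with rsync as follows. This is only a minimal sketch under assumptions: both volumes are NFS-mounted on an admin host, the mount points and the helper name sync_pass are hypothetical, and rsync carries Unix attributes only, so it does not suit data whose CIFS attributes matter.

```shell
# Hypothetical NFS mount points for the old and new volumes.
SRC=/mnt/oldvol/projects
DST=/mnt/newvol/projects

# One sync pass. -a keeps Unix owners, modes and times; -H keeps hard links;
# --delete removes target files that no longer exist on the source.
sync_pass() {
    rsync -aH --delete "$1"/ "$2"/
}

# Baseline, days before the downtime (users still working):
#   sync_pass "$SRC" "$DST"
# Incremental catch-up, a few hours before the downtime:
#   sync_pass "$SRC" "$DST"
# Final pass, during the downtime with the source quiesced:
#   sync_pass "$SRC" "$DST"
```

The same command serves all three passes because rsync only transfers what changed since the previous run.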
In theory, this is the way to go and is what we ended up doing when we copied a volume to a new volume a couple of weeks ago. We did a level 0 dump/restore with ndmpcopy while the filer was up, and during the downtime we picked up the changes with "rsh filer dump | rsh filer restore". We didn't use rsync because we didn't want to lose any CIFS file attributes.
Let me warn you about some problems we ran into. We've submitted bug reports where appropriate.
1) We are running 5.3.5R2P2 and there is a known bug in incremental NDMP where it fails to dump files that should be dumped, because NDMP dump looks only at the mtime instead of both the mtime and the ctime. To work around the bug, which is only in NDMP dump, we used "rsh filer dump | rsh filer restore". With that pipeline we ran into some serious problems where the restore would stop working and drop into an infinite loop. This would happen often, but not always. I'm talking to NetApp about this one.
2) We discovered a bug where a level 0 subtree dump skips files dated before 00:00:00 Jan 1, 1970 GMT, i.e., files with negative timestamps. Don't ask me how they got there, they are user files and we had over 500 of them. This problem does not happen with a full volume dump, just a subtree. So I suggest either doing a full volume dump or running a find to locate all such files and "touch" them to a time after 1970.
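One way to run the "find" suggested above, as a rough sketch. It assumes the subtree is NFS-mounted on a host with GNU find (the -printf option is a GNU extension), it checks the mtime only, and the path, the list file name, and the helper name pre_epoch are all made up for illustration. File names containing newlines would confuse it.

```shell
# Read "<mtime-in-epoch-seconds> <path>" lines and print the paths
# whose mtime is negative, i.e. before 00:00:00 Jan 1, 1970 GMT.
pre_epoch() {
    awk '$1 < 0 { sub(/^[^ ]+ /, ""); print }'
}

# Listing only (GNU find: %T@ = mtime in seconds since the epoch, %p = path):
#   find /mnt/oldvol/subtree -type f -printf '%T@ %p\n' | pre_epoch > pre1970.txt
#
# After reviewing the list, reset each file to Jan 2, 1970 local time,
# which is after the epoch in every time zone:
#   while IFS= read -r f; do touch -t 197001020000 "$f"; done < pre1970.txt
```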
3) Very minor bug -- one of our users has the Unix uid 65535, which has the bit pattern 0xffff, which is -1 in a 16-bit signed int. After the restore, this user's files were all owned by root instead of 65535. Nearby uids both above and below worked correctly.
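The arithmetic behind that observation can be checked in the shell. The helper name to_int16 is hypothetical, and this only illustrates the bit pattern; it says nothing about where in dump or restore the value gets misread.

```shell
# Reinterpret the low 16 bits of an integer as a signed 16-bit value.
to_int16() {
    echo $(( (($1 & 0xffff) ^ 0x8000) - 0x8000 ))
}

to_int16 65535   # prints -1
to_int16 65534   # prints -2
to_int16 1000    # prints 1000
```

Only 65535 maps to -1, which fits the report that neighbouring uids were restored correctly.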
4) Our original filesystem was created under DOT 5.0.2 and the filer was upgraded at least twice since. We run a mixed NFS and CIFS environment. Some of the older versions of DOT had bugs in the WAFL directory format. One bug allowed two different files to be created in the same directory with the exact same name. Even though our version of DOT no longer has this bug, our volume still had three such pairs of files on it. The full dump/restore had no problems with these files, but the incremental "restore -r" could not cope and failed without restoring anything. If you can find these file pairs the fix is simple. Just rename one of the files. You have no control over which one of the two the filer picks, but afterward, you can see both files. This is a particularly insidious problem because any incremental dump of a volume with duplicate filenames CANNOT BE RESTORED with "restore -r". You will be forced to use "restore -x", which is less than desirable. Dump does not issue any errors, either. It's only when you restore that you discover the problem.
You can locate duplicate files like this:
    find dir -print | sort > out1
    sort -u out1 > out2
    diff out1 out2
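The same check can also be done in one pipeline without temporary files. This is just a sketch; the helper name dup_lines is made up, and like the commands above it relies on find reporting both directory entries.

```shell
# Print every line that appears more than once in the input.
# uniq -d needs sorted input, hence the sort in front.
dup_lines() {
    sort | uniq -d
}

# Usage, from a host that has the volume mounted:
#   find dir -print | dup_lines
```

Each duplicated path is printed once, however many times it occurs.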
MORAL: Before a big volume copy, do some dry runs to be sure you won't run into problems during your downtime. Be sure to test both the full dump/restore and incremental dump/restore. We discovered many of these problems during the two weeks leading up to our downtime. Even so, we hit a couple of snags during the downtime that cost us at least 2 hours.
One final tip: Before running restore -r on an incremental dump, be sure to save a copy of the restore_symboltable file since restore -r modifies it. If the restore modifies the file and then fails, you can't rerun the "restore -r" unless you put back the original file. Even then you may have problems, and will need to use "restore -x".
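A small sketch of that precaution. It assumes the restore target is NFS-mounted on an admin host and that restore_symboltable sits at the top of that target directory; the function names, the ".saved" suffix, and the example path are invented for illustration.

```shell
# Keep a pristine copy of the symbol table before an incremental restore.
# $1 is the top of the restore target as seen from the admin host.
save_symtab() {
    cp -p "$1/restore_symboltable" "$1/restore_symboltable.saved"
}

# Put the pristine copy back after a failed "restore -r".
put_back_symtab() {
    cp -p "$1/restore_symboltable.saved" "$1/restore_symboltable"
}

# Usage:
#   save_symtab /mnt/newvol/projects      # before restore -r
#   put_back_symtab /mnt/newvol/projects  # only if restore -r failed
```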
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support