. . . mailing list, it appears that NDMPcopy 'infinite incremental' feature is the solution which provides the shortest possible downtime. What I would like to ask more experienced NetApp admins is whether there are known caveats in issuing something like:
jndmpcopy filer:/vol/vol1 filer:/vol/vol2 -sa root:<passwd> -da root:<passwd> -level i . . .
We had a glitch using this procedure in an upgrade last weekend (copying from an old filer running 5.3.6R2 to a new one running 6.1R1).
We ran the level-0 dumps Thursday night and the level-1 incrementals during a Saturday a.m. downtime. The level-0s all claimed to complete without error, but a few of the incrementals based on them failed to restore, complaining that the full restore had not completed and thus an incremental restore could not proceed.
We worked around the problem by doing a manual level-1 dump to some alternate disk storage, and then manually restoring from those level-1 images. Note that this glitch didn't affect all of the qtrees we dumped, but we didn't have time to go back and redo full level-0s of the ones that failed.
The ramification of the workaround was that the manual restore of the incremental couldn't record any deletions (or renames) which had occurred since the corresponding level-0s, so any files that got deleted on the old filer during Friday remained on the new filer. To compensate, I ran "rsync -n -a --delete" to compare the old filer against the new one after we were all done, and produced a list of files which appeared on the new filer but not on the old (note the "-n", which has rsync report what it would do without actually doing anything).
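If rsync isn't handy on the client, the same kind of "on the new filer but not on the old" list can be roughly sketched in Python. This is only an illustration, not what I actually ran: it assumes both filers are mounted on one client, the function names are made up for the example, and it compares path names only (it does not compare sizes, times, or contents the way rsync -a would).

```python
import os

def relative_paths(root):
    """Return the set of file and directory paths under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found

def extra_on_destination(source_root, dest_root):
    """List paths present under dest_root but missing under source_root.

    These are the candidates for files deleted or renamed on the source
    after the level-0 was taken. Nothing is modified on either side.
    """
    return sorted(relative_paths(dest_root) - relative_paths(source_root))
```

Calling extra_on_destination() with the old filer's mount point first and the new filer's second gives the list to review by hand. Like "rsync -n", it changes nothing.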
I've yet to do some post-processing on the list of those files to see which ones were actually deleted during the time period of the level-1. And after the dust settles here, I'll post a bug-report to NetApp about the jndmpcopy/restore problem. On inspection of the logs of the original level-0 runs, I see that the ones that turned up problems at level-1 time all had the "DUMP IS DONE" message, but not a corresponding "RESTORE IS DONE".
Now I had noticed this idiosyncrasy ahead of time, but I did a couple of checks using rsync and verified that the restores had actually completed just fine. The problem filesystems even had what looked like reasonable "restore_symboltable" files left for the level-1 restores to use. I thought the jndmpcopy process had just shut down before getting all the log messages from the restoring (destination) filer. My guess now is that for some reason the restore process didn't get some magic checkpoint value written into the restore_symboltable file, so the incremental restore didn't trust it.
Anyway, I'd say go ahead but make sure you see both "DUMP IS DONE" _and_ "RESTORE IS DONE" on all of your copies, or do them over before proceeding to the next incremental.
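If you save the output of each copy, that check is easy to automate. Here's a minimal sketch in Python. The two message strings are the ones from our logs, but everything else is an assumption for illustration: the function names are made up, and it assumes you have each copy's output available as plain text keyed by some name of your choosing.

```python
# Completion messages as they appeared in our jndmpcopy logs.
REQUIRED_MESSAGES = ("DUMP IS DONE", "RESTORE IS DONE")

def copy_completed(log_text):
    """Return True only if both completion messages appear in the log text."""
    return all(message in log_text for message in REQUIRED_MESSAGES)

def incomplete_copies(logs):
    """Given a mapping of copy name -> log text, return the names to redo."""
    return sorted(name for name, text in logs.items() if not copy_completed(text))
```

Anything that incomplete_copies() returns should get a fresh level-0 before you attempt the next incremental on it.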
And the final moral is that any time you move a bunch of data, things can go wrong, so have a fall-back plan in case all doesn't go perfectly. A tool like rsync on a fast client or two can help a lot.
Regards,