Re: Splitting large volumes

29 Aug 2000

      ...
The easiest solution, which also demands minimal downtime, is
using ndmpcopy/rsync together...
This will work only if you have free volumes on the new machine...

Make a baseline copy of the separate parts of your current
volume on the target volumes, using ndmpcopy/rsync/tar/whatever
(ndmpcopy will probably be the fastest).
This can be done while users are working, a few hours/days
before the migration downtime is planned.
A few hours before the downtime, use
rsync and/or ndmpcopy to copy incrementally
the changes since the baseline.
DOWNTIME - while in downtime, just do a final sync, again
using ndmpcopy incremental/rsync, and change the relevant settings:
NIS maps, filer's /etc/exports, /etc/quotas...

In theory, this is the way to go, and it is what we ended up doing when we
copied a volume to a new volume a couple of weeks ago.  We did a level
0 dump/restore with ndmpcopy while the filer was up and during the
downtime we picked up the changes with rsh filer dump | rsh filer restore.
We didn't use rsync because we didn't want to lose any CIFS file
attributes.
Let me warn you about some problems we ran into.  We've submitted bug
reports where appropriate.
1) We are running 5.3.5R2P2 and there is a known bug in incremental NDMP
where it fails to dump files that should be dumped because NDMP dump only
looks at the mtime and not the mtime and the ctime.  To work around the
bug, which is only in NDMP dump, we used "rsh filer dump | rsh filer
restore".  We ran into some serious problems where the restore would stop
working and drop into an infinite loop.  This would happen often, but not
always.  I'm talking to NetApp about this one.
2) We discovered a bug where a level 0 subtree dump skips files dated
before 00:00:00 Jan 1, 1970 GMT, i.e., files with negative timestamps. 
Don't ask me how they got there, they are user files and we had over 500
of them.  This problem does not happen with a full volume dump, just a
subtree.  So I suggest either doing a full volume dump or running a find
to locate all such files and "touch" them to a time after 1970.
3) Very minor bug -- one of our users has the Unix uid 65535, which has
the bit pattern 0xffff, which is -1 in a 16-bit signed int.  After the
restore, this user's files were all owned by root instead of 65535.
Nearby uids both above and below worked correctly.
4) Our original filesystem was created under DOT 5.0.2 and the filer was
upgraded at least twice since.  We run a mixed NFS and CIFS environment. 
Some of the older versions of DOT had bugs in the WAFL directory
format.  One bug allowed two different files to be created in the same
directory with the exact same name.  Even though our version of DOT no
longer has this bug, our volume still had three such pairs of files on it.
The full dump/restore had no problems with these files, but the
incremental restore -r could not cope and failed without restoring
anything.  If you can find these file pairs the fix is simple.  Just
rename one of the files.  You have no control over which one of the two
the filer picks, but afterward, you can see both files.  This is a
particularly insidious problem because any incremental dump of a volume
with duplicate filenames CANNOT BE RESTORED with "restore -r".  You will
be forced to use "restore -x" which is less than desirable.  Dump does not
issue any errors, either.  It's only when you restore that you discover
the problem.
You can locate duplicate files like this:
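find dir -print | sort > out1    # every pathname under dir, sorted
sort -u out1 > out2              # the same list with duplicates removed
diff out1 out2                   # lines only in out1 are the duplicated names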
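Problems 1 through 4 can all be checked for ahead of time from an NFS
client that mounts the volume.  What follows is only a rough sketch, not
what we actually ran: the function names are made up for illustration,
"touch -d @N" and "find -cnewer" assume GNU tools, and STAMP is assumed to
be a file you touched at the moment the baseline dump started.

```shell
#!/bin/sh
# Pre-copy checks (illustrative sketch; function names are hypothetical).

# list_old_files DIR [CUTOFF]
# Print files whose mtime is at or before CUTOFF, given in seconds since
# the epoch.  The default of 0 is 00:00:00 Jan 1, 1970 GMT (problem 2).
list_old_files() {
    ref=$(mktemp) || return 1
    touch -d "@${2:-0}" "$ref"          # GNU touch: set mtime to CUTOFF
    find "$1" -type f ! -newer "$ref" -print
    rm -f "$ref"
}

# list_owned_by DIR UID
# Print everything owned by a numeric uid such as 65535 (problem 3).
list_owned_by() {
    find "$1" -uid "$2" -print
}

# list_dup_names
# Read pathnames on stdin and print each one that occurs more than once
# (problem 4).
list_dup_names() {
    sort | uniq -d
}

# list_ctime_only DIR STAMP
# Print files whose ctime is newer than STAMP's mtime but whose mtime is
# not, i.e. files an mtime-only incremental would skip (problem 1).
list_ctime_only() {
    find "$1" -type f -cnewer "$2" ! -newer "$2" -print
}

# Example calls (the path /mnt/vol/subtree is hypothetical):
#   list_old_files  /mnt/vol/subtree
#   list_owned_by   /mnt/vol/subtree 65535
#   find /mnt/vol/subtree -print | list_dup_names
#   list_ctime_only /mnt/vol/subtree /var/tmp/baseline.stamp
```

Files reported by list_old_files can then be fixed with "touch", and any
pair reported by list_dup_names with a rename, as described above.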
MORAL:  Before a big volume copy, do some dry runs to be sure you won't
run into problems during your downtime.  Be sure to test both the full
dump/restore and incremental dump/restore.  We discovered many of these
problems during the two weeks leading up to our downtime.  Even so, we
hit a couple of snags during the downtime that cost us at least 2 hours.
One final tip:  Before running restore -r on an incremental dump,
be sure to save a copy of the restore_symboltable file since
restore -r modifies it.  If the restore modifies the file and then
fails, you can't rerun the "restore -r" unless you put back the
original file.  Even then you may have problems, and will need to
use "restore -x".
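A minimal sketch of that save/put-back step, assuming the directory that
holds restore_symboltable is mounted on an admin host and passed in as DIR.
The function names and the ".saved" suffix are made up for illustration.

```shell
#!/bin/sh
# save_symtab DIR
# Keep a copy of DIR/restore_symboltable before running "restore -r".
save_symtab() {
    cp -p "$1/restore_symboltable" "$1/restore_symboltable.saved"
}

# put_back_symtab DIR
# Reinstate the saved copy after a failed "restore -r", before retrying.
put_back_symtab() {
    cp -p "$1/restore_symboltable.saved" "$1/restore_symboltable"
}
```

Run save_symtab before each incremental restore, and put_back_symtab only
if that restore fails partway through.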
Steve Losen   scl@virginia.edu    phone: 804-924-0640
University of Virginia               ITC Unix Support
