funny you should ask.
we migrated 2TB from a pair of 630's onto a pair of clustered 760's, and 2 720's
initial setup:
630 "k" vol0 - 4 shelves of 9G -- tools1 vol1 - 2 shelves of 18G -- proj1, proj2, proj3, proj4 vol2 - 2 shelves of 18G -- archive, proj5, proj6, RCS1
630 "b" vol0 - 2 shelves of 18G -- dept1 homes, proj7, proj8, /usr/local vol1 - 2 shelves of 18G -- dept2 homes, legacytools, libraries, RCS2
final setup:
760 "p" vol0 - 2 shelves of 18G -- dept1 homes vol1 - 2 shelves of 18G -- dept2 homes vol2 - 2 shelves of 18G -- libraries
760 "s" vol0 - 2 shelves of 36G -- tools2 (new) vol1 - 2 shelves of 18G -- tools1 vol2 - 2 shelves of 18G -- /usr/local, RCS1, RCS2, archive, legacytools
720 "d" vol0 - 2 shelves of 18G -- proj7,proj5 vol1 - 2 shelves of 18G -- proj2, proj3
720 "i" vol0 - 2 shelves of 18G -- proj8, proj6 vol1 - 2 shelves of 18G -- proj1, proj4
in order to do this move, we had to shut down the company. all unix machines would have to be rebooted to get rid of the stale NFS mounts because we moved their /usr/local and their tools. time is money, so i was once offered a time frame of 4 hours in which i could do this move. i like it when people make me happy and full of laughter.
we negotiated to 24 hours, from friday 7pm to saturday 7pm. At 20G/hour, 2T should take about 100 hours. hmmmm.
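just to spell out the arithmetic that made 24 hours look scary (a rough sketch; the 20G/hour figure is the one above, the rest is back-of-the-envelope):

    expr 2000 / 20      # 2000G at 20G/hour = 100 hours for a single stream
    expr 100 / 24       # so you need 4-5 streams running flat out for the whole
                        # window, or (better) most of the data pre-copied before M-day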
i used snapmirror extensively for a lot of this. but i couldn't use it for all of the moves. for example, the 720 "i" is getting things put to both vol0 and vol1. snapmirror has to copy vol to vol, so in the above scheme to use snapmirror on the 720 "i" both vol0 and vol1 would be offline and my machine would have no volume available for the root vol.
every target filer would require at least one of its volumes to be online. that meant that snapmirror was useless for at least one volume of every filer. as snapmirror requires the target volume to be at least the same size as the source volume, i could not borrow a disk from each volume to make a tiny root volume only for later destruction.
sigh.
the tools i used were cpio, snapmirror, ndmpcopy and rsync.
cpio:
as moving a project only meant interrupting a slice of the company, i used cpio to move as many projects as i was permitted to interrupt. i got two.
"b" vol0 proj7 --> "d" vol0 "b" vol0 proj8 --> "i" vol0
typically i do cpio with two scripts. the first script [1] touches a timestamp file and then copies everything, which can take most of a day depending on the size of the project. the second script [2] grabs what has changed since the start of the first copy and is usually under an hour. this lets me get the downtime for a project down to a negotiable amount.
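as a rough illustration of the two passes (the paths here are invented for the example; the real scripts are [1] and [2] at the bottom):

    # pass 1: project still live, run from an admin host with both filers mounted
    cd /net/b/vol/vol0
    touch startfile
    find proj7 -depth -xdev | fgrep -v .snapshot | cpio -pdm /net/d/vol/vol0

    # pass 2: during the negotiated downtime, catch whatever changed since pass 1
    find proj7 -newer startfile -depth -xdev | fgrep -v .snapshot | cpio -pdm /net/d/vol/vol0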
snapmirror:
snapmirror is a vol to vol copy "at the block level". it is a front end to the vol copy command.
the library people needed space and space NOW, so i gave them their vol2 on "p" and marked it as root. this let me snapmirror to "p" vol0 and "p" vol1.
it couldn't "just put the library people on vol0" because vol0 was comprised of 36G drives. because 2 shelves of 18G are smaller than 4 shelves of 9G, we used 2 shelves of 36G for the tools1 qtree. it didn't matter that the usage on tools1 was smaller than what 2 shelves of 18G will hold, it would not fit no matter how hard i pushed. and i pushed hard.
as i had already cpio'd proj7 and proj8 to "d" vol0 and "i" vol0 respectively, i couldn't snapmirror onto those volumes; i would use ndmpcopy to get proj5 and proj6 onto "d" vol0 and "i" vol0.
as "s" vol0, was a new qtree for tools, i set that up and could then snapmirror to "s" vol1 and "s" vol2.
"k" vol1 (proj1, proj2, proj3, proj4) was effectively being split into "i" vol0 and "d" vol0 so i snapmirrored "k" vol1 onto both.
some volumes were "mostly" going to one target but snapmirror would drag along some unwanted qtrees. for example, "b" vol0 contained /usr/local and homes but only homes was to end up on "p" vol0. the 8G of /usr/local came along for the ride. i deleted these tagalong qtrees using a parallel removeit script i wrote for this [3]. there is no fast way to delete a qtree on a netapp. pity that.
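a typical cleanup, using the removeit script [3] below, looked something like this (the mount path is invented for the example; always do the -test pass first):

    # dry run: show what would be removed from the tagalong qtree
    ./removeit -test /net/p/vol/vol0/local

    # the real thing: parallel rm -rf of the qtree's subdirectories
    ./removeit /net/p/vol/vol0/local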
snapmirror will also happily take ALL of your filer - cpu and Kb/s throughput. i learned the hard way to throttle snapmirror to 3000Kb/s *total* from a 630. ie: if i ran two snapmirrors, each was throttled to 1500Kb/s.
if two snapshot calculations happened to coincide, the cpu would peg at 100% and we got NFS timeouts. because of that, i would turn on snapmirror in the morning for a few iterations, each morning leading up to M-day. you can either arrive at work 3 hours before the rest of the company or you can do it via cron on the admin host.
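i drove the throttle from /etc/snapmirror.conf on the destination filer. the sketch below is from memory (and from later ONTAP releases), so treat the kbs option and the schedule fields as assumptions and check the docs for your release:

    # /etc/snapmirror.conf on "p": two transfers off the 630 "b", each capped
    # at 1500Kb/s so "b" never sees more than 3000Kb/s total.
    # fields: source  destination  options  minute hour day-of-month day-of-week
    b:vol0  p:vol0  kbs=1500  0 6,7,8 * *
    b:vol1  p:vol1  kbs=1500  0 6,7,8 * *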
the snapmirrors i did ended up as:

"b" vol0 --> "p" vol0
"b" vol1 --> "p" vol1
"k" vol0 --> "s" vol1
"k" vol2 --> "s" vol2
"k" vol1 --> "d" vol1
"k" vol1 --> "i" vol1
ndmp (or what they didn't tell you behind the school):
only run 4 in parallel.
there is a limit of 6 parallel ndmpd copies on a filer. it's written into the code that way. i learnt the hard way that when the limit is hit, ndmpd just dies horribly.
i learnt the harder way that there are certain situations that will make the limit 4. i only copy 4 in parallel using the below scripts [4][5].
secondly, there is a bug (25649) where ndmp copies will peg the CPU at 100%, but not actually move data. this was fixed in 5.3.6R1P1, but was lost in 5.3.6R2 -- all my target filers ran 5.3.6R1P1 for the migration. the F630s remained at 5.3.4 as we were afraid to move higher due to a risk of invoking the "18G spinup issue" when we upgraded the firmware.
i used level 0 ndmpcopy to move the smaller projects that could be moved once i finished the snapmirrors and put the volumes back online. for example, /usr/local came along with dept1 homes to "p" vol0, but i wanted it on "s" vol2. that target was also a snapmirror target, so when i turned off the snapmirror on "s" vol2, i could ndmpcopy the 8G of /usr/local to the correct place.
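the cleanup copy looked roughly like this (same host-side ndmpcopy invocation as in script [4]; the qtree names here are guesses for the example):

    # /usr/local rode along to "p" vol0 with the homes snapmirror; once "s" vol2
    # was out from under snapmirror, push it to its real home with a level 0 copy
    ./ndmpcopy p:/vol/vol0/local s:/vol/vol2/local -level 0 -v \
        -sa 'root:PASS' -da 'root:PASS'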
i too have had grief when doing level 1 ndmpcopies. not always, but about 50% of the time.
rsync (version 2.4.3 protocol version 24):
"p" vol2 was to go to be a new work area for the library people. as they were already using it, i couldn't snapmirror the old libraries from "b" vol1 to "p" vol2.
as i have had grief when doing level 1 [2,3,4...] ndmpcopies, i tried rsync.
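the sync itself was nothing fancy, something along these lines (run from the E450 with both filers NFS mounted; the automount paths are invented):

    # repeatable sync of the old libraries onto the new work area;
    # --delete stops the target accumulating files already removed at the source
    rsync -a --delete /net/b/vol/vol1/libraries/ /net/p/vol/vol2/libraries/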
the directory walk in rsync is the killer. unfortunately what i was moving was less of a qtree and more of a qbush, so the walk took about 5 hours on a sun E450 (4cpu, 4Gmem) which does a lot of backups at night, but not much during the day.
rsync looked good but in the end failed (more later).
the move (M-day):
6:00pm turn on snapmirror: "s", "p", "i", "d"
7:00pm mark "b" and "k" readonly
7:15pm turn off snapmirror: "s", "p", "i", "d"
remove "d" vol1 proj1 and proj4 remove "i" vol1 proj2 and proj3
start final rsync of libraries from "b" vol1 to "p" vol2
ndmpdcopy proj5 to "i" vol0 ndmpdcopy proj6 to "d" vol0 ndmpdcopy /usr/local to "s" vol2 ndmpdcopy legacytools to "s" vol2 ndmpdcopy RCS1 to "s" vol2
10:00am rewrite automount tables change backup s/w edit exports on new filers edit quotas new filers
1:00pm install new automount tables
1:10pm reboot NIS servers
1:20pm reboot computer room servers
1:30pm reboot company
2:00pm start removing things that "came along" with snapmirror
2:01pm have beer
what went wrong:
rsync, which had been taking about 5 hours to sync the files, decided to go away and not come back during the final run. darn. after 13 hours, it wasn't finished. the next morning i aborted and did a level 0 ndmpcopy.
my parallel removeit [3] script contained a bug. this was bad. very very very bad. i deleted "stuff" from "s" vol2 and since it was a set of parallel "rm -rf" i had no idea what was gone and what was not gone.
using snaprestore, we got the volume back to the nightly.1 snapshot which was taken just as the ndmpcopies had been finishing up. i then restarted the three ndmpcopies to that volume. at about 12:30 everything looked good. we were 1/2 hour ahead of schedule. in fact i had already started deleting things on "s" vol2. it was at that point i noticed problems with the quotas on that volume.
when i started quotas, the 760 complained about duplicate entries. there were none. really. then i looked at the output of quota report.
# rsh s quota report
  Type     ID    Volume    Tree      ...  Quota Specifier
  -----  ------  --------  --------  ...  -----------------
  tree      1    vol2      local     ...  /vol/vol2/local
  tree      2    vol2      legacy    ...  /vol/vol2/legacy
  tree      2    vol2      RCS1      ...  /vol/vol2/legacy
somehow RCS1 had been associated with the same Quota Specifier as another qtree. all told, there were 5 qtrees all jumbled together like that.
furthermore, the quotas reported for the "real" qtrees in that location were a sum of several of the 5 jumbled qtrees. however, a "du" reported the correct amount. it was not a good situation.
we never figured out if this was a side effect of snaprestore, the parallel removes, ndmpcopy, having the snapshot to which i restored contain a "half-done" ndmp transfer, the 900% [6] usage of /vol/vol2/.snapshot, the phase of the moon, or something else.
the only solution was to completely remove the corrupt qtrees and recopy them. renaming the qtree would not fix it. it would still have that same bad association.
tree 2 vol2 RCS1.bent ... /vol/vol2/legacy
i had to wait for the remove to finish before i could start the copy again.
in order to perhaps speed things up, i copied (for example) RCS1 from the 760 "s" to the 760 "p", hoping that after the delete was done the copy back between the 760s would be faster than copying again from the 630.
i have never seen so many errors in my life from ndmp. crosslinked inodes is the best i can do to describe it. let's just leave it at that. we gave up on the shortcut and just copied from the 630.
my goodness, my guinness:
we got done within the 24 hours. planning and a selection of tools was the only way to get this done.
coincidentally, the same weekend i moved 4 shelves of 9G to 4 shelves of 18G. i did level 0, 1, 2 ndmp copies on the Monday, Wednesday, Thursday, planning to do a level 3 on Friday evening as the final move.
the level 2 ndmpcopy (Wednesday) failed miserably on two of the volumes and I had to restart from level 0. do not rely on your ndmpcopy working all the time.
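for the curious, that side move was just script [4] run at increasing levels over the week, roughly like this (that was the plan, anyway; filer and qtree names invented for the example):

    ./moveproject old /vol/vol0/projX new /vol/vol0/projX 0   # monday: full copy
    ./moveproject old /vol/vol0/projX new /vol/vol0/projX 1   # incremental since the level 0
    ./moveproject old /vol/vol0/projX new /vol/vol0/projX 2   # incremental since the level 1
    ./moveproject old /vol/vol0/projX new /vol/vol0/projX 3   # final pass in the downtime window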
scripts and notes:
any and all scripts supplied are to be used at the risk of the user. these are provided "as is", expressly with no warranty and no guarantee. in particular, the removeit script (number 3) has been used by myself to cause damage. it has been fixed since then, but still, be careful.
[1]
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
#!/bin/sh
# initial cpio copy sample script
#
touch startfile
find proj -depth -xdev | fgrep -v .snapshot | cpio -pdm newplace
#
# EOF
#
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
[2]
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
#!/bin/sh
# script to copy changes since the first copy
#
find proj -newer startfile -depth -xdev | fgrep -v .snapshot | cpio -pdm newplace
#
# EOF
#
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
[3]
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
#!/bin/sh
# script to remove the subdirectories of a qtree in parallel
#
# RUN THIS AT YOUR RISK
#
# IF YOU USE THIS SCRIPT AND DELETE *ANYTHING* THAT YOU DO NOT WANT
# TO DELETE YOU ARE ON YOUR OWN.  PMC-SIERRA WILL NOT TAKE ANY
# RESPONSIBILITY FOR ANY DAMAGE CAUSED BY THIS SCRIPT.  NONE OF
# PMC-SIERRA'S EMPLOYEES WILL TAKE ANY RESPONSIBILITY FOR ANY DAMAGE
# CAUSED BY THIS SCRIPT.
#
# YOU HAVE BEEN WARNED
#
ECHO=""
if [ ".-test" = ".$1" ] then ECHO=echo shift fi
if [ $# -ne 1 ] then echo usage: $0 [-test] fullpath exit 1 fi
case "$1" in /*) : ;;
*) echo usage: $0 [-test] fullpath exit 1 ;; esac
set -e cd /tmp cd $1 set +e
if [ -z "$ECHO" ] then echo working on `pwd` echo "continue (y/n)?" read junk
case "$junk" in Y|y|Yes|yEs|yeS|yES|YeS|YEs|YES|yes) : ;; *) echo aborting exit 1 ;; esac else echo echo test only on `pwd` echo actions would be as follows: echo fi
for i in * do if [ -d $i ] then echo recursive remove on $i $ECHO rm -rf $i& else $ECHO rm -f $i fi done
if [ -z "$ECHO" ] then echo waiting for any background processes to finish wait echo done $1 fi
cd .. $ECHO rmdir $1 2>/dev/null || (echo ;echo please check for remaining files)
#
# EOF
#
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
[4]
I do ndmpcopies in parallel, 4 at a time. to coordinate this, i script it. the actual work call to ndmpcopy is done in the script "moveproject"
as ndmpcopy in a script requires the root password to be put on the command line, i change the root passwd of the filers to something else. the script expects a directory "logs" to exist.
i touch start and finish files, as an indicator of elapsed time and perhaps to use with "find -newer" and cpio if ndmpd fails in a later level. i really don't trust ndmpcopy.
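the fallback, if a later-level ndmpcopy dies, would be the same find/cpio trick as scripts [1] and [2], keyed off the start file (paths here are invented for the example):

    # catch up by hand from the level 0 start time instead of re-running ndmpcopy
    cd /net/neutron/vol/vol0
    find proj1 -newer /admin/migration/logs/proj1.0/start -depth -xdev | \
        fgrep -v .snapshot | cpio -pdm /net/proton/vol/vol0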
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
#!/bin/sh
goodhost=m
if [ $# -ne 4 -a $# -ne 5 ]
then
    echo "usage: $0 src_filer src_project target_filer target_project [level]"
    exit
fi

if [ $# -eq 5 ]
then
    level=$5
else
    level=0
fi

if [ `hostname` != "$goodhost" ]
then
    echo i must be run on $goodhost
    exit
fi

# make sure ndmpd is running on both filers
remsh $1 ndmpd on
remsh $3 ndmpd on

# create the target qtree if it does not already exist
remsh $3 qtree create $4 2> /dev/null

name=`basename $4`
echo $name, logged to `pwd`/logs/$name.$level/log
mkdir logs/$name.$level

exec > logs/$name.$level/log 2>&1

touch logs/$name.$level/start
./ndmpcopy $1:$2 $3:$4 -level $level -v -sa 'root:PASS' -da 'root:PASS'
touch logs/$name.$level/finish
#
# EOF
#
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
[5]
this is an example script that copies several qtrees from one filer to another using the moveproject[4] script. this example makes the following 6 moves:
neutron:/vol/vol0/tools         proton:/vol/vol0/tools
neutron:/vol/vol0/proj/proj1    proton:/vol/vol0/proj1
neutron:/vol/vol0/packages      proton:/vol/vol0/packages
neutron:/vol/vol0/proj/proj2    proton:/vol/vol0/proj2
neutron:/vol/vol0/usr           proton:/vol/vol0/usr
neutron:/vol/vol0/home          proton:/vol/vol0/home
as i only ever run 4 ndmpd copies in parallel, i want to get this done as efficiently as possible. first i picked the 3 largest qtrees: home, tools and proj1. i start them running in parallel as background processes.
then i group the remaining 3 and run them one after another, that is, "usr" will not start until "proj2" is done and "proj2" will not start until "packages" is done. *but* since i use brackets to group them as a subshell and run that subshell in the background, i end up with 4 ndmp copies running at any point in time.
the final "wait" makes the script hang about until all the background processes are done. i'm sure you get the idea.
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
#!/bin/sh
# start 3 in parallel
./moveproject neutron /vol/vol0/tools proton /vol/vol0/tools 0 &
./moveproject neutron /vol/vol0/proj/proj1 proton /vol/vol0/proj1 0 &
./moveproject neutron /vol/vol0/home proton /vol/vol0/home 0 &
# run remaining one after another, in parallel with first 3
(
    ./moveproject neutron /vol/vol0/packages proton /vol/vol0/packages 0
    ./moveproject neutron /vol/vol0/proj/proj2 proton /vol/vol0/proj2 0
    ./moveproject neutron /vol/vol0/usr proton /vol/vol0/usr 0
) &
wait
---cut--here---8<--------8<---cut--here---8<--------8<---cut--here---8<-------
[6] yep. 900% usage on the .snapshot filesystem (snap reserve set to 10%). filer didn't seem to get upset, but maybe it did.
--
email: lance_bailey@pmc-sierra.com    box: Lance R. Bailey, unix Administrator
vox:   +1 604 415 6646                     PMC-Sierra, Inc
fax:   +1 604 415 6151                     105-8555 Baxter Place
http://www.lydia.org/~zaphod               Burnaby BC, V5A 4V7

186,000 mps: It's not only a good idea, it's the law.  -- Frank Wu