Hi!
I have a question: what would be the best way to transfer 20-25 GB of data (lots of small files) from UNIX to NetApp?
Can I do ufsdump on UNIX, pipe it through rsh to the NetApp, and do restore there?
Thank you.
I used concurrent gnu "cp -a" runs to copy 250GB on four machines in 8 hours.
in your case one machine should be fine. just mount the netapp onto the unix box, and go into the top level directory and type this (in csh or tcsh):
mkdir /mnt/toaster/data
cd /usr/data
foreach dir (*)
cp -a "$dir" /mnt/toaster/data &
end
if the directories are roughly even in size, that's better (the parallel copies finish together). you can also use tar; replace the cp command above with:
tar cf - "$dir" | ( cd /mnt/toaster/data ; tar xfpB -) &
-corris
On Thu, 6 Apr 2000, Rainchik, Aleksandr (MED, Non GE) wrote:
Hi!
I have a question: what would be the best way to transfer 20-25 GB of data (lots of small files) from UNIX to NetApp?
Can I do ufsdump on UNIX, pipe it through rsh to the NetApp, and do restore there?
Thank you.
[cp and tar examples deleted]
Of course, these will not give exact copies of the data you want. Your best bet is to use unix dump and restore, or ndmpcopy. Barring that, you want to use something like cpio.
Bruce
Or pax. To make the image copy:
pax -rw -pe <src> <dest>
Make sure to use a relative path for src (e.g. cd /a/b/c; pax -rw -pe d /toaster).
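For the case at hand that would be something like this (mount point and paths are placeholders):

mkdir /mnt/toaster/data
cd /usr/data
pax -rw -pe . /mnt/toaster/data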
barry
On Fri, 7 Apr 2000, Bruce Sterling Woodcock wrote:
[cp and tar examples deleted]
Of course, these will not give exact copies of the data you want. Your best bet is to use unix dump and restore, or ndmpcopy. Barring that, you want to use something like cpio.
Bruce
Forgive my ignorance on this matter, but I'm really curious how cp and tar modify the data when you use them to copy or transfer it. What about cpio? Doesn't it also modify data, even a single bit?
Could anyone give details as to how/why this happens?
thanks
robert
Bruce Sterling Woodcock wrote:
[cp and tar examples deleted]
Of course, these will not give exact copies of the data you want. Your best bet is to use unix dump and restore, or ndmpcopy. Barring that, you want to use something like cpio.
Bruce
Actually, looking more closely, the cp -a might actually work. I never use it since it's a gnuism and you can't be sure you'll have gnu cp on your machine. In general, those options are not available, so you're left with using -r, which doesn't preserve special files and such.
As for tar, I'm not sure what's wrong with it; I just remember it not working in a particular case in the past.
Both scripts, I think, suffer from potential file-naming problems if the files have a " in them, or start with a -, and so on.
Bruce
The only things I've had tar fail on were special files (pipes, device files, etc.) and sparse files (e.g. pre-allocated database files that are partly or mostly empty). The sparse files were a few years back and we didn't try gnu tar; it may handle them now (and so may the default tar that comes with your OS).
For regular files, we use tar every few days to move data to and from NetApps.
Bruce Sterling Woodcock wrote:
As for tar, I'm not sure what's wrong with it; I just remember it not working in a particular case in the past.
Both scripts, I think, suffer from potential file-naming problems if the files have a " in them, or start with a -, and so on.
Bruce
well, first off, you make sure that the gnu cp is in your path first, or you specify the full path to cp. the other thing, to ensure that there are no errors: redirect the output (including stderr) to a file, then check the file for errors.
the only names you have to watch out for are the top-level directories that you're copying; cp -a takes care of the wacky file names below them, since it reads the filenames within the program (via opendir) rather than taking them on the command line.
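for example, in the foreach loop from before (the log path is just an example):

foreach dir (*)
cp -a "$dir" /mnt/toaster/data >>& /tmp/cp-errors.log &
end
# once the background copies finish, anything in the log is trouble
cat /tmp/cp-errors.log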
-corris
On Fri, 7 Apr 2000, Bruce Sterling Woodcock wrote:
Actually, looking more closely, the cp -a might actually work. I never use it since it's a gnuism and you can't be sure you'll have gnu cp on your machine. In general, those options are not available, so you're left with using -r, which doesn't preserve special files and such.
As for tar, I'm not sure what's wrong with it; I just remember it not working in a particular case in the past.
Both scripts, I think, suffer from potential file-naming problems if the files have a " in them, or start with a -, and so on.
Bruce
it won't what?
mkdir /mnt/toaster/data
cd /usr/data
foreach dir (*)
cp -a "$dir" /mnt/toaster/data &
end
this wasn't meant to be a "shell script" per se; it's just meant to be typed into your interactive shell. The only names that can't have a " in them or a - at the beginning are the directories in /usr/data.
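one defensive tweak worth adding (an editorial suggestion, not in the original posting): prefixing ./ keeps a top-level name that begins with a - from being parsed as an option:

foreach dir (*)
# ./ makes "-foo" arrive as "./-foo", which cp treats as a plain operand
cp -a ./"$dir" /mnt/toaster/data &
end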
-corris
On Fri, 7 Apr 2000, Bruce Sterling Woodcock wrote:
the only names you have to watch out for are the top-level directories that you're copying; cp -a takes care of the wacky file names below them, since it reads the filenames within the program (via opendir) rather than taking them on the command line.
Yes, but that shell script won't.
Bruce
On Fri, 7 Apr 2000, Robert Johannes wrote:
Forgive my ignorance on this matter, but I'm really curious how cp and tar modify the data when you use them to copy or transfer it. What about cpio? Doesn't it also modify data, even a single bit?
Could anyone give details as to how/why this happens?
Unix's umask springs to mind -- it's caught me out once or twice on a straight tar | tar copy.
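A quick illustration (paths are placeholders, and details vary a little between tar implementations): without p on the extracting side, the modes get filtered through the umask; with p, the recorded modes are restored:

umask 027
tar cf - data | ( cd /mnt/toaster ; tar xf - )    # group/other bits may be stripped
tar cf - data | ( cd /mnt/toaster ; tar xfp - )   # p restores the recorded modes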
-Ronan
Bruce, excuse my ignorance, but why will these not give exact copies? What changes? When necessary I use find piped to cpio to shuffle large amounts of data between netapps, which you indicate does not have this same problem, but I'm curious as to what the problem actually is.
Justin Acklin
Bruce Sterling Woodcock wrote:
[cp and tar examples deleted]
Of course, these will not give exact copies of the data you want. Your best bet is to use unix dump and restore, or ndmpcopy. Barring that, you want to use something like cpio.
Bruce
Aleksandr Rainchik wrote:
I have a question: what would be the best way to transfer 20-25 GB of data (lots of small files) from UNIX to NetApp?
Can I do ufsdump on UNIX, pipe it through rsh to the NetApp, and do restore there?
The answer is: yes, you can. And despite all the other suggestions, this would be my preferred method, both for performance and for transparency.
My second choice would be ufsdump piped to ufsrestore running on an NFS client.
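Roughly like this (the filer name, paths, and mount point are placeholders; check the ONTAP restore documentation for the options that pick the destination on the filer):

# first choice: dump on the Solaris side, restore on the filer over rsh
ufsdump 0f - /usr/data | rsh toaster restore rf -

# second choice: dump piped to ufsrestore running on an NFS client
cd /mnt/toaster/data ; ufsdump 0f - /usr/data | ufsrestore rf -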
The formats used by Solaris ufsdump/ufsrestore and ONTAP dump/restore are intended to be compatible. I've had some trouble in the past feeding ONTAP dumps to Solaris ufsrestore (I got spurious errors when restoring certain large files with multiple holes, some of which were at odd multiples of 4K), but never with feeding Solaris dumps to ONTAP restore.
The preservation of holes is one of the advantages of using this method. Of course, one has to allow for holes being at 8K granularity on Solaris ufs and 4K granularity in wafl.
Maybe you haven't got any symbolic links to worry about, but if you have:
1. Solaris ufsrestore restores the owner and group of symlinks [this is only a year or two old: other programs descended from BSD dump may well not do this]. It doesn't restore their time stamps - nor can any copying method based on front-door use of NFS to write the files.
2. ONTAP restore restores the times as well as the owner and group!
Oh, and if you use ONTAP restore into a volume with quota control on, the inode counts can end up wrong (symlinks are counted twice). This is bugid 23326. "quota off" then "quota on" will fix it.
Chris Thompson
University of Cambridge Computing Service,
New Museums Site, Cambridge CB2 3QG,
United Kingdom.
Email: cet1@ucs.cam.ac.uk
Phone: +44 1223 334715
Aleksandr Rainchik wrote:
I have a question: what would be the best way to transfer 20-25 GB of data (lots of small files) from UNIX to NetApp?
Can I do ufsdump on UNIX, pipe it through rsh to the NetApp, and do restore there?
The answer is: yes, you can. And despite all the other suggestions, this would be my preferred method, both for performance and for transparency.
My second choice would be ufsdump piped to ufsrestore running on an NFS client.
Some folks have mentioned using tar, find|cpio, and cp. Here are the problems with these methods:
tar -- has a 200-character limit on pathnames. You can't copy an arbitrarily deep directory tree or one with very long filenames.
find|cpio -- Doesn't work for filenames with embedded newlines (yes, a newline can appear in a filename). There is also a limit on pathname length, although more than 200 characters. GNU find has -print0 and GNU cpio has -0 to get around the newline problem (see the sketch after this list).
cp -r -- follows symlinks instead of preserving them. Looks like GNU's cp -a solves this problem. I don't know if cp -a preserves special files, however. I also don't know if cp -a has a pathname length limitation. Unix itself limits pathnames to 1024 characters, so if cp does not take care to avoid this limit (by changing directory and using relative pathnames) then cp is unable to copy arbitrarily deep directory trees.
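For reference, here are the two find|cpio forms side by side (source and destination paths are placeholders):

# classic form: breaks on filenames with embedded newlines
cd /usr/data ; find . -depth -print | cpio -pdm /mnt/toaster/data

# GNU form: NUL-terminated names survive embedded newlines
cd /usr/data ; find . -depth -print0 | cpio -0pdm /mnt/toaster/data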
Note the pathname limit in Unix does not prevent you from creating extremely deep directory trees. It's actually quite simple:
i=0
while [ $i -lt 5000 ]
do
mkdir x
cd x
i=`expr $i + 1`
done
So dump|restore is preferable because (I think) it has no pathname length limitations. It preserves symlinks, and special files, and handles any legal filename.
Of course the other methods only fail in unusual circumstances. You can still use them if you check your source directory for problems first.
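For instance, a rough pre-flight check (adjust the path and the threshold to taste):

# flag pathnames getting close to the 1024-character limit
find /usr/data -print | awk 'length($0) > 1000'

# flag filenames containing an embedded newline (the pattern holds a literal newline)
find /usr/data -name '*
*' -print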
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support