I am not sure whether this is applicable to your problem, but we had very good experience with the bbcp ("BaBar copy": http://www.slac.stanford.edu/~abh/bbcp/ , http://www.nics.tennessee.edu/computing-resources/data-transfer/bbcp ) program when we migrated our Exchange mailboxes over WAN links.

 

"The bbcp utility is capable of breaking up your transfer into multiple simultaneously transferring streams, thereby transferring data faster than single-streaming utilities."

 

It can work from a file list, but can also recursively transfer complete directory trees.
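For example, a multi-stream recursive copy might look like this (the stream count and window size are illustrative values to tune for your link, and the paths and hostname are placeholders; check your build's documentation for the exact flags):

```shell
# Illustrative bbcp invocation: 16 parallel TCP streams, a 2 MB window,
# recursive copy of a directory tree, progress report every 2 seconds.
bbcp -P 2 -s 16 -w 2M -r /local/src/dir user@remotehost:/remote/dest/
```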

 

Christoph


From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Arnold de Leon
Sent: Friday, 18 October 2013 08:03
To: toasters
Subject: Re: Slow copy of a directory full of files via an NFS client across a WAN

 

Thanks everyone for the ideas.  I should have listed some of the other things we already tried.  We had played around with generating the list with a find and feeding it to a parallel copy, and got some speedup that way, but I was hoping that I was just missing something obvious that was keeping cp/tar/cpio/rsync from issuing the next write before the previous one completed.

 

The "nocto" option is one I haven't tried, but it looks interesting.  NFSv4 is not an immediate option (but could be in the long run).  Another option is to write a custom copy program that does parallel writes, but we would need to understand more first.  Generating the list is surprisingly slow (slow enough to exceed my time budget).  I would have expected READDIR/READDIR+ to be a little smarter, but this has not been apparent in the tests I have run so far.  If there is a way to make the filer do the copy itself, that would be fine as well.

 

The client is making a copy of a directory on the same filer.  The filer just happens to be remote.

 

Client <---WAN---> Filer

 

So if there was a directory called Filer:/dir/a, the Client wants to make Filer:/dir/a-copy before the contents of "/dir/a" get modified.

 

Thanks again.

 

 arnold


On Thu, Oct 17, 2013 at 10:24 PM, Michael van Elst <mlelstv@serpens.de> wrote:

On Thu, Oct 17, 2013 at 12:56:43PM -0700, Arnold de Leon wrote:

> I have not seen any delays that are on the order of seconds.  What I see is
> a steady stream of 1 request, 1 response.  What I was hoping to see is
> multiple requests followed by multiple responses (to deal w/ the latency).

That's how your application (cp -a) works: there is no chance to
have multiple outstanding requests when none are issued. For
read/write the system at least has a chance to do read-ahead or
write-behind buffering, but for scanning a directory tree
sequentially, no such magic is done.

You can change your procedure to first generate a list of files
(using ls -f or possibly find, avoiding stat() calls). Split that
list into parts and work on each part in parallel, for example with
cpio or pax; 'tar -T' should work too.
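A minimal sketch of that split-and-parallelize approach, using tar pairs (the SRC/DEST paths are placeholders for the NFS-mounted source and target; `split -n l/8` is a GNU coreutils option, and 8 chunks is an arbitrary degree of parallelism to tune):

```shell
SRC=/mnt/filer/dir/a        # placeholder: source directory on the filer
DEST=/mnt/filer/dir/a-copy  # placeholder: target directory
mkdir -p "$DEST"
cd "$SRC"
find . -type f > /tmp/filelist         # plain find; avoid options that force stat()
split -n l/8 /tmp/filelist /tmp/part.  # split into 8 roughly equal chunks
for part in /tmp/part.*; do
    # each tar pair copies one chunk of the list; all chunks run concurrently
    tar -cf - -T "$part" | tar -xf - -C "$DEST" &
done
wait
```

Each background tar pipeline keeps its own stream of NFS requests in flight, so the per-request WAN round trips overlap instead of serializing.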

Linux can issue parallel requests to the NFS server, there is a limit
of 16 per mount or per server depending on kernel version, and that
limit is tunable.
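On kernels where the limit is the sunrpc slot table, the tunable can be raised roughly like this (a configuration sketch; the sysctl name, availability, and a sensible value all vary by kernel version, so verify on your system before relying on it):

```shell
# Hypothetical tuning; must be set before the NFS mounts are established.
sysctl -w sunrpc.tcp_slot_table_entries=128

# Or persistently, applied when the sunrpc module loads:
echo "options sunrpc tcp_slot_table_entries=128" > /etc/modprobe.d/sunrpc.conf
```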


Greetings,
--
                                Michael van Elst
Internet: mlelstv@serpens.de
                                "A potential Snark may lurk in every tree."