I am not sure whether this is applicable to your problem, but we had very good experiences with the bbcp (BaBar Copy) program (http://www.slac.stanford.edu/~abh/bbcp/ ,
http://www.nics.tennessee.edu/computing-resources/data-transfer/bbcp ) when we migrated our Exchange mailboxes over a WAN link.
“The BBCP utility is capable of breaking up your transfer into multiple simultaneously
transferring streams, thereby transferring data faster than single-streaming utilities”
It can work from a list of files, but it can also recursively transfer complete directory trees.
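For example, a recursive transfer over several parallel streams would look roughly like this (flags quoted from memory, please check the bbcp man page):

  # -s 8   use 8 parallel TCP streams
  # -r     recurse into the source directory
  # -P 10  report progress every 10 seconds
  bbcp -s 8 -r -P 10 /local/source/dir/ user@remotehost:/destination/dir/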
Christoph
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net]
On behalf of Arnold de Leon
Sent: Friday, 18 October 2013 08:03
To: toasters
Subject: Re: Slow copy of a directory full of files via an NFS client across a WAN
Thanks everyone for the ideas. I should have listed some of the other things we already tried. We had played around with generating the list with a find and feeding it to a parallel copy, and got some speedup that way, but I was hoping I was just missing something obvious that was keeping cp/tar/cpio/rsync from issuing the next write before the previous one completed.

The "nocto" option is one I haven't tried, but it looks interesting. NFSv4 is not an immediate option (but could be in the long run). Another option is to write a custom copy program that does parallel writes, but we need to understand more first. Generating the file list is being surprisingly slow (slow enough to exceed my time budget); I would have expected READDIR/READDIR+ to be a little smarter, but this has not been apparent in the tests I have run so far. If there is a way to make the filer do the copy itself, that would be OK as well.
The client is making a copy of a directory on the same filer. The filer just happens to be remote.
Client <---WAN---> Filer
So if there is a directory called Filer:/dir/a, the client wants to make Filer:/dir/a-copy before the contents of /dir/a get modified.
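For what it's worth, the parallel copy we played with was something along these lines (a rough sketch; the /mnt/filer mount point and the GNU find/xargs/cp flags are illustrative):

  cd /mnt/filer/dir/a
  mkdir ../a-copy
  # recreate the directory tree first, then copy files with 16 cp processes in parallel
  find . -type d -print0 | (cd ../a-copy && xargs -0 mkdir -p)
  find . -type f -print0 | xargs -0 -n 100 -P 16 cp -p --parents -t ../a-copy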
Thanks again.
arnold
On Thu, Oct 17, 2013 at 10:24 PM, Michael van Elst <mlelstv@serpens.de> wrote:
On Thu, Oct 17, 2013 at 12:56:43PM -0700, Arnold de Leon wrote:
> I have not seen any delays that are on the order of seconds. What I see is
> a steady stream of one request, one response. What I was hoping to see is
> multiple requests followed by multiple responses (to deal with the latency).
That's how your application (cp -a) works; there is no chance to
have multiple outstanding requests when none are issued. For
read/write the system at least has a chance to do read-ahead or
write-behind buffering, but for scanning a directory tree
sequentially no such magic is done.
You can change your procedure to first generate a list of files
(using ls -f or possibly find, avoiding stat() calls). Split that list
into parts and work on each part in parallel, for example with
cpio or pax; 'tar -T' should work too.
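A rough sketch of that idea, assuming GNU split and cpio in pass-through mode (the paths are only illustrative):

  cd /mnt/filer/dir/a
  find . -print > /tmp/filelist              # or ls -f for a flat directory
  split -n l/8 /tmp/filelist /tmp/part.      # 8 chunks, without splitting lines
  for p in /tmp/part.*; do
      cpio -pdm ../a-copy < "$p" &           # -p pass-through, -d make dirs, -m keep mtimes
  done
  wait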
Linux can issue parallel requests to the NFS server; there is a limit
of 16 per mount or per server, depending on the kernel version, and that
limit is tunable.
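For reference, the limit I mean is the sunrpc slot table; the exact tunable names depend on your kernel version:

  # older kernels: a fixed number of slots per connection, default 16;
  # it has to be set before the NFS filesystem is mounted
  sysctl -w sunrpc.tcp_slot_table_entries=128
  # newer kernels size the table dynamically, capped by:
  sysctl sunrpc.tcp_max_slot_table_entries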
Greetings,
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."