>>The client is making a copy of a directory on the same filer. The filer just happens to be remote.
If the directory is on the same filer, use ndmpcopy. Or better yet, why are you making this copy? It sounds like what you are trying to accomplish with
/dir/a-copy could be accomplished with a snapshot.
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net]
On Behalf Of Arnold de Leon
Sent: Friday, October 18, 2013 2:03 AM
To: toasters
Subject: Re: Slow copy of a directory full of files via an NFS client across a WAN
Thanks everyone for the ideas. I should have listed some of the other things we already tried. We had played around with generating the list with a find and feeding a parallel copy, and got some speedup that way, but I was hoping that I
was just missing something obvious that was keeping cp/tar/cpio/rsync from issuing the next write before the previous one completed.
The "nocto" option is one I haven't tried but looks interesting. NFSv4 is not an immediate option (but could be in the long run). Another option is to write a custom copy program that does parallel writes, but we need to understand more first.
Generating the list is surprisingly slow (slow enough to exceed my time budget). I would have expected READDIR/READDIR+ to be a little smarter, but that has not been apparent in the tests I have run so far. If there is a way to make the filer do the
copy itself, that would be OK as well.
The client is making a copy of a directory on the same filer. The filer just happens to be remote.
Client <---WAN---> Filer
So if there was a directory called Filer:/dir/a, the Client wants to make Filer:/dir/a-copy before the contents of "/dir/a" get modified.
Thanks again.
arnold
On Thu, Oct 17, 2013 at 10:24 PM, Michael van Elst <mlelstv@serpens.de> wrote:
On Thu, Oct 17, 2013 at 12:56:43PM -0700, Arnold de Leon wrote:
> I have not seen any delays that are on the order of seconds. What I see is
> a steady stream of one request, one response. What I was hoping to see is
> multiple requests followed by multiple responses (to deal w/ the latency).
That's how your application (cp -a) works; there is no chance to
have multiple outstanding requests when none are requested. For
read/write the system has at least a chance to do read-ahead or
write-behind buffering, but for scanning a directory tree
sequentially no magic is done.
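
To put rough numbers on the one-request-one-response pattern, here is a
back-of-the-envelope sketch. Everything in it is an assumption made for
illustration, not a measurement from this thread: the 50 ms round trip, the
100,000 files, and the 5 NFS operations per file are made-up inputs, and the
helper name estimate_copy_seconds is hypothetical. It ignores bandwidth and
server processing time and only counts round trips.

```python
def estimate_copy_seconds(n_files, ops_per_file, rtt_seconds, outstanding=1):
    """Estimate wall-clock time when every request costs one round trip.

    `outstanding` is the number of requests kept in flight at once;
    1 models a strict request/response stream with no overlap.
    """
    total_requests = n_files * ops_per_file
    return total_requests * rtt_seconds / outstanding


# Hypothetical inputs, not measurements from this thread.
N_FILES = 100_000
OPS_PER_FILE = 5
RTT_SECONDS = 0.05

one_at_a_time = estimate_copy_seconds(N_FILES, OPS_PER_FILE, RTT_SECONDS)
sixteen_in_flight = estimate_copy_seconds(
    N_FILES, OPS_PER_FILE, RTT_SECONDS, outstanding=16
)
print(f"1 outstanding:  {one_at_a_time / 3600:.1f} hours")
print(f"16 outstanding: {sixteen_in_flight / 60:.1f} minutes")
```

With these made-up inputs the strictly serial case comes out at roughly 7
hours and the 16-in-flight case at roughly 26 minutes, which is why keeping
several requests outstanding matters far more than raw bandwidth here.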
You can change your procedure to first generate a list of files
(using ls -f or possibly find, and avoiding stat() calls). Split that
list into parts and work on each part in parallel, for example with
cpio or pax; 'tar -T' should work too.
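
The list-then-split-then-copy procedure could also be sketched as a small
custom program. This is only a minimal illustration under stated assumptions,
not a tested tool for this workload: the function names (list_files,
split_list, copy_part, parallel_copy) are hypothetical, the default of 16
workers is just a starting guess, and the sketch copies regular files only
(empty directories, symlinks, and ownership are not handled). It uses
os.scandir() so that the file type can come from the directory listing where
the filesystem supplies it, rather than from a separate stat() per entry.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def list_files(src_root):
    """Return relative paths of all regular files under src_root.

    os.scandir() takes the entry type from the directory listing when the
    filesystem provides it, so is_dir()/is_file() often need no extra stat().
    """
    files = []
    stack = [""]
    while stack:
        rel_dir = stack.pop()
        with os.scandir(os.path.join(src_root, rel_dir)) as entries:
            for entry in entries:
                rel_path = os.path.join(rel_dir, entry.name)
                if entry.is_dir(follow_symlinks=False):
                    stack.append(rel_path)
                elif entry.is_file(follow_symlinks=False):
                    files.append(rel_path)
    return files


def split_list(items, parts):
    """Split items into at most `parts` non-empty interleaved slices."""
    return [items[i::parts] for i in range(parts) if items[i::parts]]


def copy_part(src_root, dst_root, rel_paths):
    """Copy one slice of the file list, creating directories as needed."""
    for rel_path in rel_paths:
        dst = os.path.join(dst_root, rel_path)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(src_root, rel_path), dst)
    return len(rel_paths)


def parallel_copy(src_root, dst_root, workers=16):
    """Copy regular files from src_root to dst_root using `workers` threads."""
    os.makedirs(dst_root, exist_ok=True)
    parts = split_list(list_files(src_root), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda part: copy_part(src_root, dst_root, part), parts)
        return sum(counts)
```

A call such as parallel_copy("/mnt/filer/dir/a", "/mnt/filer/dir/a-copy")
(the mount point is made up) would then keep up to 16 copies in flight.
Threads are enough here because the workers spend their time waiting on the
network, not on the CPU. Note that the listing step is still serial, so it
does not help with a slow directory scan.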
Linux can issue parallel requests to the NFS server; there is a limit
of 16 per mount or per server, depending on kernel version, and that
limit is tunable.
Greetings,
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."