Just a comment: race conditions over NFS can be common and severe if you build a piece of software (a system) that inherently assumes perfect [client] cache coherence. That is not really achievable, and client-side caching is VERY important, indeed crucial, for the performance of any distributed file system such as NFS.
(Try googling "perfect cache coherence" file system and look at the hits you get.)
The .nfsXXXX files are residues of exactly such a race condition, in the case where one client has a file open (an active file handle) and writes to it, and another client simply deletes that file. When the write data is flushed from the first client, the file is gone, and that data goes into the .nfsXXXX file in the directory where client #2 expected the file to be.
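To see the mechanism on a single client, here is a minimal sketch (the file name is invented, and it assumes the current working directory is on an NFS mount): open a file, unlink it while the descriptor is still open, and the NFS client keeps the data alive under a .nfsXXXX name until the last close.

    /* silly_rename_demo.c - open a file on an NFS mount, unlink it while
     * the descriptor is still open, then list the directory.  The NFS
     * client typically renames the file to .nfsXXXX instead of removing
     * it, so the open descriptor keeps working; the .nfsXXXX entry is
     * cleaned up on the last close. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void)
    {
        int fd = open("victim.txt", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (unlink("victim.txt") < 0) { perror("unlink"); return 1; }

        /* The descriptor still works even though the name is gone. */
        if (write(fd, "still writable\n", 15) < 0) perror("write");

        system("ls -a .");   /* expect a .nfsXXXX entry here on NFS */

        close(fd);           /* last close removes the .nfsXXXX file */
        return 0;
    }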
It is, unfortunately, quite common for people to have totally misunderstood the semantics of UNIX and NFS in this respect. Many really believe that if one NFS client has a file open, then no other client can delete it. So they have no idea what "Stale NFS file handle" means, or how easy it is to end up in that situation if you work with a parallel system (home-brew, as it often is) over NFS with many NFS clients involved. It seems easy and straightforward, but it is not.
There is no mandatory file locking in NFS. There never has been. Locking is advisory, and before NFSv4.x it was also handled by an auxiliary protocol (the NLM system, with its own ports and not very high performance capacity).
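To illustrate what "advisory" means in practice, a minimal sketch (the file name is invented): the fcntl() lock below only excludes other processes that also take fcntl() locks; a process that never asks for the lock can still write to, or delete, the file.

    /* advisory_lock_demo.c - take a POSIX advisory write lock on a file.
     * Over NFSv3 the request travels via the NLM sideband protocol; in
     * NFSv4 locking is part of the protocol itself.  Either way the lock
     * is advisory: it only excludes processes that also call fcntl(). */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void)
    {
        int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        struct flock fl = {
            .l_type   = F_WRLCK,   /* exclusive (write) lock  */
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,         /* 0 = lock the whole file */
        };

        if (fcntl(fd, F_SETLKW, &fl) < 0) {  /* block until granted */
            perror("fcntl(F_SETLKW)");
            return 1;
        }

        /* Critical section: only cooperating lockers are excluded.
         * A process that never calls fcntl() can still write or unlink. */

        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
    }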
If you don't know EXACTLY what you're doing, you will shoot yourself in the foot.
Regards, /M
On 2016-12-28 14:10, andrei.borzenkov@ts.fujitsu.com wrote:
I would expect “Stale NFS file handle” if the problem were (another) client caching. But it looks like the other client actually contacts the server and gets “No such file” in response. Multiple resources on the Net suggest that this is a known NFS limitation.
I can think of at least one case where it is possible: if the target file is currently open on the same client that is doing the rename, the client is expected to rename the target to .nfsXXXX to prevent its deletion on the server, which opens up a window during which the target file is not available.
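To make that window concrete, a hypothetical sketch of the sequence on a single client (file names invented):

    /* rename_target_open.c - hold the rename target open, then rename a
     * new file over it.  Because the target is open on this same client,
     * the client first silly-renames it to .nfsXXXX, which briefly leaves
     * a window in which the target name may not resolve. */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void)
    {
        int fd = open("pointer", O_RDONLY);   /* target held open */
        if (fd < 0) { perror("open pointer"); return 1; }

        int tmp = open("pointer.new", O_CREAT | O_WRONLY, 0644);
        if (tmp < 0) { perror("open pointer.new"); return 1; }
        write(tmp, "new contents\n", 13);
        close(tmp);

        /* The client must preserve the open target, so it renames it
         * to .nfsXXXX before installing pointer.new as "pointer". */
        if (rename("pointer.new", "pointer") < 0) perror("rename");

        close(fd);   /* last close lets the client reap the .nfsXXXX */
        return 0;
    }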
@Edward, do you see any .nfsXXXX files in the same directory?
From: toasters-bounces@teaparty.net On Behalf Of Steiner, Jeffrey
Sent: Wednesday, December 28, 2016 3:49 PM
To: Edward Rolison
Cc: toasters@teaparty.net
Subject: Re: Atomicity of rename on NFS
That sounds like normal behavior with the typical mount options used for NFS. What are you using, exactly? The defaults include several seconds of caching of file and directory data. The act of renaming a file is atomic, but other NFS clients will not be immediately aware of the change unless you have actimeo=0 and noac in the mount options. There are performance consequences to that, but sometimes it's unavoidable. For example, Oracle database clusters using NFS must always have a single consistent image of their data across nodes. That's why they use actimeo=0 and noac.
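For example, a Linux mount invocation with those options might look roughly like this (server name and paths invented; the rest of the option set depends on your environment):

    mount -t nfs -o rw,hard,actimeo=0,noac filer01:/vol/oradata /u02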
On 28 Dec 2016, at 12:23, Edward Rolison <ed.rolison@gmail.com> wrote:
Hello fellow NetApp Admins. I have a bit of an odd one that I'm trying to troubleshoot - and whilst I'm not sure it's specifically filer related, it's NFS related (and is happening on a filer mount).
What happens is this - there's a process that updates a file and relies on rename() being atomic: a journal is updated, then a new reference-pointer file is created and renamed over the old one.
The expectation is that this file will always be there - because "rename()" is defined as an atomic operation.
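(For reference, the pattern in question looks roughly like this minimal C sketch - file names and contents are invented: write the new pointer to a temporary name, flush it, then rename it over the old name, so a reader should see either the old file or the new one, never a partial one.)

    /* atomic_update.c - the classic write-temp-then-rename pattern.
     * rename() atomically replaces "pointer"; a reader sees either the
     * old contents or the new, never a truncated file.  Cross-client
     * visibility of the change is a separate question. */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void)
    {
        int fd = open("pointer.tmp", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, "ref=12345\n", 10) < 0) { perror("write"); return 1; }
        if (fsync(fd) < 0) { perror("fsync"); return 1; }  /* flush data */
        close(fd);

        /* Atomically install the new version over the old name. */
        if (rename("pointer.tmp", "pointer") < 0) { perror("rename"); return 1; }
        return 0;
    }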
But that's not quite what I'm getting - I have one NFS client doing its (atomic) rename, and another client (a different NFS host) reading it and - occasionally - reporting 'no such file or directory'.
This causes an operation to fail, which in turn means that someone has to intervene in the process. This operation (and multiple extremely similar ones) happens at 5-minute intervals, and every few days (once a week, maybe?) it fails for this reason - and our developers think that should be impossible. As such, it looks like a pretty narrow race condition. [...]