As a follow-up on this - I've tracked down the problem, and wanted to say thanks to everyone who offered insight - most of it moved me in the right direction. I'm summarising it here because it's at least a little interesting.
It boils down to this - on the reading host, my pcap looks like:
79542 10.643148 10.0.0.52 -> 10.0.0.24 NFS 222 ACCESS allowed testfile V3 ACCESS Call, FH: 0x76a9a83d, [Check: RD MD XT XE]
79543 10.643286 10.0.0.24 -> 10.0.0.52 NFS 194 0 ACCESS allowed 0600 Regular File testfile NFS3_OK V3 ACCESS Reply (Call In 79542), [Allowed: RD MD XT XE]
79544 10.643335 10.0.0.52 -> 10.0.0.24 NFS 222 ACCESS allowed V3 ACCESS Call, FH: 0xe0e7db45, [Check: RD LU MD XT DL]
79545 10.643456 10.0.0.24 -> 10.0.0.52 NFS 194 0 ACCESS allowed 0755 Directory NFS3_OK V3 ACCESS Reply (Call In 79544), [Allowed: RD LU MD XT DL]
79546 10.643487 10.0.0.52 -> 10.0.0.24 NFS 230 LOOKUP testfile V3 LOOKUP Call, DH: 0xe0e7db45/testfile
79547 10.643632 10.0.0.24 -> 10.0.0.52 NFS 190 0 LOOKUP 0755 Directory NFS3ERR_NOENT V3 LOOKUP Reply (Call In 79546) Error: NFS3ERR_NOENT
79548 10.643662 10.0.0.52 -> 10.0.0.24 NFS 230 LOOKUP testfile V3 LOOKUP Call, DH: 0xe0e7db45/testfile
79549 10.643814 10.0.0.24 -> 10.0.0.52 NFS 190 0 LOOKUP 0755 Directory NFS3ERR_NOENT V3 LOOKUP Reply (Call In 79548) Error: NFS3ERR_NOENT
On my writing host - I get:
203306 13.805489 10.0.0.6 -> 10.0.0.24 NFS 246 LOOKUP .nfs00000000d59701e500001030 V3 LOOKUP Call, DH: 0xe0e7db45/.nfs00000000d59701e500001030
203307 13.805687 10.0.0.24 -> 10.0.0.6 NFS 186 0 LOOKUP 0755 Directory NFS3ERR_NOENT V3 LOOKUP Reply (Call In 203306) Error: NFS3ERR_NOENT
203308 13.805711 10.0.0.6 -> 10.0.0.24 NFS 306 RENAME testfile,.nfs00000000d59701e500001030 V3 RENAME Call, From DH: 0xe0e7db45/testfile To DH: 0xe0e7db45/.nfs00000000d59701e500001030
203309 13.805982 10.0.0.24 -> 10.0.0.6 NFS 330 0,0 RENAME 0755,0755 Directory,Directory NFS3_OK V3 RENAME Reply (Call In 203308)
203310 13.806008 10.0.0.6 -> 10.0.0.24 NFS 294 RENAME testfile_temp,testfile V3 RENAME Call, From DH: 0xe0e7db45/testfile_temp To DH: 0xe0e7db45/testfile
203311 13.806254 10.0.0.24 -> 10.0.0.6 NFS 330 0,0 RENAME 0755,0755 Directory,Directory NFS3_OK V3 RENAME Reply (Call In 203310)
203312 13.806297 10.0.0.6 -> 10.0.0.24 NFS 246 CREATE testfile_temp V3 CREATE Call, DH: 0xe0e7db45/testfile_temp Mode: EXCLUSIVE
203313 13.806538 10.0.0.24 -> 10.0.0.6 NFS 354 0,0 CREATE 0755,0755 Regular File,Directory testfile_temp NFS3_OK V3 CREATE Reply (Call In 203312)
203314 13.806560 10.0.0.6 -> 10.0.0.24 NFS 246 SETATTR 0600 testfile_temp V3 SETATTR Call, FH: 0x4b69a46a
203315 13.806767 10.0.0.24 -> 10.0.0.6 NFS 214 0 SETATTR 0600 Regular File testfile_temp NFS3_OK V3 SETATTR Reply (Call In 203314)
(IPs modified).
The long and short of it is this: _most_ of the time, everything works right. But when the file being overwritten (and deleted) is also held open for reading by another process, two RENAME operations occur, because the NFS client preserves the 'deleted' file by renaming it to a .nfsXXXX name (the 'silly rename' you can see in the second trace). It has to remain valid to open and unlink a file and carry on doing IO to it, and because the protocol is stateless the server can't keep the unlinked file alive itself - the client-side rename is how NFS solves that problem.
Each RENAME remains atomic from the perspective of the client issuing it, but there's a tiny race window between the two renames - the silly-rename of the old file out of the way, and the rename of the new file into place - during which a LOOKUP from a remote client gets NFS3ERR_NOENT (and the application gets ENOENT), because at that instant there's no file of that name present.
The reason I had a hard time reproducing this is that it simply doesn't happen in a simplistic 'single writer' scenario, and doesn't happen often in our environment, because it _also_ requires the file to be open (for reading) at the same time. Adding a pretty simple 'open file; sleep 1' type loop was what made it occur reliably and repeatably.
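If anyone wants to reproduce it, the 'hold it open' side was nothing more elaborate than a loop along these lines (a Python sketch with a made-up mount path - the point is that the file stays open on the client doing the rename, which is what triggers the .nfsXXXX silly-rename visible in the trace above):

import time

path = "/mnt/nfs/testdir/testfile"   # made-up path for illustration

while True:
    f = open(path, "rb")             # hold the file open while the other
    time.sleep(1)                    # process does its rename-over update
    f.close()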
Net result, though, is that NFS doesn't offer any guarantee of rename atomicity to remote clients - only to the client performing the rename operation.
On 28 December 2016 at 11:21, Edward Rolison ed.rolison@gmail.com wrote:
Hello fellow NetApp Admins. I have a bit of an odd one that I'm trying to troubleshoot - and whilst I'm not sure it's specifically filer related, it's NFS related (and is happening on a filer mount).
What happens is this - there's a process that updates a file and relies on rename() being atomic: a journal is updated, then a new reference pointer (file) is created and renamed over the old one.
The expectation is that this file will always be there - because "rename()" is defined as an atomic operation.
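Roughly, the pattern is something like this (a minimal Python sketch with made-up paths, not the actual code):

import os

def atomic_update(path, data):
    tmp = path + "_temp"
    # write the new contents to a temporary file in the same directory
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # then rename it over the old name - rename() is atomic, so readers
    # should always see either the old file or the new one
    os.rename(tmp, path)

atomic_update("/mnt/nfs/testdir/testfile", b"new reference pointer\n")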
But that's not quite what I'm getting - I have one NFS client doing its (atomic) rename, and another client (a different NFS host) reading the file and, occasionally, reporting 'no such file or directory'.
This is causing an operation to fail, which in turn means that someone has to intervene in the process. The operation (and multiple extremely similar ones) runs at 5-minute intervals, and every few days (once a week, maybe?) it fails for this reason - and our developers think that should be impossible. As such, it looks like a pretty narrow race condition.
So what I'm trying to figure out is first off:
- Could this be a NetApp bug? We've moved from 7-Mode to CDOT, and it didn't happen before. On the flip side, though, I have no guarantee that it 'never happened before', because we weren't catching the race condition. (Moving to new tin and improving performance does increase the likelihood of hitting a race condition, after all.)
- Could this be a kernel bug? We're all on kernel 2.6.32-504.12.2.el6.x86_64 - and whilst we're deploying CentOS 7, the hosts involved aren't on it yet. (But that's potentially just coincidence, as there are quite a few hosts and they're all on the same kernel version.)
- Is it actually impossible for a file A renamed over file B to generate ENOENT on a different client? Specifically, in RFC 3530 we have: "The RENAME operation must be atomic to the client." So the client doing the rename sees an atomic operation - but the expectation is that a separate client will also perceive an 'atomic' change: once its cache is refreshed, the directory has the new file, and at no point was there 'no such file or directory', because the name referred either to the old file or to the newly renamed one. Is this actually a valid thing to think?
This is a bit of a complicated one, and has me clutching at straws a bit - I can't reliably reproduce it: a basic fast-spinning read-write-rename loop script on multiple clients didn't hit it. I've got pcaps running in the hope of catching it 'in flight', but haven't yet managed to catch it happening. Any suggestions would be gratefully received.
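For what it's worth, the spinning loop was roughly along these lines (a Python sketch with made-up paths, run on a couple of clients at once) - and it never hit the error on its own:

import errno, os, socket

path = "/mnt/nfs/testdir/testfile"            # made-up path for illustration
tmp = path + "_temp." + socket.gethostname()  # per-client temp name

while True:
    # write a new copy and rename it over the target...
    with open(tmp, "w") as f:
        f.write("some data\n")
    os.rename(tmp, path)
    # ...then read it straight back, watching for the ENOENT we see in production
    try:
        open(path).close()
    except OSError as e:
        if e.errno == errno.ENOENT:
            print("caught ENOENT")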