Packet capture of the NFS traffic involved (10.0.0.52 and 10.0.0.6 are two clients, 10.0.0.24 is the filer):

79542 10.643148 10.0.0.52 -> 10.0.0.24 NFS 222 ACCESS allowed testfile V3 ACCESS Call, FH: 0x76a9a83d, [Check: RD MD XT XE]
79543 10.643286 10.0.0.24 -> 10.0.0.52 NFS 194 0 ACCESS allowed 0600 Regular File testfile NFS3_OK V3 ACCESS Reply (Call In 79542), [Allowed: RD MD XT XE]
79544 10.643335 10.0.0.52 -> 10.0.0.24 NFS 222 ACCESS allowed V3 ACCESS Call, FH: 0xe0e7db45, [Check: RD LU MD XT DL]
79545 10.643456 10.0.0.24 -> 10.0.0.52 NFS 194 0 ACCESS allowed 0755 Directory NFS3_OK V3 ACCESS Reply (Call In 79544), [Allowed: RD LU MD XT DL]
79546 10.643487 10.0.0.52 -> 10.0.0.24 NFS 230 LOOKUP testfile V3 LOOKUP Call, DH: 0xe0e7db45/testfile
79547 10.643632 10.0.0.24 -> 10.0.0.52 NFS 190 0 LOOKUP 0755 Directory NFS3ERR_NOENT V3 LOOKUP Reply (Call In 79546) Error: NFS3ERR_NOENT
79548 10.643662 10.0.0.52 -> 10.0.0.24 NFS 230 LOOKUP testfile V3 LOOKUP Call, DH: 0xe0e7db45/testfile
79549 10.643814 10.0.0.24 -> 10.0.0.52 NFS 190 0 LOOKUP 0755 Directory NFS3ERR_NOENT V3 LOOKUP Reply (Call In 79548) Error: NFS3ERR_NOENT
203306 13.805489 10.0.0.6 -> 10.0.0.24 NFS 246 LOOKUP .nfs00000000d59701e500001030 V3 LOOKUP Call, DH: 0xe0e7db45/.nfs00000000d59701e500001030
203307 13.805687 10.0.0.24 -> 10.0.0.6 NFS 186 0 LOOKUP 0755 Directory NFS3ERR_NOENT V3 LOOKUP Reply (Call In 203306) Error: NFS3ERR_NOENT
203308 13.805711 10.0.0.6 -> 10.0.0.24 NFS 306 RENAME testfile,.nfs00000000d59701e500001030 V3 RENAME Call, From DH: 0xe0e7db45/testfile To DH: 0xe0e7db45/.nfs00000000d59701e500001030
203309 13.805982 10.0.0.24 -> 10.0.0.6 NFS 330 0,0 RENAME 0755,0755 Directory,Directory NFS3_OK V3 RENAME Reply (Call In 203308)
203310 13.806008 10.0.0.6 -> 10.0.0.24 NFS 294 RENAME testfile_temp,testfile V3 RENAME Call, From DH: 0xe0e7db45/testfile_temp To DH: 0xe0e7db45/testfile
203311 13.806254 10.0.0.24 -> 10.0.0.6 NFS 330 0,0 RENAME 0755,0755 Directory,Directory NFS3_OK V3 RENAME Reply (Call In 203310)
203312 13.806297 10.0.0.6 -> 10.0.0.24 NFS 246 CREATE testfile_temp V3 CREATE Call, DH: 0xe0e7db45/testfile_temp Mode: EXCLUSIVE
203313 13.806538 10.0.0.24 -> 10.0.0.6 NFS 354 0,0 CREATE 0755,0755 Regular File,Directory testfile_temp NFS3_OK V3 CREATE Reply (Call In 203312)
203314 13.806560 10.0.0.6 -> 10.0.0.24 NFS 246 SETATTR 0600 testfile_temp V3 SETATTR Call, FH: 0x4b69a46a
203315 13.806767 10.0.0.24 -> 10.0.0.6 NFS 214 0 SETATTR 0600 Regular File testfile_temp NFS3_OK V3 SETATTR Reply (Call In 203314)
Hello fellow NetApp Admins.
I have a bit of an odd one that I'm trying to troubleshoot, and whilst I'm not sure it's specifically filer related, it's NFS related (and is happening on a filer mount).

What happens is this: there's a process that updates a file and relies on rename() being atomic. A journal is updated, then a reference pointer (file) is newly created and renamed over the old one (the pattern is sketched at the end of this post). The expectation is that this file will always be there, because rename() is defined as an atomic operation.

But that's not quite what I'm getting. I have one NFS client doing its (atomic) rename, and another client (a different NFS host) reading it and, occasionally, reporting 'no such file or directory'. This causes an operation to fail, which in turn means that someone has to intervene in the process. This operation (and multiple extremely similar ones) happens at 5-minute intervals, and every few days (once a week, maybe?) it fails for this reason, and our developers think that should be impossible. As such, it looks like a pretty narrow race condition.

So what I'm trying to figure out, first off:

- Could this be a NetApp bug? We've moved from 7-Mode to cDOT, and it didn't happen before. On the flip side, I have no guarantee that it 'never happened before', because we weren't catching a race condition. (Moving to new tin and improving performance does increase race condition likelihood, after all.)

- Could this be a kernel bug? We're all on kernel 2.6.32-504.12.2.el6.x86_64, and whilst we're deploying CentOS 7, none of the hosts involved are on it yet. (That's potentially also just coincidence, as there are quite a few hosts, and they're all on the same kernel version.)

- Is it actually impossible for a file A renamed over a file B to generate ENOENT on a different client? Specifically, in RFC 3530 we have: "The RENAME operation must be atomic to the client." So the client doing the rename sees an atomic operation, but the expectation is that a separate client will also perceive an 'atomic' change: once its cache is refreshed, the directory has the new file, and at no point was there 'no such file or directory', because a lookup found either the old file or the newly renamed one. Is this actually a valid thing to think?

This is a bit of a complicated one, and it has me clutching at straws a bit. I can't reliably reproduce it: a basic fast-spinning loop script on multiple clients to read-write-rename (also sketched below) didn't hit it. I've got pcaps running hoping to catch it 'in flight', but haven't yet managed to catch it happening. Any suggestions would be gratefully received.
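For reference, the update pattern the process uses is essentially the classic write-then-rename idiom. A minimal sketch (names here are hypothetical, and the real process also writes its journal entry first):

    import os

    def publish(path, data):
        """Replace `path` with new contents; rename() makes the swap atomic."""
        tmp = path + "_temp"          # sibling temp file on the same NFS mount
        with open(tmp, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # push the data to the server first
        os.rename(tmp, path)          # POSIX: atomically replaces `path`

Locally, POSIX says a reader should only ever see the old file or the new one; the question is whether that guarantee survives two NFS clients with independent lookup/attribute caches.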
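And the reproducer I've been trying is along these lines (paths are hypothetical; one client runs the writer, the other runs the reader):

    import errno
    import os
    import sys
    import time

    TARGET = "/mnt/nfs/testfile"      # hypothetical path on the filer mount

    def writer():
        """Client A: rewrite and rename over TARGET as fast as possible."""
        n = 0
        while True:
            tmp = TARGET + "_temp"
            with open(tmp, "w") as f:
                f.write("generation %d\n" % n)
                f.flush()
                os.fsync(f.fileno())
            os.rename(tmp, TARGET)
            n += 1

    def reader():
        """Client B: TARGET should never be missing if rename() is atomic."""
        while True:
            try:
                with open(TARGET) as f:
                    f.read()
            except IOError as e:      # IOError aliases OSError on Python 3
                if e.errno == errno.ENOENT:
                    print("ENOENT at %.6f" % time.time())

    if __name__ == "__main__":
        if len(sys.argv) > 1 and sys.argv[1] == "write":
            writer()
        else:
            reader()

Spinning this on two clients for hours hasn't triggered the ENOENT, which is what makes me think the window is very narrow.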