Could someone from NetApp list the circumstances under which a Network Appliance server will return the NFS RPC ESTALE error? I'm specifically interested in NFS v2, but a v3 listing would be nice if it's different.
Thanks.
- Dan
The ESTALE error simply means that the server couldn't find any file for a given file handle.
The most common cause of ESTALE is for a file that is open on one system to be removed on a different system. If the remove comes from the same system, then the NFS client detects that the file is open and renames it to ".nfs####" instead of removing it. But a client on a different system has no way to know that the file is open elsewhere, since NFS is stateless, so its remove succeeds and accesses to the open file suddenly start failing with ESTALE.
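For what it's worth, here is a minimal Python sketch of how an application might cope with that case. It assumes the file may have been recreated under the same name, so that reopening by path yields a fresh file handle. The function name and the "pread" parameter (there only so the read call can be swapped out for testing) are made up for illustration; errno.ESTALE, os.open and os.pread are the standard Unix ones.

```python
import errno
import os


def read_with_estale_retry(path, length, offset=0, retries=1, pread=os.pread):
    """Read `length` bytes at `offset`; on ESTALE, reopen by name and retry.

    Hypothetical helper: reopening by path makes the NFS client look the
    name up again, so it gets a new file handle if the file was recreated.
    """
    for attempt in range(retries + 1):
        fd = os.open(path, os.O_RDONLY)
        try:
            return pread(fd, length, offset)
        except OSError as exc:
            # Anything other than a stale handle is not ours to hide,
            # and neither is a stale handle once the retries run out.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
        finally:
            os.close(fd)
```

If the file was simply removed and never recreated, the reopen fails with ENOENT instead, which is the honest answer at that point.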
Sometimes a vendor will change the file handle format, in which case upgrading to a new software release could cause ESTALE for files that were opened before the upgrade. (I don't think we've done that lately, but don't know for sure.)
Dave
We've changed the file handle format in 5.0, to add a file system identifier...
...but we added a flag to the file handle to say whether it's an "old-format" or "new-format" file handle, and if the filer receives an "old-format" file handle, it turns it into a "new-format" file handle that refers to the root volume, *and* returns "old-format" file handles in the reply if the reply contains file handles. (That keeps the client from thinking that file XXX, when referred to by an old-format file handle, is a different file from file XXX referred to by a new-format file handle, as UNIX clients typically assign one internal "rnode" data structure per file handle.)
I.e., you don't get ESTALE from upgrading to 5.0 (if you do, it's a bug).
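The translation described above can be sketched roughly as follows. This is only an illustration of the idea, not the filer's code: the FileHandle fields, the ROOT_VOLUME_FSID value and the function names are all invented for the example, and a real file handle is an opaque byte string rather than a Python object.

```python
from dataclasses import dataclass

ROOT_VOLUME_FSID = 0  # hypothetical identifier for the root volume


@dataclass(frozen=True)
class FileHandle:
    new_format: bool               # the flag carried in the handle
    file_id: int
    fsid: int = ROOT_VOLUME_FSID   # only meaningful in new-format handles


def to_internal(fh):
    """Turn an incoming handle into a new-format one.

    An old-format handle has no file system identifier, so it is taken
    to refer to the root volume.
    """
    if fh.new_format:
        return fh
    return FileHandle(True, fh.file_id, ROOT_VOLUME_FSID)


def to_wire(internal, request_fh):
    """Hand back a handle in the same format the client used."""
    if request_fh.new_format:
        return internal
    return FileHandle(False, internal.file_id)


def lookup(request_fh, child_file_id):
    """Toy LOOKUP: the reply handle matches the request's format."""
    parent = to_internal(request_fh)
    child = FileHandle(True, child_file_id, parent.fsid)
    return to_wire(child, request_fh)
```

A client that mounted before the upgrade keeps seeing only old-format handles, so it never ends up with two different handles (and hence two rnodes) for the same file.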
The "new-format" file handle always has a big-endian file ID, which is why the "nfs.big_endianize_fileid" option is a "hidden" option in 5.0. If you're already setting it, we continue to honor it for old-format file handles, but it has no effect on new-format file handles. Since you have to remount from the server if you set that flag on the server, and since the 5.0 mount daemon *always* hands out new-format file handles, you can get a file handle with a big-endian file ID by remounting *without* bothering to set the flag.
(It's always nice when you can *remove* a knob from an appliance. Too bad we didn't think of the "flag in the file handle" hack when we first discovered that some UNIX clients hash the file handle, in the rnode lookup code, in such a way that little-endian file IDs cause most if not all rnodes to end up in the same hash bucket....)
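To make the hash-bucket problem concrete, here is a toy Python illustration. The hash below is an assumption made up for the example (it looks only at the last byte of a 4-byte file ID field) and is not taken from any actual client's rnode code; it just shows how byte order can decide whether a byte-oriented hash spreads handles out or piles them up.

```python
import struct
from collections import Counter

NBUCKETS = 64  # hypothetical size of the rnode hash table


def toy_hash(handle: bytes) -> int:
    """Hypothetical hash: uses only the last byte of the file ID field."""
    return handle[3] % NBUCKETS


def bucket_usage(byte_order, file_ids):
    """Count handles per bucket for IDs packed as '>' (big) or '<' (little)."""
    fmt = byte_order + "I"  # 4-byte unsigned file ID
    return Counter(toy_hash(struct.pack(fmt, fid)) for fid in file_ids)


ids = range(1000)
big = bucket_usage(">", ids)
little = bucket_usage("<", ids)

# Big-endian: the last byte is the low-order byte, which varies from file
# to file. Little-endian: it is the high-order byte, which is zero for
# every ID below 2**24, so everything lands in bucket 0.
print(len(big), len(little))  # prints "64 1"
```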
That ESTALE on a remove from another client is, of course, not NetApp-specific; other NFS servers work the same way.