New subject: stale NFS file handles

13 Apr 1998


      hitz@netapp.com (Dave Hitz) writes:
...
The ESTALE error simply means that the server couldn't find any file
for a given file handle.
If a client writes to a file, but the inode number does not change,
would a different client receive ESTALE or a different error?
Anyway, the reason I'm asking all this is that I'm at a loss to
explain the following sequence of events.  First, a daemon "xx" failed
as follows:
  Sat Apr 11 00:00:48 1998 xx: (INFO) Removing old directories
  Sat Apr 11 00:00:49 1998 xx: (INFO) Done removing old directories
  Sat Apr 11 00:00:49 1998 xx: (INFO) Daemon is using 7024 Kb (limit is 7000) - restarting
  Sat Apr 11 00:00:49 1998 xx: (INFO) next instruction is execv(...self...)
  Sat Apr 11 00:00:49 1998 xx: (SEVERE ERROR) Execv failedStale NFS file handle
  Sat Apr 11 00:00:49 1998 xx: (INFO) Dropping core in directory /xxx/yyy
The Linux kernel on that host also logged the error, showing:
Apr 11 00:00:49 kernel: nfs_revalidate_inode: bin/xx getattr failed, ino=4072292, error=-116
Unfortunately, this version of the kernel doesn't show the "before"
and "after" NFS file handles, but according to the kernel the server
did return ESTALE (error=-116 on Linux).  (The NetApp server logs
don't show anything interesting around this time.)
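For reference, Linux kernel code reports failures as negative errno values, so error=-116 in the log line above is -ESTALE (ESTALE is 116 on x86 Linux; some other architectures number it differently). The helpers below are hypothetical, not part of the daemon or the kernel; they are a minimal sketch of turning such a logged value back into the errno constant and its strerror() text, which on this system is the "Stale NFS file handle" string that appears in the daemon's own log.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helpers for reading kernel log lines such as
   "getattr failed, ino=4072292, error=-116".  Kernel code returns
   -errno, so the logged value is negated before interpreting it. */
int kernel_error_to_errno(int logged)
{
    return -logged;
}

/* Nonzero if the logged kernel error is ESTALE (stale file handle). */
int kernel_error_is_estale(int logged)
{
    return kernel_error_to_errno(logged) == ESTALE;
}

/* strerror() text for the logged value; the exact wording depends on
   the C library (e.g. "Stale NFS file handle" or "Stale file handle"). */
const char *kernel_error_text(int logged)
{
    return strerror(kernel_error_to_errno(logged));
}
```

Under these assumptions, kernel_error_is_estale(-116) is nonzero on x86 Linux, while a value such as -2 (ENOENT) is not treated as a stale handle.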
The weird part is that all three timestamps for "xx" precede April 11
00:00:49 by a wide margin.
$ ls -ali /foo/bar/bin/xx*
  4072292 -rwx------   [...]  Apr 10 13:11 /foo/bar/bin/xx
  1161604 -rwx------   [...]  Apr 10 11:22 /foo/bar/bin/xx.old
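`ls -l` shows only the modification time, so checking all three timestamps takes `ls -lu` (access time), `ls -lc` (status change time), or a direct stat() call.  The helper below is a hypothetical sketch, not part of the daemon, that prints the inode number together with the access, modification, and status-change times for one path, which is the comparison described above.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: print the inode number and the three stat()
   timestamps for path.  Returns 0 on success, or -1 if stat() fails
   (errno is left as stat() set it). */
int show_times(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;

    printf("%s: inode %llu\n", path, (unsigned long long) st.st_ino);
    /* ctime() output already ends with a newline */
    printf("  atime %s", ctime(&st.st_atime));
    printf("  mtime %s", ctime(&st.st_mtime));
    printf("  ctime %s", ctime(&st.st_ctime));
    return 0;
}
```

Running show_times("/foo/bar/bin/xx") on the client would show whether any of the three times falls near 00:00:49 on April 11.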
Here is the code the "xx" daemon was running when it received ESTALE
while trying to re-execute itself (it had been running since
April 8).
------- start of cut text --------------
void re_exec_daemon(void)
{
    char buf[TMP_BUF_SIZE];

    disconnect();

    if (close(Admin_sock) == -1) {
        sprintf(buf, "Can't close admin socket (%s)", strerror(errno));
        log_mesg(ERROR, buf);
    }
    if (close(User_sock) == -1) {
        sprintf(buf, "Can't close user socket (%s)", strerror(errno));
        log_mesg(ERROR, buf);
    }

    /* jump to directory we started from */
    if (chdir(Invoke_dir) == -1) {
        sprintf(buf, "Can't change to directory %s", Invoke_dir);
        fatal_error(buf, strerror(errno));
    }

    /* make sure sockets are closed */
    (void) close(Admin_sock);
    (void) close(User_sock);

    log_mesg(INFO, "next instruction is execv(...self...)");

    /* Re-execute us */
    execv(Invoke_command[0], Invoke_command);

    /* should not get here */
    fatal_error("Execv failed", strerror(errno));
}
------- end ----------------------------
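If the ESTALE is transient, one possible mitigation, offered only as a sketch and not as a diagnosis of what happened here, is to retry execv() a few times when it fails with ESTALE instead of dumping core on the first failure.  The helper execv_retry_estale and its max_tries and delay_sec parameters are hypothetical; whether a retry actually obtains a fresh handle depends on the client kernel's NFS lookup behavior.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call execv(), retrying up to max_tries times
   (max_tries >= 1) only when it fails with ESTALE.  execv() returns
   only on failure, so reaching the end means every attempt failed;
   -1 is returned with errno set from the last attempt. */
int execv_retry_estale(const char *path, char *const argv[],
                       int max_tries, unsigned int delay_sec)
{
    int saved_errno = 0;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        execv(path, argv);
        saved_errno = errno;
        if (saved_errno != ESTALE)
            break;                  /* not a stale handle: retrying won't help */
        if (attempt + 1 < max_tries)
            sleep(delay_sec);       /* give the client a moment before retrying */
    }

    errno = saved_errno;            /* sleep() must not clobber the exec error */
    return -1;
}
```

In re_exec_daemon() this would replace the bare execv(Invoke_command[0], Invoke_command) call, with the existing fatal_error() call kept for the case where every attempt fails.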
Dan

Re: stale NFS file handles