"Robert" == Robert Johannes rjohanne@damango.net writes:
Robert> Dear Group; I'm running into a situation with the Netapp
Robert> filer 740. I have a directory on the netapp called
Robert> /www/C_name, and this directory gets mounted on two
Robert> webservers. I've been running into a problem where the
Robert> webserver randomly denies access to some files, returning a
Robert> "Forbidden" message. The strange thing is that the files for
Robert> which access is denied have the right permissions. I looked
Robert> through the error log file and noticed that for each of the
Robert> files whose access had been denied, the webserver logged a
Robert> message saying: (151)Stale NFS file handle: access to
Robert> /www/C_name/file_name failed. But when the web browser
Robert> reloads the requested file a couple of times, the file is
Robert> eventually displayed.
Robert> I also looked through the system log files, and there was no
Robert> log of any kind mentioning a stale NFS file handle. I mean
Robert> that the NFS file system was/is intact, but somehow the
Robert> webserver could not access the file.
Robert> What I'm wondering about: Is there a situation that would
Robert> cause the netapp filer to deny access to the webserver
Robert> machines for a few milliseconds, thus coincidentally causing
Robert> the webserver not to serve the requested file? What would
Robert> lead to a situation like this?
Robert> Any pointers/suggestions/comments/help very much
Robert> appreciated.
Let me guess, Solaris on the NFS client side?
It's a bug in the Solaris rnode cache. Apply this patch to Apache 1.3.9 and it will fix the problem:
*** apache_1.3.9.orig/src/main/http_request.c	Fri May 21 08:16:21 1999
--- apache_1.3.9/src/main/http_request.c	Mon Sep 13 21:02:47 1999
***************
*** 243,248 ****
--- 243,253 ----
      else {
          errno = 0;
          rv = stat(path, &r->finfo);
+         if (errno == 151) {
+             /* overcome problems with Solaris NFS rnode cache */
+             rv = stat(path, &r->finfo);
+             ap_log_rerror(APLOG_MARK, APLOG_ERR, r,
+                           "retrying access to %s", r->uri);
+         }
      }
I've been meaning to report the problem to Sun for ages. The deal is, when Solaris implemented the rnode cache (which caches mappings from file names to NFS file handles), they did something stupid. If a file is replaced with a new file on the server (by a different client), then the file handle for that file changes. When a Solaris NFS client tries to access the file, it uses the file handle it already has cached. If that handle is invalid, Solaris removes the item from the cache, but it also reports a Stale NFS File Handle up to the application. The next time an application tries to access the same file, the new NFS file handle is retrieved and added to the cache. The brokenness is that the OS shouldn't be passing the error up to the application; it should update the item in the cache and retry transparently.
On a non-broken NFS client, the only time you'll get a stale NFS file handle is if an application on one client has a file open and that file is then deleted by another client. At that point, the application does (and should) get an error since it still has the file open.
Under Solaris, even if all applications on a client have closed the file, an application can still get an error the next time it goes to open the file, if the file behind the cached rnode entry has been removed or replaced on the server.
j.
-- 
Jay Soffian <jay@cimedia.com>       UNIX Systems Engineer
404.572.1941                        Cox Interactive Media