We have a Solaris 2.6 server with MANY users that is getting regular "NFS stale file handle" errors. We have the file handle from /var/adm/messages, but we do not know which process is generating the error.
Is there anything we can do to get a file name from the file handle, to help identify which process is the source? Rebooting this machine is a last-resort option since it is a primary user server (meaning we'd have to do it in the wee hours).
thanks Betsy (new to the list, hello all!)
We have discovered that if a user has ksh processes running on two different NFS clients, and those ksh processes are using the same NFS-mounted history file, this usually results in "stale NFS handle" messages when one of the ksh processes exits.
This is a fairly common occurrence for us because we have a cluster of Unix servers mounting home directories from a NetApp. If a user logs in to two different Unix boxes, he usually gets the same history file in both login sessions. So we have two ksh processes on two different NFS clients banging on the same history file.
We've only seen this behavior with ksh. It also happens when the NFS server is not a NetApp.
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support
Ooh, I went through this headache. It's hard to get info. If you check the archives for two or three months ago, I might have explained how I finally figured it out (I don't remember now). I think you have to take the 4th or 5th number from the file handle in the error message and convert it from hex to decimal, and that'll give you the inode. Now, one of them is a 16 bit file inode which you can't ever find on the system because that's a NetApp inode, but what you CAN find in that string of numbers is the inode for the directory where the file was, which would be on a mounted partition. That's the closest I got, but it was enough.
I know the docs from NetApp were wrong, and I had a discussion here with someone from NetApp about that (helpful people). I had also tried reading SunWorld Online's "errno libretto" document. Both were wrong, but they gave me leads on where to pursue this. Search the archives at:
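Once a candidate inode number is in hand, the lookup described above amounts to walking the mounted tree and comparing inode numbers. Here is a minimal sketch of that idea, not anything taken from this thread: the function name `find_by_inode` and the example path are hypothetical, and it assumes the number you converted is what the client reports as the inode for files on that mount.

```python
import os


def find_by_inode(root, inode):
    """Return paths under root (same filesystem only) whose inode number matches.

    Hypothetical helper for illustration. Inode numbers are only unique
    within one filesystem, so entries on another device are skipped.
    """
    root_stat = os.lstat(root)
    root_dev = root_stat.st_dev
    matches = []
    if root_stat.st_ino == inode:
        matches.append(root)

    for dirpath, dirnames, filenames in os.walk(root):
        keep = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable (possibly stale) entry
            if st.st_dev != root_dev:
                continue  # do not cross into another mounted filesystem
            keep.append(name)
            if st.st_ino == inode:
                matches.append(path)
        dirnames[:] = keep  # prune the walk in place

        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if st.st_dev == root_dev and st.st_ino == inode:
                matches.append(path)
    return matches


if __name__ == "__main__":
    # "/home" is only an example mount point; 840096 is the value worked out
    # elsewhere in this thread.
    for hit in find_by_inode("/home", 840096):
        print(hit)
```

On a large home-directory tree this walk can take a long time, so it may be worth pointing it at the smallest subtree you suspect.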
http://teaparty.mathworks.com:1999/toasters/
Sorry can't be much more helpful...
----------- Fujitsu - Nexion, St. Louis, MO Jay Orr (314) 579-6517
On Fri, 23 Jul 1999, Jay Orr wrote:
Ooops -- I knew I was forgetting things. See :
http://teaparty.mathworks.com:1999/toasters/3075.html
----------- Fujitsu - Nexion, St. Louis, MO Jay Orr (314) 579-6517
Thanks very much for the pointers! Now that we know how to get the inode from the file handle, we know how to find the file :-)
To recap - and to see if I have this straight:
The first 32 bits of the file handle are the inode number, but it's other-endian (the bytes are in reverse order), so A0D10C00 -> 0xCD1A0 -> 840096.
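The conversion in the recap can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `handle_word_to_inode` is made up for illustration, and it assumes the word is printed as eight hex digits in the byte order shown in the log message.

```python
import struct


def handle_word_to_inode(word_hex):
    """Byte-swap one 8-hex-digit word from the file handle and return it as an int."""
    raw = bytes.fromhex(word_hex)
    if len(raw) != 4:
        raise ValueError("expected exactly 8 hex digits, got %r" % word_hex)
    # "<I" reads the four bytes as a little-endian unsigned 32-bit integer,
    # which is the same as reversing the byte order of the printed word.
    (value,) = struct.unpack("<I", raw)
    return value


if __name__ == "__main__":
    inode = handle_word_to_inode("A0D10C00")
    print(hex(inode), inode)  # prints: 0xcd1a0 840096
```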
thanks Betsy
At 03:22 PM 07/23/1999 -0500, Jay Orr wrote:
The 5.0 description doesn't have the right number of bytes. I got eight octets. Is this description missing the 16 bits for flags perhaps? Also, the field we want is "File ID for file", right? (a0d10c00 a07c0820 20000000 35d948 84709009 e8250000 a0d10c00 a07c0800)
Guy Harris (guy@netapp.com) who gave me the info might be able to help you better.
I think I did the shotgun approach and converted the numbers until I found one that matched, and that was an inode for a directory where the problem was, and I could tell from there what was wrong.
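The shotgun approach can be scripted: convert every word of the handle under both byte orders and try each result as an inode number. The sketch below uses the handle quoted in this thread; the function name `candidate_inodes` is hypothetical, and a word that is not eight hex digits long (such as 35d948) is only converted as printed, since it is not clear how it should be padded.

```python
HANDLE = "a0d10c00 a07c0820 20000000 35d948 84709009 e8250000 a0d10c00 a07c0800"


def candidate_inodes(handle):
    """Return (word, value_as_printed, value_byte_swapped) for each word.

    value_byte_swapped is None when the word is not exactly 8 hex digits,
    because the byte boundaries are then ambiguous.
    """
    rows = []
    for word in handle.split():
        as_printed = int(word, 16)
        if len(word) == 8:
            swapped = int.from_bytes(bytes.fromhex(word), "little")
        else:
            swapped = None
        rows.append((word, as_printed, swapped))
    return rows


if __name__ == "__main__":
    for word, as_printed, swapped in candidate_inodes(HANDLE):
        print(word, as_printed, swapped)
```

Each decimal value printed is a candidate to look up on the mounted filesystem; which word holds the file or directory ID depends on the file handle layout, which this sketch does not try to decode.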
Forgive me, it's been a long week and my brain is on shutdown mode.... If no one else can help you with this I'll try to find my notes on Monday and give you a clear-minded answer, but I by no means claim to be an authority on this, just that I've run into it...
----------- Fujitsu - Nexion, St. Louis, MO Jay Orr (314) 579-6517