OK, so here's how the behavior looks:
1) On client A, which had the filesystem mounted before the ndmpcopy, the new data does not show up in directory X.
2) On client B, which mounts after the ndmpcopy starts, the data shows up.
3) On client A, after you unmount and remount, the data shows up.
I have a theory, based on the NFS protocol, that might explain the behavior. Have your flamers ready, because this could be wrong wrong wrong. Keep in mind that I've never actually read any NFS client source code, so set your expectations accordingly...
When you do an ls on a directory you've never accessed before from a client, the client fetches the data block for the directory and puts it in its buffer cache.
Later, if somebody does an ls on the same directory on that NFS client, the client needs to check whether the data in its buffer cache is still valid. It does this with a getattr() NFS call. If the directory's timestamp on the server has not changed, the client concludes that its buffer cache is valid and never actually re-reads the directory data.
(This is why LaTeX takes so long over NFSv2 even if the whole thing is in your buffer cache: depending on how much LaTeX stuff you have, you might have to make 10,000 getattr() calls to validate your buffer cache, whereas on local disk you _know_ the buffer cache is valid.)
So, if the ndmpcopy does not update the timestamp on that directory until the end, the client will happily believe that the directory contents in its buffer cache (which are now out of date) are correct.
But if you unmount and remount, that invalidates that part of the buffer cache, causing the data to magically appear.
And if you mount on another client after the ndmpcopy has started, that client never had a stale buffer cache entry in the first place.
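Here's a toy model of the theory in Python, just to make it concrete. Everything in it is made up for illustration (the `Server`/`Client` classes, a server holding a single directory, and the assumption that the copy adds entries without touching the mtime), and it ignores things real NFS clients also do, such as attribute-cache timeouts. The client trusts its cached directory as long as a getattr-style mtime check says nothing changed.

```python
class Server:
    """Toy NFS server holding a single directory: its entries and its mtime."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.mtime = 0

    def getattr(self):
        # Stands in for the NFS GETATTR call: only the attributes come back.
        return self.mtime

    def readdir(self):
        # Stands in for actually reading the directory data.
        return list(self.entries)


class Client:
    """Toy NFS client that caches one directory and revalidates it by mtime."""

    def __init__(self, server):
        self.server = server
        self.mount()

    def mount(self):
        # A fresh mount (or an unmount + remount) starts with an empty cache.
        self.cached_mtime = None
        self.cached_entries = None

    def ls(self):
        mtime = self.server.getattr()
        if mtime != self.cached_mtime:
            # Timestamp changed (or nothing cached yet): re-read the directory.
            self.cached_entries = self.server.readdir()
            self.cached_mtime = mtime
        # Otherwise trust the cache without reading the directory at all.
        return self.cached_entries


server = Server(["old"])
client_a = Client(server)
client_a.ls()                       # A caches the directory before the copy

# Hypothesised ndmpcopy behaviour: new entries, but the mtime is left alone.
server.entries += ["new1", "new2"]

client_b = Client(server)           # B mounts after the copy has started
seen_by_a = client_a.ls()           # stale cache is trusted
seen_by_b = client_b.ls()           # B reads the directory for the first time
client_a.mount()                    # unmount and remount
seen_by_a_after_remount = client_a.ls()
print(seen_by_a, seen_by_b, seen_by_a_after_remount)
```

Running it prints `['old'] ['old', 'new1', 'new2'] ['old', 'new1', 'new2']`, which lines up with observations 1 through 3.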
This theory explains all three observations. Here are some experiments that would increase confidence in it:
1) Take a packet trace between the server and client A, and look for the getattr() call without any subsequent read of that directory.
2) During the ndmpcopy, manually create a file in the directory from a second client. That should update the directory timestamp, so the next time the first client validates its buffer cache with getattr(), it will realize that the cache is out of date.
3) Check the directory's timestamp from client A before the ndmpcopy, and from both clients during the ndmpcopy.
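For experiment 3, something like the following could be run on each client. These are hypothetical helpers, not an existing tool: `dir_signature` records the directory's mtime and entries as that particular client sees them, and `matches_theory` encodes what the theory predicts (the mtime never moves, yet client B lists entries that client A doesn't).

```python
import os


def dir_signature(path):
    """Return (mtime in ns, sorted entry names) for a directory, as this client sees it."""
    st = os.stat(path)
    return st.st_mtime_ns, sorted(os.listdir(path))


def matches_theory(before_a, during_a, during_b):
    """Hypothetical check for experiment 3.

    before_a:           signature from client A before the ndmpcopy
    during_a, during_b: signatures from clients A and B during the ndmpcopy
    The theory predicts an unchanged mtime while B sees entries A does not.
    """
    same_mtime = before_a[0] == during_a[0] == during_b[0]
    b_sees_more = set(during_b[1]) > set(during_a[1])
    return same_mtime and b_sees_more
```

On each client you'd call `dir_signature` on that client's mount of directory X (whatever its path is on your setup) and feed the three results to `matches_theory`.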
Of course, this isn't important enough for me to actually run these experiments. If the theory is right, then it's a simple NetApp bug that could be fixed by updating the timestamp on the target directory at the beginning of the ndmpcopy.
Of course, if it's wrong then it's wrong :-)
Darrell Root rootd@nas.nasa.gov
On Wed, 22 Jul 1998, Darrell Root wrote:
> Of course, this isn't important enough for me to try these experiments. If this is true, then it's a simple netapp bug which can be fixed by updating the timestamp on the target directory at the beginning of the ndmpcopy.
Actually, it would have to do that all the way through, every time the directory was updated. Doing it only at the beginning wouldn't help much: as soon as data started flowing, the timestamp would be out of date again.
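A quick toy model of that point (Python, all names hypothetical): the restore adds one entry per tick, and a client that cached the directory earlier revalidates it by mtime at a few checkpoints. Bumping the timestamp only at the start lets the client pick up the first change and then go stale again.

```python
def client_view(policy, check_ticks, total=5):
    """Toy model: a restore adds one entry per tick to a directory while a client
    revalidates its cached copy by mtime at each tick in check_ticks.

    policy: "never": mtime is not touched during the copy (the original theory)
            "start": mtime is bumped once, when the first entry is added
            "each":  mtime is bumped every time an entry is added
    Returns how many entries the client lists at each check.
    """
    entries, mtime = [], 0
    cached_mtime, cached_count = None, 0
    counts = []
    for tick in range(total + 1):
        if tick > 0:                       # the restore writes one more entry
            if policy == "each" or (policy == "start" and tick == 1):
                mtime += 1
            entries.append(f"file{tick}")
        if tick in check_ticks:
            if mtime != cached_mtime:      # GETATTR says the directory changed
                cached_mtime, cached_count = mtime, len(entries)
            counts.append(cached_count)
    return counts


for policy in ("never", "start", "each"):
    print(policy, client_view(policy, check_ticks={0, 1, 3, 5}))
```

With checks at ticks 0, 1, 3 and 5 it prints `never [0, 0, 0, 0]`, `start [0, 1, 1, 1]` and `each [0, 1, 3, 5]`: only updating the timestamp on every change keeps the client current.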
OK. I had a couple of quick points to add.
1) Maybe it isn't such a _bad_ thing that you can't see the data from the machines on which the file system is already mounted. After all, restore would not appreciate somebody modifying the file system it is in the middle of filling in. It won't crash your machine or anything like that, but still... Also, since directory permissions aren't restored until the very end of a restore, seeing the data early could be considered a serious security hole as well.
2) Restore should really do a bit more than sit there and ask for a new tape every once in a while. There should definitely be some hints that it's working for you.
3) If you have problems like this, do you guys call NetApp and ask for a bug to be filed? Bugs with actual customer call records are (obviously) treated as more serious than, say, bugs that I file. So, I have filed a bug on restore being too quiet. It is filed as 9677, but it will take some time before it is visible on NOW.
Regardless, you should open a call record on this bug to make it more visible to the powers that be.
In the future, if you have problems with dump and restore, it would be great if you made sure tech support opened bugs on them. I try to listen on toasters, but I miss things... And please don't mail me asking me to file a bug. I'm not volunteering for that at all. They already make me do the windows in addition to dump and restore.
4) Why doesn't restore lay down some framework when it first starts? In the classic BSD dump scenario (ufsdump, dump on Linux, etc.), this is how things work:
Restore reads the directories from tape into one huge file. It uses this file to create a desiccated file system. That is to say, it tracks the offset of the beginning of each directory in this file. So, when a user asks for a file, it can execute its own namei without ever laying the directory structure down on the file system. This saves quite a bit of space.
Aside: [Of course, you ask, why do this? Remember that restore -x and restore -i (on non-NetApp systems) only extract part of the data on the tape. Thus, creating the whole directory structure on the system is a bad idea, especially on a system that is light on disk space.]
Then, it calculates which files need to be extracted. Finally, it begins to lay the directories and files down on the system. But getting to this point takes quite a bit of time. After the directories and files have been written to disk, the system restores the directories' permissions and times. [We couldn't do this when creating the directories, since creating the files _might have failed_ due to permission problems and _definitely_ would have affected the times.]
Note: none of this is proprietary. It's all in the freely available BSD code. (Just making sure marketing doesn't yell at me again. ;)
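Here's a rough Python sketch of the desiccated file system idea. It is not the BSD code: the JSON-per-line record format and the function names are assumptions made for illustration, and an in-memory `io.BytesIO` stands in for the temporary file. The point is that lookups only seek into the one file using the offset table; nothing is created on the target file system.

```python
import io
import json

ROOT_INO = 2  # UFS convention: the root directory is inode 2


def build_dir_file(directories):
    """Write every directory read off the tape into one temporary file.

    directories: {dir_inode: {name: child_inode}} (a made-up in-memory format)
    Returns the file and a table of where each directory starts in it.
    """
    dirfile = io.BytesIO()  # stands in for restore's one huge temporary file
    offsets = {}
    for ino, entries in directories.items():
        offsets[ino] = dirfile.tell()
        dirfile.write(json.dumps(entries).encode() + b"\n")
    return dirfile, offsets


def namei(dirfile, offsets, path):
    """Resolve a path to an inode number using only the temporary file."""
    ino = ROOT_INO
    for component in path.strip("/").split("/"):
        if not component:
            continue
        if ino not in offsets:        # tried to descend into a non-directory
            return None
        dirfile.seek(offsets[ino])
        entries = json.loads(dirfile.readline())
        if component not in entries:
            return None
        ino = entries[component]
    return ino


tape_dirs = {
    2: {"home": 3, "etc": 4},
    3: {"alice": 5},
    4: {"passwd": 10},
    5: {"notes.txt": 11},
}
dirfile, offsets = build_dir_file(tape_dirs)
print(namei(dirfile, offsets, "/home/alice/notes.txt"))  # 11
print(namei(dirfile, offsets, "/etc/shadow"))            # None
```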
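The ordering above (create everything first, fix directory permissions and times last) can be sketched with plain POSIX calls from Python's `os` module. `restore_tree` and its argument format are hypothetical, and applying the deferred settings deepest directory first is a choice made here so that a restrictive parent can't block access to its children; the BSD code may handle it differently.

```python
import os
import tempfile


def restore_tree(root, dirs, files):
    """Lay down directories and files, then fix directory modes and times last.

    dirs:  [(relative_path, mode, mtime)] with parents listed before children
    files: [(relative_path, data)]
    """
    pending = []
    for rel, mode, mtime in dirs:
        path = os.path.join(root, rel)
        os.makedirs(path, exist_ok=True)  # default, writable permissions for now
        pending.append((path, mode, mtime))

    for rel, data in files:
        with open(os.path.join(root, rel), "wb") as f:
            f.write(data)  # creating an entry bumps the parent directory's mtime

    # Only now apply the recorded times and permissions, deepest directory first.
    for path, mode, mtime in reversed(pending):
        os.utime(path, (mtime, mtime))
        os.chmod(path, mode)


with tempfile.TemporaryDirectory() as root:
    restore_tree(
        root,
        dirs=[("a", 0o755, 1_000_000), ("a/b", 0o700, 2_000_000)],
        files=[("a/b/f.txt", b"hello")],
    )
    st_a = os.stat(os.path.join(root, "a"))
    st_b = os.stat(os.path.join(root, "a", "b"))
    print(oct(st_a.st_mode & 0o777), st_a.st_mtime, st_b.st_mtime)
```

Even though creating `a/b` and `a/b/f.txt` touched both directories' mtimes, the final times are the recorded ones because they are applied after all the creates.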
So, there are ways to improve on this scheme, and we're working on those. But for now, the reason you don't immediately see lots of data on the system is that the directory data is all in one huge temporary file.
OK. Back to those windows.
Stephen Manley The Restore-Man Formerly the Dump-Man