My question about file folding (after reading about it here and on NOW)...
Why would a block end up in a snapshot if it is the same as the previous snapshot? Isn't the point of snapshots only to save the 'old version' of blocks that have been changed?
Snapshots do not copy any blocks, so including the same block in multiple snapshots incurs no overhead. It works like this. Each data block in a volume has an associated bit map of length 32 (20 on older releases). These bits correspond to snapshots. For example, if bit #3 is set, then the block is part of snapshot #3. The filer keeps track of which logical name (e.g. "hourly.0") is associated with each bit. The size of the bit map limits the number of snapshots that can exist in a volume.
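To make the bit map idea concrete, here is a minimal Python sketch. It is only an illustration of the description above, not the filer's actual on-disk format; the function names and the name table are made up for the example.

```python
SNAP_BITS = 32  # bit map width described above (20 on older releases)

def is_member(bitmap, snap):
    """Return True if a block with this bit map belongs to snapshot number `snap`."""
    return bool(bitmap & (1 << snap))

def members(bitmap):
    """Return the snapshot numbers whose bits are set in `bitmap`."""
    return [n for n in range(SNAP_BITS) if is_member(bitmap, n)]

# Hypothetical table mapping bit numbers to logical snapshot names.
names = {0: "nightly.0", 3: "hourly.0"}

# A block that belongs to snapshots #0 and #3.
block_bitmap = (1 << 0) | (1 << 3)

print([names[n] for n in members(block_bitmap)])  # ['nightly.0', 'hourly.0']
```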
Here's what happens when you create a snapshot. The filer picks an unused snapshot number and sets the corresponding bit for all blocks currently in use by the live filesystem. Note that many of these blocks may have had other bits set when other snapshots were created previously. So a block can be a member of multiple snapshots. It may even be a member of all snapshots. This is not unusual at all. If a file is older than your oldest snapshot, then the blocks that comprise the file are members of all snapshots.
When a snapshot is deleted, the bit corresponding to the snapshot is cleared for each block that has the bit set. If a block is left with no bits set and if the block is not part of the live filesystem, then it is freed for reuse.
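The create and delete steps can be sketched as a toy model in Python. The `Volume` class, its method names, and the use of a dict and sets are all assumptions made for illustration; the real filer does this bookkeeping internally and differently.

```python
class Volume:
    """Toy model of per-block snapshot bit maps (illustrative only)."""
    SNAP_BITS = 32

    def __init__(self):
        self.bitmap = {}   # block id -> snapshot bit map (int)
        self.live = set()  # block ids used by the live filesystem
        self.used = set()  # snapshot numbers currently in use

    def write_new_block(self, block):
        self.bitmap[block] = 0
        self.live.add(block)

    def remove_from_live(self, block):
        # The block leaves the live filesystem; free it only if no snapshot holds it.
        self.live.discard(block)
        if self.bitmap[block] == 0:
            del self.bitmap[block]

    def create_snapshot(self):
        # Pick an unused snapshot number and set its bit on every live block.
        unused = [n for n in range(self.SNAP_BITS) if n not in self.used]
        if not unused:
            raise RuntimeError("no unused snapshot numbers")
        n = unused[0]
        self.used.add(n)
        for block in self.live:
            self.bitmap[block] |= 1 << n
        return n

    def delete_snapshot(self, n):
        # Clear the bit; free blocks left with no bits that are not live.
        self.used.discard(n)
        freed = []
        for block in list(self.bitmap):
            self.bitmap[block] &= ~(1 << n)
            if self.bitmap[block] == 0 and block not in self.live:
                del self.bitmap[block]
                freed.append(block)
        return freed


vol = Volume()
vol.write_new_block("A")
vol.write_new_block("B")
s0 = vol.create_snapshot()       # A and B get bit 0
s1 = vol.create_snapshot()       # A and B also get bit 1
vol.remove_from_live("A")        # A is now held only by snapshots
freed_first = vol.delete_snapshot(s0)   # A still has bit 1, nothing freed
freed_second = vol.delete_snapshot(s1)  # A has no bits left and is freed
```

Block "B" is never freed in this run because it remains part of the live filesystem.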
So long as a block is a member of any snapshot, it cannot be modified or freed for reuse. The only way to get that block back is to delete all the snapshots that it belongs to.
You may wonder how you can possibly modify a file once it has been snapshotted because you can't modify any of its blocks. The filer simply makes changes in new data blocks and links them into the file in place of the snapshotted blocks. The snapshotted blocks are left untouched. However, they are no longer part of the live filesystem.
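Here is a small Python sketch of that behavior, assuming a file is just an ordered list of block ids. The allocator, the dicts, and the function names are hypothetical and exist only to show that a change goes into a new block while the snapshotted block stays untouched.

```python
import itertools

next_block = itertools.count(100)  # hypothetical block allocator
snap_bits = {}                     # block id -> snapshot bit map
data = {}                          # block id -> contents

def alloc(contents):
    """Allocate a fresh block holding `contents` with no snapshot bits set."""
    block = next(next_block)
    data[block] = contents
    snap_bits[block] = 0
    return block

def read(blocks):
    return b"".join(data[b] for b in blocks)

def overwrite(file_blocks, index, contents):
    """Write the change to a new block and link it into the file."""
    old = file_blocks[index]
    file_blocks[index] = alloc(contents)
    # The old block is freed only if no snapshot still holds it.
    if snap_bits[old] == 0:
        del data[old], snap_bits[old]

live_file = [alloc(b"hello "), alloc(b"world")]

# Take snapshot #0: set bit 0 on each block and remember the block list.
for b in live_file:
    snap_bits[b] |= 1 << 0
snapshot0_file = list(live_file)

overwrite(live_file, 1, b"filer")

print(read(live_file))       # b'hello filer'
print(read(snapshot0_file))  # b'hello world'
```

The unchanged first block is shared by the live file and the snapshot; only the second block differs.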
So when you look at a snapshot of a file, you are not looking at a copy. You are looking at the actual data blocks that comprised the file when the snapshot was made. In fact, the entire volume is treated this way, including the directories, inodes, etc. So when you look at a snapshot, you are looking at the actual blocks that comprised the volume at the moment the snapshot was taken. File permissions, owner, group, timestamps, etc., are all frozen in time. (This can be a security issue. If a sensitive file has been left world readable and you change the permissions to protect it, you have to remember that any snapshotted version of the file still has the old permissions, so anyone can still read the file in the snapshot. Your only option is to delete all snapshots where the file is world readable.)
OK, second question, does it only check the most recent snapshot or does it drill through all existing snaps?
Don't know.
Now that I think more on the subject, third question... what happens to the snapshot "n" that references blocks in snap "n+1" when "n+1" rolls off the end and expires?
Don't think about what happens to the snapshot, think about what happens to the blocks. If a block is a member of snapshots n and n+1, then the block has at least two bits set (n and n+1). When n+1 expires, bit n+1 is cleared for all blocks where it is set. Since bit n is still set, the block is not freed, leaving snapshot n still intact.
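The same reasoning as a few lines of Python, with an arbitrary value picked for n purely for illustration:

```python
n = 4  # arbitrary snapshot number chosen for the example

# The block is a member of snapshots n and n+1.
bitmap = (1 << n) | (1 << (n + 1))

# Snapshot n+1 expires: clear its bit.
bitmap &= ~(1 << (n + 1))

# Bit n is still set, so the block is not freed.
still_held = bitmap != 0
print(still_held)  # True
```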
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support