-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Steve Losen
Sent: Thursday, December 12, 2002 1:37 PM
To: Chuck Tomasi
Cc: 'Mitko Blazeski'; 'Kumar, Rahul'; 'toasters@mathworks.com'
Subject: Re: file folding impact
My question about file folding (after reading about it here and on NOW)...
Why would a block end up in a snapshot if it is the same as the previous snapshot? Isn't the point of snapshots only to save the 'old version' of blocks that have been changed?
Snapshots do not copy any blocks, so including the same block in multiple snapshots incurs no overhead. It works like this. Each data block in a volume has an associated bitmap of length 32 (20 on older releases). These bits correspond to snapshots: if bit #3 is set, for example, the block is part of snapshot #3. The filer keeps track of which logical name (e.g. "hourly.0") is associated with each bit. The size of the bitmap limits the number of snapshots that can exist in a volume.
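To make the bitmap idea concrete, here is a minimal Python sketch. It is a toy model, not the filer's actual on-disk structures: the class and method names are hypothetical, and the separate `live` flag marking blocks used by the active filesystem is an assumption of the sketch (the description above only covers the snapshot bits). Taking a snapshot just sets one bit on every block the live filesystem currently uses; no data is copied.

```python
SNAPSHOT_BITS = 32  # bitmap length from the description above (20 on older releases)


class BlockMap:
    """Toy per-block snapshot bitmap (hypothetical model, not the real filer code)."""

    def __init__(self, nblocks):
        self.snap_bits = [0] * nblocks  # bit n set => block belongs to snapshot n
        self.live = [False] * nblocks   # ASSUMPTION: separate flag for the live filesystem
        self.names = {}                 # logical snapshot name (e.g. "hourly.0") -> bit number

    def create_snapshot(self, name):
        used = set(self.names.values())
        free = [b for b in range(SNAPSHOT_BITS) if b not in used]
        if not free:
            # The bitmap size caps how many snapshots a volume can hold.
            raise RuntimeError("no free snapshot bit")
        bit = free[0]
        self.names[name] = bit
        for blk, in_live in enumerate(self.live):
            if in_live:
                self.snap_bits[blk] |= 1 << bit  # mark membership; nothing is copied
        return bit

    def delete_snapshot(self, name):
        bit = self.names.pop(name)
        for blk in range(len(self.snap_bits)):
            self.snap_bits[blk] &= ~(1 << bit)  # clear this snapshot's bit everywhere

    def is_free(self, blk):
        # A block can be reused only when neither the live filesystem nor any snapshot holds it.
        return not self.live[blk] and self.snap_bits[blk] == 0

    def snapshots_holding(self, blk):
        return sorted(n for n, b in self.names.items() if (self.snap_bits[blk] >> b) & 1)


bm = BlockMap(4)
bm.live[0] = bm.live[1] = True
bm.create_snapshot("nightly.0")
bm.create_snapshot("hourly.0")    # the same blocks are now in two snapshots at no extra cost
bm.live[1] = False                # the live file no longer uses block 1...
print(bm.is_free(1))              # False: ...but the snapshots still hold it
print(bm.snapshots_holding(1))    # ['hourly.0', 'nightly.0']
```

Deleting both snapshots in this model clears the bits, and only then does block 1 become free.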
[snip...]
OK, all of that makes sense to me. But I am still unclear on what file folding does exactly.
This explanation fits in well with how I think, so if someone could present the basic concept of file folding in a similar manner, I would very much appreciate it. :)
You had to ask ... :-)
When a client modifies a file, it often does it in a way that replaces the entire file, by writing a whole new file from start to finish. This is particularly true of editors and word processors. Let us suppose that the old file is in a snapshot. When the client writes the new file, it cannot overwrite the original file because its blocks are in a snapshot. So the new file is written to freshly allocated blocks and the old blocks are removed from the live filesystem, but remain in the snapshot of course. If the modification to the file is very small, such as adding a few sentences to the end of a large document, you end up duplicating a lot of data. Up to the point where you added to the end of the document, the blocks in the snapshot and the blocks in the live file contain duplicate data. File folding detects this and "stitches" the old snapshotted blocks back into the live file, and frees the freshly allocated blocks. That way small changes to large snapshotted files don't consume so much disk space.
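Here is a rough Python sketch of the folding step described above. Since I don't know how the filer actually searches for duplicates, this assumes the simplest strategy: compare each block of the new file with the snapshot's block at the same offset. The 4 KB block size, the dict standing in for the disk, and the function names `store` and `fold_file` are all hypothetical.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store(disk, data):
    """Write data to freshly allocated blocks, as a client rewriting a whole file would.
    `disk` maps hypothetical block numbers to contents; returns the file's block list."""
    block_list = []
    for chunk in split_blocks(data):
        blk = max(disk, default=-1) + 1  # next unused block number
        disk[blk] = chunk
        block_list.append(blk)
    return block_list


def fold_file(disk, snap_file, live_file):
    """Stitch snapshotted blocks back into the live file wherever the block at the
    same offset holds identical data, then free the duplicate fresh blocks."""
    folded = list(live_file)
    freed = []
    for i, new_blk in enumerate(live_file):
        if i < len(snap_file) and disk[snap_file[i]] == disk[new_blk]:
            folded[i] = snap_file[i]  # reuse the block the snapshot already holds
            freed.append(new_blk)     # the freshly written copy is now redundant
    for blk in freed:
        del disk[blk]
    return folded, freed


disk = {}
old_doc = b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE          # two-block document, kept in a snapshot
snap_file = store(disk, old_doc)                        # [0, 1]
live_file = store(disk, old_doc + b" A few more sentences.")  # [2, 3, 4]
folded, freed = fold_file(disk, snap_file, live_file)
print(folded, freed)  # [0, 1, 4] [2, 3]
```

Only the tail block, which really is new data, keeps its fresh allocation; the two unchanged blocks are shared with the snapshot again.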
I don't know how clever or aggressive file folding is -- the more thoroughly it looks for duplicated blocks, the more space it will recover, but the more CPU it will consume. Because snapshots must be preserved intact, you can only fold two blocks that have identical data. If you add a single byte to the beginning of a text file, you completely throw off the original block boundaries, making folding impossible.
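The block-boundary problem is easy to see with a quick Python experiment. Like the earlier sketch, it assumes folding compares blocks at matching offsets and uses a hypothetical 4 KB block size; `foldable_blocks` is an illustrative name, not a filer function.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


def blocks(data, size=BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def foldable_blocks(old, new, size=BLOCK_SIZE):
    """Count offsets where the new block is byte-for-byte identical to the old block
    at the same offset, i.e. blocks an offset-aligned comparison could fold."""
    return sum(1 for a, b in zip(blocks(old, size), blocks(new, size)) if a == b)


# 2000 lines of 25 bytes = 50,000 bytes: 12 full blocks plus a partial 13th.
old = b"".join(b"line %05d of the report\n" % i for i in range(2000))

print(foldable_blocks(old, old + b"appended\n"))  # 12: only the tail block changed
print(foldable_blocks(old, b"#" + old))           # 0: every block boundary shifted by one byte
```

Appending text leaves 12 of 13 blocks foldable, while prepending a single byte leaves none, even though the two files differ by only one byte.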
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support