On Tue, 10 Aug 1999, Stephen C. Losen wrote:
Actually, if you write N data blocks that correspond to the same parity block, you only do N+1 writes for the N blocks.
That is certainly an optimization. In the most basic case you write one block and update the parity block, which is two writes. One can also read all the blocks that add up to the parity block and recheck, but I wonder what that would do to performance. BTW, from what I hear, Auspex replaces bad parity with a value calculated from the component blocks. I think that is dangerous.
In WAFL's RAID4, the parity block is the XOR of all the corresponding data blocks. Assuming the old parity block is correct, you don't need to know the values of all the data blocks to calculate the new parity block. You only need to know the old and new values of the modified data blocks and the old value of the parity block.
If a bit changes in a data block, the corresponding parity bit must change. If the corresponding bit changes in two data blocks, it does not change in the parity block, etc. So you just XOR the old data block values and the new data block values and the old parity block to get the new parity block.
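To make the XOR rule concrete, here is a minimal sketch in Python. The function names (xor_blocks, update_parity) and the tiny 4-byte "blocks" are made up for illustration; this is not how WAFL is implemented, just the arithmetic described above.

```python
def xor_blocks(*blocks):
    """Return the bytewise XOR of one or more equal-length byte strings."""
    if not blocks:
        raise ValueError("need at least one block")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("all blocks must be the same length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def update_parity(old_parity, old_data, new_data):
    """Compute the new parity from the old parity plus the old and new
    values of only the modified data blocks (hypothetical helper)."""
    return xor_blocks(old_parity, *old_data, *new_data)


# A stripe of four toy data blocks and its parity.
stripe = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]),
          bytes([9, 10, 11, 12]), bytes([13, 14, 15, 16])]
parity = xor_blocks(*stripe)

# Modify block 1 only; the other three blocks are never touched.
new_block = bytes([0xFF, 0x00, 0xFF, 0x00])
new_parity = update_parity(parity, [stripe[1]], [new_block])

# Same answer as recomputing the parity from all four current blocks.
full = xor_blocks(stripe[0], new_block, stripe[2], stripe[3])
print(new_parity == full)
```

The update touches only the old parity and the old and new copies of the modified block, no matter how many data blocks are in the stripe.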
So if fewer than half of a parity block's data blocks change, it is faster to "update" the parity block. Otherwise it is faster to recalculate the parity block from the new data blocks. The choice also probably depends on which blocks are currently cached in RAM. And if WAFL zeroes out a block whenever it is freed, then whenever WAFL allocates a free block, the old value is known to be all zeroes, so the block does not need to be read from disk. And since I XOR 0 == I, a null block can be omitted from any parity calculations.
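The trade-off can be sketched by counting disk reads. This is only a rough model under my own assumptions: nothing is cached in RAM, only reads are counted, and the "zeroed" parameter (a name I made up) is the number of modified blocks whose old value is known to be all zeroes because they were just allocated. Whether WAFL actually zeroes freed blocks is a guess, as noted above.

```python
def reads_for_update(changed, zeroed=0):
    """Reads needed to update the parity in place: the old value of each
    changed block (skipping those known to be zero) plus the old parity."""
    return (changed - zeroed) + 1


def reads_for_recalc(total, changed):
    """Reads needed to recalculate the parity from scratch: every data
    block in the stripe that is NOT being rewritten."""
    return total - changed


def cheaper_strategy(total, changed, zeroed=0):
    """Pick whichever approach needs fewer reads under this toy model."""
    if reads_for_update(changed, zeroed) < reads_for_recalc(total, changed):
        return "update"
    return "recalculate"


# Stripe of 8 data blocks.
print(cheaper_strategy(8, 1))            # 2 reads vs 7
print(cheaper_strategy(8, 6))            # 7 reads vs 2
print(cheaper_strategy(8, 4, zeroed=4))  # 1 read vs 4
```

In this model the crossover sits near half the stripe, and known-zero old blocks tilt the choice further toward updating, since their old values never have to be read.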
I'm no WAFL expert -- just a customer who's read the white papers. But it is clear that there are a lot of tricks that WAFL can employ to make RAID4 parity calculations fast and efficient.
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support