So if fewer than half of the data blocks covered by a parity block change, it is faster to "update" the parity block. Otherwise it is faster to recalculate the parity block from the new data blocks.
Yup, we call those "parity by subtraction" and "parity by recalculation". I don't know if we mention that in any white papers, but we do, in fact, choose one or the other based on which requires fewer disk accesses.
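A minimal sketch of the two strategies, assuming a single-parity (RAID-4 style) stripe of `n` data blocks of which `k` are being rewritten, and counting only disk reads. Both methods write the same `k` data blocks plus one parity block. Subtraction reads the `k` old data blocks and the old parity (`k + 1` reads). Recalculation reads the `n - k` unchanged data blocks (the new ones are already in memory). Subtraction therefore wins when `k + 1 < n - k`, i.e. roughly when fewer than half the blocks change. The function names and the tiny block size are hypothetical illustrations, not the actual RAID code:

```python
from functools import reduce

BLOCK = 8  # hypothetical tiny block size in bytes, for illustration only


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_by_recalculation(stripe: list[bytes]) -> bytes:
    # Parity is the XOR of every data block in the (already updated) stripe.
    return xor_blocks(*stripe)


def parity_by_subtraction(old_parity: bytes, old_blocks: list[bytes],
                          new_blocks: list[bytes]) -> bytes:
    # XOR "subtracts" the old contents and "adds" the new contents,
    # since x ^ x == 0 for every byte.
    return xor_blocks(old_parity, *old_blocks, *new_blocks)


def reads_needed(n_data: int, n_changed: int) -> dict[str, int]:
    # Disk reads each method needs, assuming nothing is cached in RAM.
    return {
        "subtraction": n_changed + 1,             # old data blocks + old parity
        "recalculation": n_data - n_changed,      # unchanged data blocks
    }


def choose_method(n_data: int, n_changed: int) -> str:
    # Pick the method with fewer reads; ties go to recalculation (arbitrary).
    reads = reads_needed(n_data, n_changed)
    if reads["subtraction"] < reads["recalculation"]:
        return "subtraction"
    return "recalculation"


if __name__ == "__main__":
    stripe = [bytes([i] * BLOCK) for i in range(1, 5)]  # 4 data blocks
    parity = parity_by_recalculation(stripe)

    # Rewrite one block and check that both methods agree on the new parity.
    new_block = bytes([0xAA] * BLOCK)
    updated = stripe.copy()
    updated[2] = new_block
    assert parity_by_subtraction(parity, [stripe[2]], [new_block]) == \
        parity_by_recalculation(updated)

    print(choose_method(4, 1))  # 2 reads vs 3 reads
    print(choose_method(4, 3))  # 4 reads vs 1 read
```

A full-stripe write (`k == n`) needs no reads at all with recalculation, which is why it is always the better choice there.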
The choice also probably depends on which blocks are currently cached in RAM.
Nope - the RAID code (unless I missed something in my check) doesn't check the WAFL buffer pool to see which blocks are cached but aren't being written.
And if WAFL zeroes out a block whenever it is freed,
It doesn't, so that particular speedup won't work.