Steve Losen wrote:
I think the point is that I/O does not pass through NVRAM, i.e., NVRAM is not used as a disk cache.
Right. The NetApp NVRAM is *not* a write cache. It's a log.
But when you say that to people, just as stated, they have to understand what the semantic difference between a log and a cache really is in a context such as this. The distinction is pretty strict. Most people don't understand it, in my experience anyway, so confusion results.
It is used as a transaction log and in the event of an outage, the NVRAM log is replayed. When a write request arrives, the request is logged to NVRAM and the filer ACKs the request. "At its leisure" the filer commits a batch of write requests to disk by writing a CP (consistency point). Then the corresponding NVRAM log storage is reused to log new requests. When NVRAM approaches capacity, the filer MUST write a CP. Otherwise there is no room in NVRAM to log any more incoming write requests, forcing the filer to reject them.
So the size of NVRAM determines, to some degree, how large a backlog of uncommitted write requests a filer can tolerate.
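To make the log-versus-cache distinction concrete, here is a toy Python sketch of the behaviour described above. It is not NetApp's implementation: the class name `NvramLog`, its methods, and the byte-counting are all made up for illustration. It assumes the CP is written from buffers held elsewhere (main memory), so the log itself is only read back on replay, and it returns False where the description above says the filer would have to reject a request.

```python
from collections import deque


class NvramLog:
    """Toy model of a write log (hypothetical, for illustration only)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = deque()  # logged requests not yet covered by a CP
        self.disk = []          # stands in for on-disk state after CPs

    def write(self, payload):
        """Log a request and ACK it; return False if the log has no room."""
        if self.used + len(payload) > self.capacity:
            return False        # log full: a CP must complete first
        self.entries.append(payload)
        self.used += len(payload)
        return True             # ACK as soon as the request is logged

    def consistency_point(self):
        """Commit the pending batch, then free the log space for reuse."""
        # Simplification: a real CP would be built from in-memory buffers,
        # not from the log; the toy reuses the logged payloads for brevity.
        committed = len(self.entries)
        self.disk.extend(self.entries)
        self.entries.clear()
        self.used = 0
        return committed

    def replay(self):
        """After an outage: the uncommitted requests, in arrival order."""
        return list(self.entries)


log = NvramLog(capacity_bytes=8)
log.write(b"aaaa")
log.write(b"bbbb")
full = not log.write(b"c")      # no room left until a CP runs
pending = log.replay()          # what would be replayed after a crash
log.consistency_point()
accepted = log.write(b"c")      # space was reclaimed by the CP
```

The point of the sketch is that nothing is ever served out of the log in normal operation: entries go in, get covered by a CP, and their space is reused. Only `replay()` reads them back.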
Good elaboration/description. The NVRAM is a limiting factor for I/O for a whole controller. How fast you can write to disk is limited by what needs to be done with the buffers temporarily held in the NVRAM; it's essentially a "wall-clock problem" when you hit sustained back-to-back CPs.
A multi-dimensional optimisation problem. The way it works, proven in the field for so many years, is quite impressive IMO.
The bigger the NVRAM, the more write burst capacity, and the more random-write-efficient the "Tetris Algorithm" becomes (up to some limit I'd guess, I have no idea what that would be). But in a failover cluster, the latency of NVRAM mirroring and the time it takes to do a failover limit the size of the NVRAM, so...
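As a back-of-envelope illustration of the burst-capacity point, here is a small Python sketch. It assumes a deliberately crude fluid model: writes arrive at a constant rate, CPs drain the log at a constant rate, and the log fills at the difference. The function name `burst_seconds` and all the numbers are invented for the example; they are not NetApp specifications.

```python
def burst_seconds(log_mb, ingest_mb_s, drain_mb_s):
    """Seconds a write burst can last before the log is full.

    Crude fluid model (an assumption, not how a real filer schedules CPs):
    the log fills at (ingest - drain) MB/s.
    """
    if ingest_mb_s <= drain_mb_s:
        return float("inf")  # CPs keep up, so the log never fills
    return log_mb / (ingest_mb_s - drain_mb_s)


# Illustrative numbers only.
small = burst_seconds(log_mb=2000, ingest_mb_s=500, drain_mb_s=300)   # 10.0
large = burst_seconds(log_mb=4000, ingest_mb_s=500, drain_mb_s=300)   # 20.0
steady = burst_seconds(log_mb=2000, ingest_mb_s=300, drain_mb_s=300)  # inf
```

In this model, doubling the log doubles the burst you can absorb, but it does nothing for sustained load: once ingest exceeds the drain rate for long enough, you are in back-to-back CP territory regardless of log size.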
My 2 cents: I'd bet on NetApp mitigating this very difficult optimisation problem with some new/improved cleverness around SSD devices. Something, perhaps, that really works as a write cache, at least partially, in conjunction with the log (NVRAM). We have FlashPool now, and that's good, but does a FlashPool really do anything in real life to help with random writes (or rather, random re-writes)?
/M
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil