If you have Operations Manager (DFM, or whatever it’s called at the moment), you can graph the stats over time. I did this for a group of our filers to see what
the stats looked like before and after the change was made. Specifically, I graphed:
- Cache hit %
- Hit, hit_percent & miss
- Disk Reads Replaced
- Inserts/Evicts/Invalidates
I let those run for a couple of weeks to get a baseline, then enabled lopri_blocks and watched what happened. For us (NFS, heavy metadata workloads),
the ‘churn’ of the cache went up, but we also started getting much higher hit rates.
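For anyone who wants to do the same comparison outside of the graphs, here is a rough Python sketch of the baseline-vs-enabled comparison described above. It assumes you have already exported the counters into lists of per-interval samples; the dictionary keys (hit, miss, disk_reads_replaced, evicts) and the function names are my own placeholders, not an Operations Manager or ONTAP API, and the hit percentage is assumed to be hits over hits plus misses.

```python
from statistics import mean


def hit_percent(hits, misses):
    """Cache hit percentage, assumed to be hits / (hits + misses)."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


def summarize(samples):
    """Average the counters of interest over one observation window.

    `samples` is a list of dicts with the hypothetical keys
    'hit', 'miss', 'disk_reads_replaced' and 'evicts'.
    """
    return {
        "hit_percent": mean(hit_percent(s["hit"], s["miss"]) for s in samples),
        "disk_reads_replaced": mean(s["disk_reads_replaced"] for s in samples),
        "evicts": mean(s["evicts"] for s in samples),
    }


def compare(baseline, enabled):
    """Change in each averaged counter (enabled window minus baseline window)."""
    before, after = summarize(baseline), summarize(enabled)
    return {name: after[name] - before[name] for name in before}
```

A positive hit_percent and disk_reads_replaced delta alongside a positive evicts delta would match what we saw: more churn, but better hit rates.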
--rdp
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net]
On Behalf Of Edward Rolison
Sent: Tuesday, March 17, 2015 12:54 PM
To: toasters@teaparty.net
Subject: Flexscale lopri_blocks
I'm currently contemplating caching on my filers. Specifically - setting those 3 flags on a flexscale cache appropriately.
I _think_, given the workload I'm putting through one set of my filers, enabling lopri_blocks is beneficial.
When I do, a 'stats show -p flexscale-access' gives me more disk reads replaced, and a better cache hit ratio.
However, I'm also looking at wafl:wafl:read_io_type and seeing that my cache module is only servicing about 10% of the IOPs going through my system anyway. Of the remainder, it's a pretty even split between disk/cache.
And with 'lopri_blocks' set to on, my cache is churning data faster, but with it off, the hit rate drops to <10%.
I'm wondering if anyone can offer suggestions as to a good way to compare the two states - filer side. Are there any metrics I _should_ be looking at, that I'm not?
Primarily looking at:
- total IOPs (there's only one volume on this filer)
- response time.
- Disk utilisation (definitely increases when lopri_blocks is off).
- cache stats as above.
CP types - this system is mostly doing log_full CPs, and is getting a fairly steady stream of RW IOPs. Most IOs are NFS getattr and lookups though, which are mostly metadata. Of 100K IOPs, about 90% are non-RW, 5% read and 5% write.
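To put absolute numbers on that mix, a quick Python sketch of the arithmetic. The percentages and the 100K total are the figures quoted above; the function and key names are just illustrative.

```python
def split_iops(total_iops, mix_percent):
    """Turn a percentage mix into absolute IOPs per operation class."""
    return {name: total_iops * pct // 100 for name, pct in mix_percent.items()}


# Workload mix as quoted: 90% non-RW (getattr/lookup), 5% read, 5% write.
mix = {"non_rw": 90, "read": 5, "write": 5}
breakdown = split_iops(100_000, mix)
# breakdown == {"non_rw": 90000, "read": 5000, "write": 5000}
```

So roughly 90K of the 100K IOPs are metadata operations, with about 5K each of reads and writes.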
I think I'm right in saying that these metadata IOs are going to be fairly small and thus served out of RAM (mostly) anyway.
I'm also (re)reading TR-3832 - Flash Cache best practice.
This is a little more vague than I'd like on figuring out whether 'lopri_blocks' will actually be useful.
I _think_ it is, based on a better hit rate and a better reads-replaced rate. Any read replaced is one that's served from a faster tier, and is therefore objectively 'better', right?