If you have Operations Manager (DFM, or whatever it's called at the moment), you can graph the stats over time. I did this for a group of our filers to see what the stats looked like before and after the change was made. Specifically, I graphed:

 

- Cache hit %
- hit, hit_percent & miss
- Disk Reads Replaced
- Inserts/Evicts/Invalidates

 

I let those run for a couple of weeks to get a baseline and then enabled lopri_blocks and basically watched what happened. For us (NFS, heavy metadata workloads) I saw the ‘churn’ of the cache go up but we also started getting much higher hit rates.
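
If you want to spot-check the same counters straight from the filer CLI rather than through DFM, something along these lines works on 7-mode - counter names are from memory, so treat them as approximate:

    # raw Flash Cache counters (hit, miss, hit_percent,
    # disk_reads_replaced, inserts, evicts, invalidates)
    stats show ext_cache_obj

    # the same data in the preset layout
    stats show -p flexscale-access

    # the change we made
    options flexscale.lopri_blocks on

DFM is just graphing those same counters over time, which is what made the before/after comparison easy to see.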

 

--rdp

 

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Edward Rolison
Sent: Tuesday, March 17, 2015 12:54 PM
To: toasters@teaparty.net
Subject: Flexscale lopri_blocks

 

I'm currently contemplating caching on my filers. Specifically, setting those 3 flags on a flexscale cache appropriately.
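
By 'those 3 flags' I mean the flexscale options - roughly the following, though the exact option names seem to vary a bit between ONTAP versions:

    # list the current flexscale settings
    options flexscale

    # the data-caching knobs in question
    options flexscale.normal_data_blocks on
    options flexscale.lopri_blocks on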


I _think_, given the workload I'm putting through one of my sets of filers, that enabling lopri_blocks is beneficial.

 

When I do, a 'stats show -p flexscale-access' gives me more disk reads replaced, and a better cache hit ratio.

 

However, I'm also looking at wafl:wafl:read_io_type and seeing that my Flash Cache module is only servicing about 10% of the read IOPs going through my system anyway. The remainder is a pretty even split between disk and the in-memory buffer cache. 
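
That's from watching something like the following over a handful of intervals - as far as I can tell, the counter splits reads between the in-memory buffer cache, the Flash Cache (ext_cache) and disk:

    # where read blocks are being served from, sampled every second
    stats show -i 1 -n 10 wafl:wafl:read_io_type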

 

And with 'lopri_blocks' set to on, my cache is churning data faster; but with it off, the hit rate drops to under 10%. 

 

I'm wondering if anyone can offer suggestions as to a good way to compare the two states, filer-side. Are there any metrics I _should_ be looking at that I'm not?

Primarily looking at:

- total IOPs (there's only one volume on this filer)
- response time
- disk utilisation (definitely increases when lopri_blocks is off)
- cache stats as above

CP types - mostly this system is doing log_full CPs, and getting a fairly steady stream of RW IOPs. Most IOs are NFS getattrs and lookups, though, which are mostly metadata. Of 100K IOPs, it's about 90% non-RW, 5% read and 5% write. 
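
The obvious way to watch most of the above from the console seems to be something along these lines (exact columns vary a bit between ONTAP versions), unless there's something smarter:

    # 1-second samples: NFS ops, CPU, disk utilisation, CP time and CP type
    sysstat -x 1

    # NFS operation mix - getattrs/lookups vs reads/writes
    nfsstat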

 

I think I'm right in saying that these metadata IOs are going to be fairly small and thus served out of RAM (mostly) anyway. 

I'm also (re-)reading TR-3832 - the Flash Cache best practices guide. 
It's a little more vague than I'd like on figuring out whether 'lopri_blocks' will actually be useful. 

I _think_ it is, based on the better hit rate and the better reads-replaced rate. Any read replaced is one that's served from a faster tier, and is therefore objectively 'better', right?