Flash Cache vs Flash Pool - toasters

22 Nov 2016


Hello Toasters,
I'm thinking about whether and how to implement Flash Pool in our new cDOT
clusters, and was wondering if anyone could provide some real-world
guidance.
We have two SAS aggregates, aggr_sas_600g_c1n1 with 8x24 600g and
aggr_sas_1200g_c1n1 with 4x24 1.2t.
I've been doing a bunch of reading about Flash Pool vs Flash Cache and am
trying to better understand their strengths and weaknesses.  Flash Pool
accelerates writes as well as reads (Flash Cache is read-only).  However,
with Flash Pool there seems to be the potential for slower cache
access/throughput than with Flash Cache, since the data needs to travel the
SAS path, whereas Flash Cache is probably DMA through PCIe.  Maybe that's
not a concern at all; I don't know.  Additionally, it appears that using
Flash Pool disables the Flash Cache functionality for the aggregates that
are in hybrid mode (makes sense), but then we have expensive add-in cards
doing nothing.
Our theoretical Flash Pool would be 2x24 200g, giving us about 5.5t of
usable caching space to sprinkle into these aggregates.
I've been running AWA on the cluster against those two SAS aggregates for
~24 hours and have come up with these stats:
### FP AWA Stats ###
Host mrk_c1n1                    Memory 61054 MB
            ONTAP Version NetApp Release 8.3.2P5: Tue Aug 23 01:27:00 PDT 2016
Basic Information
                Aggregate aggr_sas_600g_c1n1
             Current-time Tue Nov 22 17:06:53 EST 2016
               Start-time Mon Nov 21 12:48:17 EST 2016
      Total runtime (sec) 101918
    Interval length (sec) 600
          Total intervals 157
        In-core Intervals 1024
Summary of the past 157 intervals
                                max
                                ------------
        Read Throughput (MB/s): 339.039
       Write Throughput (MB/s): 123.536
            Cacheable Read (%): 56
           Cacheable Write (%): 66
Max Projected Cache Size (GiB): 787.463
Summary Cache Hit Rate vs. Cache Size
Referenced Cache Size (GiB): 714.650
Referenced Interval: ID 132 starting at Tue Nov 22 12:33:05 EST 2016
         Size        20%        40%        60%        80%       100%
 Read Hit (%)         25         30         30         30         33
Write Hit (%)          1          2          2          2          2
The entire results and output of Automated Workload Analyzer (AWA) are
estimates. The format, syntax, CLI, results and output of AWA may
change in future Data ONTAP releases. AWA reports the projected cache
size in capacity. It does not make recommendations regarding the
number of data SSDs required. Please follow the guidelines for
configuring and deploying Flash Pool; that are provided in tools and
collateral documents. These include verifying the platform cache size
maximums and minimum number and maximum number of data SSDs.
Basic Information
                Aggregate aggr_sas_1200g_c1n1
             Current-time Tue Nov 22 17:06:53 EST 2016
               Start-time Mon Nov 21 12:40:57 EST 2016
      Total runtime (sec) 102357
    Interval length (sec) 600
          Total intervals 158
        In-core Intervals 1024
Summary of the past 158 intervals
                                max
                                ------------
        Read Throughput (MB/s): 914.247
       Write Throughput (MB/s): 257.318
            Cacheable Read (%): 41
           Cacheable Write (%): 26
Max Projected Cache Size (GiB): 2412.178
Summary Cache Hit Rate vs. Cache Size
Referenced Cache Size (GiB): 2142.380
Referenced Interval: ID 113 starting at Tue Nov 22 09:04:41 EST 2016
         Size        20%        40%        60%        80%       100%
 Read Hit (%)         34         38         38         38         41
Write Hit (%)          7          7          7          7          9
### FP AWA Stats End ###
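To make the two "Cache Hit Rate vs. Cache Size" tables easier to compare, here is a rough Python sketch that converts each size step into GiB. It assumes the 20%-100% columns are fractions of the "Referenced Cache Size" for that aggregate, which is my reading of the output and not something the output states. The numbers are copied from the stats above; the dictionary layout and the function name `hit_rate_rows` are made up for illustration.

```python
# Values copied from the AWA output above; structure is illustrative only.
AWA = {
    "aggr_sas_600g_c1n1": {
        "referenced_gib": 714.650,
        "read_hit": [25, 30, 30, 30, 33],
        "write_hit": [1, 2, 2, 2, 2],
    },
    "aggr_sas_1200g_c1n1": {
        "referenced_gib": 2142.380,
        "read_hit": [34, 38, 38, 38, 41],
        "write_hit": [7, 7, 7, 7, 9],
    },
}

# Assumption: each column is a percentage of the Referenced Cache Size.
SIZE_STEPS = [20, 40, 60, 80, 100]


def hit_rate_rows(entry):
    """Return (cache size in GiB, read hit %, write hit %) for each size step."""
    return [
        (round(entry["referenced_gib"] * pct / 100, 1), read, write)
        for pct, read, write in zip(SIZE_STEPS, entry["read_hit"], entry["write_hit"])
    ]


if __name__ == "__main__":
    for name, entry in AWA.items():
        print(name)
        for gib, read, write in hit_rate_rows(entry):
            print(f"  {gib:8.1f} GiB  read hit {read:2d}%  write hit {write:2d}%")
```

Read this way, most of the projected read hit rate on both aggregates is already reached at the smallest size step, with only a few points gained between 20% and 100%.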
Aggregate aggr_sas_600g_c1n1 has a lot of random overwrites (66%) that
could have been cached.  The volumes in that aggregate are pretty much
exclusively Oracle databases.  The other aggregate, aggr_sas_1200g_c1n1,
doesn't seem to be hit as hard.
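As a quick sanity check on sizing, the sketch below adds up the two "Max Projected Cache Size" values and compares the total against the ~5.5t of usable Flash Pool space. Since "5.5t" could mean TB or TiB, it computes both. Summing the two maxima is a deliberately pessimistic assumption (the peaks may not coincide), and the helper name `budget_gib` is hypothetical. It ignores platform cache size limits and how the SSDs would actually be split between the aggregates.

```python
GIB = 2**30

# "Max Projected Cache Size (GiB)" per aggregate, copied from the AWA output.
projected_gib = {
    "aggr_sas_600g_c1n1": 787.463,
    "aggr_sas_1200g_c1n1": 2412.178,
}


def budget_gib(size_t, binary):
    """Convert a size in 't' to GiB, reading it as TiB (binary) or TB (decimal)."""
    size_bytes = size_t * (2**40 if binary else 10**12)
    return size_bytes / GIB


# Pessimistic assumption: both aggregates need their maximum at the same time.
total = sum(projected_gib.values())

if __name__ == "__main__":
    for label, binary in (("5.5 TB", False), ("5.5 TiB", True)):
        budget = budget_gib(5.5, binary)
        print(
            f"{label}: budget {budget:.1f} GiB, "
            f"projected {total:.1f} GiB, headroom {budget - total:.1f} GiB"
        )
```

Under either reading of "5.5t", the combined projection of roughly 3200 GiB fits with room to spare.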
Given those statistics, what would you do if your options were ~5.5t of
Flash Pool vs buying another 2t Flash Cache card per node in this HA pair?
I also seem to be missing the 'Projected Read Offload' and 'Projected Write
Offload' statistics, which would have been very useful.  They are mentioned
in the Flash Pool documentation at
https://library.netapp.com/ecmdocs/ECMP1368404/html/GUID-2C3EC0DF-FEFE-4871-A161-4A3BAC87DB69.html
Thanks for any insight you all can provide.
--
Ian Ehrenwald
Senior Infrastructure Engineer
Hachette Book Group, Inc.
1.617.263.1948 / ian.ehrenwald@hbgusa.com