Hello Toasters,

I'm thinking about whether and how to implement Flash Pool in our new cDOT clusters, and was wondering if anyone could offer some real-world guidance.
We have two SAS aggregates, aggr_sas_600g_c1n1 with 8x24 600g and aggr_sas_1200g_c1n1 with 4x24 1.2t.
I've been doing a bunch of reading about Flash Pool vs Flash Cache and am trying to better understand their strengths and weaknesses. Flash Pool accelerates writes as well as reads (Flash Cache accelerates reads only). However, with Flash Pool there seems to be the potential for slower cache access/throughput than with Flash Cache, since the data has to travel the SAS path, whereas Flash Cache is probably DMA through PCIe. Maybe that's not a concern at all, I don't know. Additionally, it appears that using Flash Pool disables the Flash Cache functionality for the aggregates that are in hybrid mode (makes sense), but then we would have expensive add-in cards doing nothing.
Our theoretical Flash Pool would be 2x24 200g, giving us about 5.5t of usable caching space to sprinkle into these aggregates.
I've been running AWA on the cluster against those two SAS aggregates for ~24 hours and have come up with these stats:
### FP AWA Stats ###
Host mrk_c1n1    Memory 61054 MB
ONTAP Version NetApp Release 8.3.2P5: Tue Aug 23 01:27:00 PDT 2016

Basic Information
  Aggregate                aggr_sas_600g_c1n1
  Current-time             Tue Nov 22 17:06:53 EST 2016
  Start-time               Mon Nov 21 12:48:17 EST 2016
  Total runtime (sec)      101918
  Interval length (sec)    600
  Total intervals          157
  In-core Intervals        1024

Summary of the past 157 intervals
                                            max
                                   ------------
  Read Throughput (MB/s):               339.039
  Write Throughput (MB/s):              123.536
  Cacheable Read (%):                        56
  Cacheable Write (%):                       66
  Max Projected Cache Size (GiB):       787.463

Summary Cache Hit Rate vs. Cache Size
  Referenced Cache Size (GiB): 714.650
  Referenced Interval: ID 132 starting at Tue Nov 22 12:33:05 EST 2016

  Size             20%   40%   60%   80%  100%
  Read Hit (%)      25    30    30    30    33
  Write Hit (%)      1     2     2     2     2
The entire results and output of Automated Workload Analyzer (AWA) are estimates. The format, syntax, CLI, results and output of AWA may change in future Data ONTAP releases. AWA reports the projected cache size in capacity. It does not make recommendations regarding the number of data SSDs required. Please follow the guidelines for configuring and deploying Flash Pool; that are provided in tools and collateral documents. These include verifying the platform cache size maximums and minimum number and maximum number of data SSDs.
Basic Information
  Aggregate                aggr_sas_1200g_c1n1
  Current-time             Tue Nov 22 17:06:53 EST 2016
  Start-time               Mon Nov 21 12:40:57 EST 2016
  Total runtime (sec)      102357
  Interval length (sec)    600
  Total intervals          158
  In-core Intervals        1024

Summary of the past 158 intervals
                                            max
                                   ------------
  Read Throughput (MB/s):               914.247
  Write Throughput (MB/s):              257.318
  Cacheable Read (%):                        41
  Cacheable Write (%):                       26
  Max Projected Cache Size (GiB):      2412.178

Summary Cache Hit Rate vs. Cache Size
  Referenced Cache Size (GiB): 2142.380
  Referenced Interval: ID 113 starting at Tue Nov 22 09:04:41 EST 2016

  Size             20%   40%   60%   80%  100%
  Read Hit (%)      34    38    38    38    41
  Write Hit (%)      7     7     7     7     9
### FP AWA Stats End ###
Aggregate aggr_sas_600g_c1n1 has a lot of random overwrites (66%) that could have been cached. The volumes in that aggregate are pretty much exclusively Oracle databases. The other aggregate, aggr_sas_1200g_c1n1, doesn't seem to be hit as hard (26% cacheable writes).
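As a rough sanity check on sizing, the two Max Projected Cache Size figures from the AWA output can be added up and compared with the planned Flash Pool capacity. This is only a minimal sketch: it assumes that "about 5.5t" means 5.5 decimal TB, and it ignores platform cache maximums and how the SSDs would actually be split between the aggregates, both of which AWA says it does not cover. The variable and function names are made up for illustration.

```python
# Rough sizing check based on the AWA estimates above.
GIB = 2**30   # bytes per GiB (AWA reports GiB)
TB = 10**12   # bytes per decimal TB (assumed unit for "5.5t")

# "Max Projected Cache Size (GiB)" per aggregate, copied from the AWA output.
projected_gib = {
    "aggr_sas_600g_c1n1": 787.463,
    "aggr_sas_1200g_c1n1": 2412.178,
}

# Assumption: the planned usable Flash Pool capacity is 5.5 decimal TB.
planned_cache_bytes = 5.5 * TB


def total_projected_gib(sizes):
    """Sum the projected cache sizes of all aggregates, in GiB."""
    return sum(sizes.values())


def headroom_gib(sizes, planned_bytes):
    """Planned capacity minus the summed projections, in GiB."""
    return planned_bytes / GIB - total_projected_gib(sizes)


total = total_projected_gib(projected_gib)
planned_gib = planned_cache_bytes / GIB
headroom = headroom_gib(projected_gib, planned_cache_bytes)

print(f"Total projected cache: {total:.3f} GiB")
print(f"Planned Flash Pool:    {planned_gib:.3f} GiB")
print(f"Headroom:              {headroom:.3f} GiB")
```

Under those assumptions the combined projection is about 3200 GiB, so the planned capacity would cover both aggregates with room to spare. The hit-rate tables also suggest that most of the read benefit already shows up at 20% to 40% of the referenced cache size.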
Given those statistics, what would you do if your options were ~5.5t of Flash Pool vs buying another 2t Flash Cache card per node in this HA pair? My output also seems to be missing the 'Projected Read Offload' and 'Projected Write Offload' statistics, which would have been very useful. They are mentioned in the Flash Pool documentation at https://library.netapp.com/ecmdocs/ECMP1368404/html/GUID-2C3EC0DF-FEFE-4871-A161-4A3BAC87DB69.html
Thanks for any insight you all can provide.
--
Ian Ehrenwald
Senior Infrastructure Engineer
Hachette Book Group, Inc.
1.617.263.1948 / ian.ehrenwald@hbgusa.com