Hi Jeffrey,

Thanks for the interesting follow-up to my statistics, and I apologize for not replying to your previous email sooner. I am working with our DBA team to see if they're OK with sending out AWRs for external analysis. I don't see any reason why we couldn't do it, but DB stuff isn't in my "sphere of influence" ;)
Ian Ehrenwald Senior Infrastructure Engineer Hachette Book Group, Inc. 1.617.263.1948 / ian.ehrenwald@hbgusa.com
________________________________________ From: Steiner, Jeffrey Jeffrey.Steiner@netapp.com Sent: Wednesday, November 30, 2016 6:23:10 AM To: Ehrenwald, Ian; toasters@teaparty.net Cc: Smith, Christopher Subject: RE: Flash Cache vs Flash Pool
It looks like the AWA output shows that you'll hit the optimum caching point with just 60% of the simulated cache you specified. That's pretty typical. The workload is concentrated on a fairly small amount of disk. Just 700GB or so will deliver the maximum value you should expect.
There are some exceptions. For example, there might be a database workload that really REALLY doesn't like even a few extra spinning disk hits, and a little extra cacheability would improve performance a lot. Unlikely, but possible. We'd need to look at AWR data to say more.
-----Original Message----- From: Steiner, Jeffrey Sent: Sunday, November 27, 2016 1:49 PM To: 'Ehrenwald, Ian' Ian.Ehrenwald@hbgusa.com; toasters@teaparty.net Subject: RE: Flash Cache vs Flash Pool
I'm not an expert on AWA, but I'll ask some of the team I work with for thoughts.
I am, however, an Oracle expert. If you really want an opinion on an Oracle workload, send me an AWR report offline. Specifically, use 'awrrpt.sql' to make the report, and make sure it targets a period where someone was concerned or complaining about performance. The elapsed time should be no more than one hour. That's the best way to see the bottleneck.
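If it helps, generating one is quick. A minimal sketch, run as a DBA user on the database host (the snapshot IDs and file name below are placeholders; the script path is the standard Oracle one, where '?' expands to ORACLE_HOME):

    $ sqlplus / as sysdba
    SQL> @?/rdbms/admin/awrrpt.sql
    -- The script prompts interactively for:
    --   report type: text (or html)
    --   num_days:    1            (how many days of snapshots to list)
    --   begin snap:  e.g. 1234    (the snapshot just before the slow period)
    --   end snap:    e.g. 1235    (a snapshot about one hour later)
    --   report name: awr_slow_period.txt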
Direct improvements in real-world performance will come from read caching. The improvements from write caching are less clear. When you write to ONTAP, you're actually writing to the NVRAM. That's all the application cares about, and that means you're already writing to pure solid state storage. It's also a lot faster than Flash technology.
When AWA detects an opportunity for write caching, what it's really saying is that it sees repeated overwrites of the same blocks. If you cache that data on SSD, you will reduce the load on the spinning media, which means the spinning media read response times will be better. The improvement depends on just how busy those disks were. Some workloads, such as DataGuard standby databases with very low amounts of RAM, tend to do a huge number of overwrites of a small number of blocks. FlashPool write caching can be hugely helpful there.
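One related knob: once an aggregate is hybrid, read/write caching behavior is controlled per volume with the caching policy. Roughly like this (the vserver and volume names are made up, and the exact policy names vary by ONTAP release, so check the CLI help before using them):

    # show the current Flash Pool caching policy for a volume
    volume show -vserver svm1 -volume oradata -fields caching-policy

    # steer a volume toward random read+write caching (policy name from memory)
    volume modify -vserver svm1 -volume oradata -caching-policy random_read_write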
-----Original Message----- From: Ehrenwald, Ian [mailto:Ian.Ehrenwald@hbgusa.com] Sent: Sunday, November 27, 2016 4:07 AM To: Steiner, Jeffrey Jeffrey.Steiner@netapp.com; toasters@teaparty.net Subject: Re: Flash Cache vs Flash Pool
Hello Jeffrey,

Thanks for the recommendations. Many of our workloads run in evening hours and others are almost 24/7, so time of day vs SSD count isn't a factor for us (I think). Here's what AWA is reporting after running for over 5 days against both of the aggregates I previously mentioned:
Basic Information
    Aggregate                          aggr_sas_600g_c1n1
    Current-time                       Sat Nov 26 17:21:42 EST 2016
    Start-time                         Mon Nov 21 12:48:17 EST 2016
    Total runtime (sec)                448414
    Interval length (sec)              600
    Total intervals                    672
    In-core Intervals                  1024

Summary of the past 672 intervals
                                       max
    Read Throughput (MB/s):            352.553
    Write Throughput (MB/s):           142.209
    Cacheable Read (%):                56
    Cacheable Write (%):               79
    Max Projected Cache Size (GiB):    1284.788

Summary Cache Hit Rate vs. Cache Size
    Referenced Cache Size (GiB):       1270.275
    Referenced Interval:               ID 647 starting at Sat Nov 26 12:33:15 EST 2016
    Size            20%   40%   60%   80%   100%
    Read Hit (%)     54    63    64    64    64
    Write Hit (%)     1     1     1     1     1
Basic Information
    Aggregate                          aggr_sas_1200g_c1n1
    Current-time                       Sat Nov 26 17:21:42 EST 2016
    Start-time                         Mon Nov 21 12:40:57 EST 2016
    Total runtime (sec)                448853
    Interval length (sec)              600
    Total intervals                    673
    In-core Intervals                  1024

Summary of the past 673 intervals
                                       max
    Read Throughput (MB/s):            933.115
    Write Throughput (MB/s):           257.318
    Cacheable Read (%):                62
    Cacheable Write (%):               26
    Max Projected Cache Size (GiB):    5080.340

Summary Cache Hit Rate vs. Cache Size
    Referenced Cache Size (GiB):       4715.367
    Referenced Interval:               ID 640 starting at Sat Nov 26 10:55:47 EST 2016
    Size            20%   40%   60%   80%   100%
    Read Hit (%)     35    41    41    42    42
    Write Hit (%)    11    11    11    11    14
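For anyone following along, this output came from the nodeshell AWA commands, roughly like the following (the node name is a guess based on the aggregate names, and the syntax is from memory, so verify it on your release):

    # start the workload analyzer against an aggregate
    system node run -node c1n1 wafl awa start aggr_sas_600g_c1n1

    # print the report shown above; stop the analyzer when done collecting
    system node run -node c1n1 wafl awa print
    system node run -node c1n1 wafl awa stop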
The first aggregate listed (aggr_sas_600g_c1n1) is where our database volumes live, the vast majority of them being Oracle. Based on what I see from this report, that aggregate could really benefit from the write caching provided by FlashPool, since AWA reports 79% of the write operations as cacheable. At the very least, our latency should go down markedly?
Almost inversely, the other aggregate (aggr_sas_1200g_c1n1) is much heavier on the read side, with 62% of reads cacheable. This aggregate contains our VMware datastores, application binaries, etc., and it might benefit more from staying on FlashCache.
Am I understanding and interpreting the data from AWA correctly?
Ian Ehrenwald Senior Infrastructure Engineer Hachette Book Group, Inc. 1.617.263.1948 / ian.ehrenwald@hbgusa.com
________________________________________ From: Steiner, Jeffrey Jeffrey.Steiner@netapp.com Sent: Wednesday, November 23, 2016 6:36 AM To: Ehrenwald, Ian; toasters@teaparty.net Subject: RE: Flash Cache vs Flash Pool
As you wrote, FlashCache will not be used by data residing on an SSD or FlashPool aggregate. In my experience, the difference between FlashCache and FlashPool is almost never described in terms of performance. It can happen, but it usually seems to come up only with really obscure workloads, such as a system that is being absolutely crushed with random IO, where the lower overall overhead of FlashCache helps a bit. It's rare, though.
Here are my main thoughts:
1) FlashPool will never go cold due to a power failure or similar. That's my #1 reason for preferring it to FlashCache.
2) FlashPool can capture random overwrites, which can be really, really helpful with certain database workloads that have a lot of such IO.
3) FlashCache can be shared among multiple aggregates according to their needs, whereas FlashPool is fixed to one. Sometimes that helps address unknown or dynamic caching needs (a quick way to see what FlashCache is actually doing is shown after this list).
4) The fact that some IO was cacheable doesn't mean anyone cares that it was cacheable. AWA does a pretty good job, but it's not definitive. For example, let's say you have a workload that could be 2X faster with 1TB of FlashPool SSD, but it runs at midnight and nobody cares how fast it runs. Why waste the SSD?
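On point 3, if you want to see whether FlashCache is earning its keep, there's a stats preset for it. A rough sketch via the nodeshell (node name is a placeholder; the option and preset names are from memory, so double-check them):

    # confirm FlashCache caching is enabled on the node
    system node run -node c1n1 options flexscale.enable

    # watch hit/miss counters for the FlashCache modules
    system node run -node c1n1 stats show -p flexscale-access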
I'd probably just take it slow. Add a few SSDs to each aggregate and dole them out slowly. Reevaluate every so often. Remember - once an SSD is added to an aggregate, you can't get rid of it.
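If you go that route, the two steps are marking the aggregate hybrid-enabled and then adding the SSDs. A rough sketch (the disk count is illustrative; mind the minimum RAID group sizes for the SSD tier and check the docs for your release):

    # allow the aggregate to become a Flash Pool
    storage aggregate modify -aggregate aggr_sas_600g_c1n1 -hybrid-enabled true

    # add a small SSD RAID group to start with; grow it later as needed
    storage aggregate add-disks -aggregate aggr_sas_600g_c1n1 -disktype SSD -diskcount 4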