Hi toasters,
We have just recently "upgraded" our NetApp infrastructure and moved from lots of small FC drives to a much smaller number of SATA drives. We knew this would cause a performance decrease and compensated by installing 512GB Flash Cache cards. It's an HA 3170A.
So far it all kind of works. However, it's clear that the caches take a long time to heat up, several hours in fact. During that time, performance is "average" to say the least.
So, I guess I have 3 questions for people who are doing similar things:
1) Are there any tips on ways to improve the cold start performance? (I cannot think of any.)
2) If I have an HA failover, I imagine my performance will be even worse, since the second head's cache will be completely cold for the data from the failed head and, to make things worse, the second head will have to eject data from the cache for its own local volumes. Anybody with experience of this?
3) Any bright ideas other than more spindles?
It would be nice if the caches were NVRAM, to avoid the cold start problem. I really think NetApp should use SSDs for the cache for exactly this reason. Or am I missing something?
Regards, pdg
If OnTap 8.1+ is on your roadmap for upgrades then cold starts should become a thing of the past. Even today they should be few and far between and only due to power outages, upgrades, and hardware maintenance. Two of those can be planned and if you're incredibly lucky or fortunate you could possibly manage a graceful shutdown during a power event making all three a moot point.
As of OnTap 8.1 the FlashCache is snapshotted and staged during a graceful shutdown, so there is no longer an elongated cache rebuild time during a power-on event. Today you're simply at the mercy of how fast and how much data you're pushing through the system to determine how long it's going to take for the extended cache to begin providing visible benefit. I have large VM environments that see benefit in less than 15 minutes, and other environments that simply don't push data through quickly enough to see a benefit for several hours. Today your only solution is to have more spindles on the backend to assist with that. You could also look at enabling FlexShare (free and built into OnTap) to start prioritizing volumes for extended cache use (i.e. picking and choosing which volumes get to load all data into the extended cache, metadata only, or no data at all).
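As a rough back-of-the-envelope illustration of why warm-up time depends so heavily on throughput, here is a minimal Python sketch. It assumes the cache fills at a steady rate equal to the read traffic missing the cache, which is a simplification and not how the WAFL external cache is documented to behave. The function name `warmup_hours` and the sample rates are hypothetical; only the 512GB card size comes from this thread.

```python
def warmup_hours(cache_gb: float, fill_mb_per_s: float,
                 target_fraction: float = 1.0) -> float:
    """Hours needed to populate target_fraction of a cache at a steady fill rate.

    Assumption: every MB read from disk is inserted into the cache exactly
    once, so fill time is simply size divided by rate.
    """
    if fill_mb_per_s <= 0:
        raise ValueError("fill rate must be positive")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    cache_mb = cache_gb * 1024
    seconds = cache_mb * target_fraction / fill_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical sustained miss rates for a quiet, moderate, and busy system.
    for rate in (25, 100, 500):
        print(f"{rate:>4} MB/s -> {warmup_hours(512, rate):.1f} h to fill 512 GB")
```

Under these assumptions a busy system fills the card in well under an hour, while a quiet one takes most of a working day, which is consistent with the spread between minutes and hours described above.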
Here is the relevant snippet from the release notes for 8.1RC2:
Starting in Data ONTAP 8.1, the WAFL external cache preserves the cache in a Flash Cache module during a graceful shutdown.
When a storage system powers down, the WAFL external cache takes a snapshot of the data in a Flash Cache module. When the system powers up, it uses the snapshot to rebuild the cache. After the process completes, the system can read data from the cache. This process, called cache rewarming, helps to maintain system performance after a planned shutdown. For example, you might shut down a system to add hardware or upgrade software.
Cache rewarming is enabled by default if you have a Flash Cache module installed.
http://now.netapp.com/NOW/knowledge/docs/ontap/rel81rc2/pdfs/ontap/rnote.pdf
-----Original Message----- From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Peter D. Gray Sent: Tuesday, December 13, 2011 5:28 PM To: toasters@teaparty.net Subject: flexcache and cold caches
_______________________________________________ Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
Hi Chris,
How does this work for takeover situations? That is, to go from two flash caches to one...
Sent via mobile.
On Dec 13, 2011, at 6:42 PM, Chris Muellner chris@northlandusa.com wrote:
If OnTap 8.1+ is on your roadmap for upgrades then cold starts should become a thing of the past. Even today they should be few and far between and only due to power outages, upgrades, and hardware maintenance. Two of those can be planned and if you're incredibly lucky or fortunate you could possibly manage a graceful shutdown during a power event making all three a moot point.
If the takeover is administratively initiated then it's considered graceful and the cache is properly de-staged. If the takeover is due to a panic then the cache is not de-staged and will need to be rebuilt from scratch when the controller comes back online.
In both scenarios the surviving controller's extended cache becomes responsible for both controllers' workloads, so while in takeover mode the surviving controller's FlashCache will load both its own blocks and the partner's blocks into its cache. After the giveback is performed, the partner's blocks that were loaded into the surviving controller's cache will go stale and be ejected and replaced.
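The takeover and giveback behaviour described above can be pictured with a toy model. The sketch below uses a plain LRU cache keyed by (owner, block). This is an assumption made purely for illustration: the real WAFL external cache replacement policy is not described in this thread, and `ToyCache`, `simulate`, and the block counts are hypothetical.

```python
from collections import OrderedDict


class ToyCache:
    """Tiny LRU cache keyed by (owner, block). A toy model, not FlashCache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, owner: str, block: int) -> bool:
        """Read a block; return True on a cache hit, False on a miss."""
        key = (owner, block)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return True
        self.blocks[key] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return False

    def count(self, owner: str) -> int:
        """Number of cached blocks belonging to the given owner."""
        return sum(1 for o, _ in self.blocks if o == owner)


def simulate():
    cache = ToyCache(capacity=100)

    # Normal operation: the local working set fills the whole cache.
    for b in range(100):
        cache.read("local", b)
    before = (cache.count("local"), cache.count("partner"))

    # Takeover: the surviving head serves both workloads from one cache.
    for b in range(60):
        cache.read("partner", b)
        cache.read("local", b)
    during = (cache.count("local"), cache.count("partner"))

    # Giveback: only local reads remain, so partner blocks age out.
    for b in range(100):
        cache.read("local", b)
    after = (cache.count("local"), cache.count("partner"))

    return before, during, after


if __name__ == "__main__":
    for label, (local, partner) in zip(("before", "during", "after"), simulate()):
        print(f"{label:>6}: local={local} partner={partner}")
```

In this toy run the partner's blocks displace half of the local working set during takeover and are pushed out again once only local reads continue after giveback.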
-----Original Message----- From: Eugene Vilensky [mailto:evilensky@gmail.com] Sent: Tuesday, December 13, 2011 6:49 PM To: Chris Muellner Cc: toasters@teaparty.net Subject: Re: flexcache and cold caches
There is a good TR on all of this: http://media.netapp.com/documents/tr-3832.pdf
-----Original Message----- From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Chris Muellner Sent: Tuesday, December 13, 2011 8:00 PM To: Eugene Vilensky Cc: toasters@teaparty.net Subject: RE: flexcache and cold caches
On Wed, Dec 14, 2011 at 02:24:22AM +0000, Parisi, Justin wrote:
There is a good TR on all of this: http://media.netapp.com/documents/tr-3832.pdf
Thank you to those who replied. It looks like NetApp is aware that this is a problem and is working to ameliorate it somewhat.
However, it looks to me that the new age of large and relatively slow drives will be causing us grief for some time to come.
Regards, pdg