Re: CPU usage and HA

8 Apr 2014


Michael Garrison wrote:
...
> At one point, there was a NetApp KB article talking about how DOT
> takes advantage of the different CPUs, and about bottlenecks.
> It appears they no longer have that article public -
Because the information in it is no longer valid and would lead people in
the wrong direction, supposedly.  I can understand this: keeping such a
document up to date with everything that has happened between 8.0.x, 8.1.x
and then 8.2 w.r.t. parallelisation of things in the kernel would be a
daunting task at best.
...
> One of the big things we learned from this is that when looking at
> Kahuna, you also need to look at the (Kahu) value listed under
> WAFL_EX. If Kahu + Kahuna add up to around 100%, that is when you have a
> bottleneck in the Kahuna zone. We've run into this many times on our
> FAS6240s.
This was valid for some (older) ONTAP release; I can't really tell which
one.  What release was on your 6240s when you saw this kind of saturation?
I'm not convinced that what sysstat -M tells you is accurate enough to let
you understand a bottleneck as described above, even in 8.0.x (it *might*
be), and it definitely isn't in 8.1.x (there is a big difference in the
parallelisation of things; waffinity changed *a lot* between 8.0 and 8.1).
...
> The Kahu values are items that CANNOT run simultaneously with Kahuna
> items. This means that if (Kahu) is 60% and Kahuna is running at 39%,
> the Kahuna zone is actually at 99%, so it's bottlenecked.
True for 8.0 *iff* the Kahu value in sysstat -M takes into account the
parallelism (up to 5, I think it was) of parallel-Kahuna. I don't know.
Not accurate for 8.1.x, not even close.
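For what it's worth, the arithmetic behind the quoted heuristic fits in a few lines. This is a minimal sketch, assuming you have already read the Kahu (from WAFL_EX) and Kahuna percentages off sysstat -M by hand; the function names and the 95% cut-off standing in for "around 100%" are hypothetical. Per the caveats above, the simple sum is at best meaningful on older (8.0.x-era) releases and should not be trusted on 8.1.x and later.

```python
def kahuna_zone_load(kahu_pct: float, kahuna_pct: float) -> float:
    """Combined load on the serialised Kahuna zone.

    Assumes (per the heuristic) that Kahu work cannot run at the same time
    as Kahuna work, so both share one serial timeline and simply add up.
    """
    return kahu_pct + kahuna_pct


def kahuna_bottlenecked(kahu_pct: float, kahuna_pct: float,
                        threshold_pct: float = 95.0) -> bool:
    # threshold_pct is a hypothetical stand-in for "around 100%".
    return kahuna_zone_load(kahu_pct, kahuna_pct) >= threshold_pct


if __name__ == "__main__":
    # The example from the thread: Kahu 60% + Kahuna 39%.
    load = kahuna_zone_load(60, 39)
    print(f"Kahuna zone load: {load:.0f}%")  # Kahuna zone load: 99%
    print("bottlenecked" if kahuna_bottlenecked(60, 39) else "ok")
```

Note that this ignores the parallel-Kahuna question raised above: if sysstat -M does not account for that parallelism, the sum overstates the load.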
...
> There are some bugs in DOT that can contribute to this. I'd have to go
> back through some of my old information, but I can tell you they're
> fixed in 8.1.4p1. However, workload can contribute immensely to this.
> In my experience, CIFS is impacted the most by this, since a lot of
> the CIFS operations are serial.
Absolutely, CIFS is much more "serial" than NFS. I'm lucky where I am
to have a very NFS-dominant workload; CIFS is more or less residual, so we
never have any issues.
...
> that'll certainly impact it are things like SnapMirror deswizzling,
> large snapshot updates firing off at once, etc. For the most part, NFS
> seems to have no issues, but CIFS latency will go through the roof, and
> if you're at the edge, you won't know it until you cross it and CIFS
> becomes unusable.
The problem with lots of snapshots being fired off isn't the taking of the
snapshots per se, as that is literally free w.r.t. resources.  It's the
deletion of snapshots: everyone has a schedule, and it has to roll... That
is a really expensive operation inside ONTAP, as is any deletion of files.
Quite simply, one can call it a weakness.  Usually with NFS, and pre 8.1
(when parallelism got much better), the SETATTR op would always stand out as
the slowest d*** thing in the whole machine, and when snapshot deletes were
running... ouch.
The underlying reason SETATTR is so slow is, AFAIU, that it goes through
serialised parts (s-Kahuna) because it touches the WAFL buffer cache, and
keeping the integrity of that cache is so critical that serialisation is a
necessity (losing control of the integrity of the WAFL buffer cache = panic
and halt; it's always been that way).
/M
