Re: OnCommand CPU report question for 2.x OPM

21 Apr 2016


Ok, so two things (comments).
1.
I believe Paul meant the new metric 'Node Utilisation' in his reply.
N.B. there's no PC (Performance Counter) in the CM (Counter Manager) or anything like that for it; it only exists inside OCPM.
Since it's currently defined like this (I *think*):
system:system:node_util =
    MAX(system:system:avg_processor_busy,  # Normalized to 85
    100-system:system:b2b_cp_margin,
    <Kahuna utilisation>)                  # Normalized to 50
what Paul wrote makes sense:
...
[...] because utilization _includes_ the only domain that could cause
you pain by being over utilized.
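A minimal sketch of the formula above, for illustration only. It assumes that "Normalized to 85" means a raw value of 85% is scaled to read as 100% (likewise 50% for Kahuna), and that the scaled value is capped at 100; both are assumptions, not something confirmed here. The function and parameter names are made up for the example and are not real counters.

```python
def normalize(value_pct, full_at_pct):
    """Scale value_pct so that full_at_pct reads as 100 (assumed meaning of
    'Normalized to N'); capping at 100 is also an assumption."""
    return min(100.0, value_pct * 100.0 / full_at_pct)


def node_util(avg_processor_busy, b2b_cp_margin, kahuna_util):
    """Hypothetical reconstruction of 'Node Utilisation': the largest of the
    three terms in the formula above. All inputs are percentages (0-100)."""
    return max(
        normalize(avg_processor_busy, 85.0),  # normalized to 85
        100.0 - b2b_cp_margin,                # back-to-back CP margin term
        normalize(kahuna_util, 50.0),         # normalized to 50
    )


if __name__ == "__main__":
    # Kahuna at 40% dominates: 40 * 100 / 50 = 80
    print(node_util(avg_processor_busy=20.0, b2b_cp_margin=90.0, kahuna_util=40.0))
```

With this reading, a moderately busy Kahuna can push the result well above what the processor average alone would suggest.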
2.
Please note! There's no Performance Counter in the CM called
system:system:b2b_cp_margin
or
system:system:node_util.
It's just a notation I used to make it clear and stringent. I think there
probably *should* be such PCs in the future!
My general view is that Kahuna isn't the only serial domain that can cause
you pain by being over-utilised. It's rare that any of the other 9
bottlenecks a system, but it can happen (and has). And, as I wrote before,
you can get hurt by over-utilised multi-threaded domains too. Again, it's
not that common, though personally I think it would make a lot of sense to
include at least a few of those domains in the overall formula for
'Node Util' as well. R&D efforts are ongoing, I'm sure :-)
The main reason Kahuna is so dominant in causing trouble is heavy CIFS
workload: SMB operations which have to be serialised, and there are a lot
of them... :-(
That said: my very humble opinion is that since ONTAP 8.2.1
system:system:cpu_busy actually isn't that bad at all. If you know what it
shows and how it's calculated, it tells you about the utilisation of one or
another of the 10 serial domains inside the system. Point being: it may not
be Kahuna (even if it most often is). I've watched our systems for long
periods of time, looking at the difference between these two in parallel:
system:system:cpu_busy
system:system:avg_processor_busy
while at the same time running sysstat -M. Conclusion: it's by no means
always Kahuna that makes the former go up now and then. It's been a bit of
a mystery at times, as I've had trouble matching things up so that I can
tell which of the 10 single-threaded domains is causing cpu_busy to increase
during some measurement intervals. I need to do more with this. The data
shown by sysstat -M is also in the CM as PCs, so it's better to use 'stats
show' in the node shell to look at it.
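The matching exercise described above can be sketched roughly as follows, assuming you have already collected, per measurement interval, the two counters plus a utilisation figure for each serial domain (how you collect them is left out). The domain names other than kahuna, the sample values, and the 1-point tolerance are all made up for illustration.

```python
def dominant_serial_domain(cpu_busy, avg_processor_busy, domain_util, tolerance=1.0):
    """For one interval, return the serial domain that explains cpu_busy.

    Returns None when cpu_busy is no higher than the processor average,
    the name of the busiest domain when its utilisation matches cpu_busy
    within `tolerance` percentage points, and "unmatched" otherwise.
    """
    if cpu_busy <= avg_processor_busy + tolerance:
        return None  # explained by the processor average alone
    name, util = max(domain_util.items(), key=lambda item: item[1])
    if abs(util - cpu_busy) <= tolerance:
        return name
    return "unmatched"  # samples probably not aligned in time


# Hypothetical samples; only "kahuna" is a real domain name from this thread.
samples = [
    {"cpu_busy": 40.0, "avg_processor_busy": 40.0,
     "domains": {"kahuna": 20.0, "domain_b": 10.0}},
    {"cpu_busy": 72.0, "avg_processor_busy": 35.0,
     "domains": {"kahuna": 30.0, "domain_b": 72.0}},
    {"cpu_busy": 65.0, "avg_processor_busy": 30.0,
     "domains": {"kahuna": 41.0, "domain_b": 12.0}},
]

results = [
    dominant_serial_domain(s["cpu_busy"], s["avg_processor_busy"], s["domains"])
    for s in samples
]

if __name__ == "__main__":
    print(results)
```

An "unmatched" result corresponds to the mystery case: cpu_busy rose, but no sampled domain lines up with it, which may simply mean the two data sources were not sampled over the same interval.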
Hope this helps,
/M
On 2016-04-21 21:11, Michael Bergman wrote:
...
If by "monitoring utilisation" Paul means this PC:
system:system:cpu_busy
(N.B. the formula that calculates this inside the Counter Mgr changed in
8.2.1, in both 7-mode and c-mode)
...then yes, it includes the highest utilised *single-threaded* kernel
domain. Serial domains are all except *_exempt, wafl_xcleaner (2 threads)
and hostOS. That's for a recent/modern ONTAP; don't trust this if you're
still on some old version!
The formula for calculating it is like this:
MAX(system:system:avg_processor_busy,
    MAX(util_of(s-threaded domain1, s-threaded domain2, ..., s-threaded domain10)))
and it has been since 8.2.1 and still is in all 8.3 rels to this date.
[...]
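As a small illustration of the cpu_busy formula quoted above: take the processor average and the utilisation of the busiest single-threaded domain, and report whichever is larger. The function name and the dictionary of domain utilisations are hypothetical stand-ins; nothing here reads real counters.

```python
def cpu_busy(avg_processor_busy, serial_domain_util):
    """Sketch of the post-8.2.1 formula: max of the processor average and the
    busiest serial (single-threaded) domain. Inputs are percentages."""
    busiest_serial = max(serial_domain_util.values(), default=0.0)
    return max(avg_processor_busy, busiest_serial)


if __name__ == "__main__":
    # Placeholder domain names; a real system would have 10 serial domains.
    domains = {"kahuna": 30.0, "domain_b": 72.0}
    print(cpu_busy(35.0, domains))
```

This is why cpu_busy can sit well above avg_processor_busy: one saturated serial domain is enough, and it need not be Kahuna.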
