Ok so two things (comments).
1. I believe Paul meant the new metric 'Node Utilisation' in his reply. N.B. there's no PC (Performance Counter) in the CM (Counter Manager) or anything like that for it; it exists only inside OCPM.
Since it's actually currently defined like this (I *think*):
system:system:node_util =
    MAX(system:system:avg_processor_busy,    # Normalized to 85
        100-system:system:b2b_cp_margin,
        <Kahuna utilisation>)                # Normalized to 50
what Paul wrote makes sense:
[...] because utilization _includes_ the only domain that could cause you pain by being over utilized.
2. Please note: there's no Performance Counter in the CM called system:system:b2b_cp_margin or system:system:node_util.
It's just a notation I used to make it clear and stringent. I think there probably *should* be such PCs, in the future!
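To make the notation in point 1 concrete, here is a minimal Python sketch of that MAX. It is only an illustration of the formula as I wrote it above, not how OCPM actually implements it. The function name and parameter names are made up for the example, and all three inputs are assumed to be percentages that have already had whatever normalisation applies done to them (the "Normalized to 85/50" step is not modelled here).

```python
def node_util(avg_processor_busy, b2b_cp_margin, kahuna_util):
    """Return the largest of the three contributors, in percent.

    Hypothetical helper mirroring the notation in this mail.
    Inputs are assumed to be already-normalised percentages (0-100).
    """
    # The CP margin is "headroom", so 100 minus the margin is the used part.
    b2b_cp_used = 100.0 - b2b_cp_margin
    return max(avg_processor_busy, b2b_cp_used, kahuna_util)


# Made-up sample values: Kahuna is the largest contributor here.
print(node_util(40.0, 70.0, 55.0))
```

Whichever of the three terms is largest determines the result, which is why a busy Kahuna shows up directly in the metric.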
My general view is that Kahuna isn't the only serial domain that can cause you pain by being over-utilised. It's rare that any of the other 9 bottlenecks a system, but it can happen (and has). And, as I wrote before, you can get hurt by over-utilised multi-threaded domains too. Again, it's not that common, though personally I think it would make a lot of sense to include at least a few of those domains in the overall formula for 'Node Util' as well. R&D efforts are ongoing, I'm sure :-)
The main reason Kahuna is so dominant in causing trouble is heavy CIFS workload: SMB operations have to be serialised, and there are a lot of them... :-(
That said: my very humble opinion is that since ONTAP 8.2.1 system:system:cpu_busy actually isn't that bad at all. If you know what it shows and how it's calculated, it tells you about the utilisation of one or another of the 10 serial domains inside the system. Point being: it may not be Kahuna (even if it most often is). I've watched our systems for long periods of time, looking at the difference between these two in parallel:
system:system:cpu_busy
system:system:avg_processor_busy
while at the same time running sysstat -M. Conclusion: it's by no means always Kahuna that makes the former go up now and then. It's been a bit of a mystery at times, as I've had trouble matching the data up well enough to tell which of the 10 single-threaded domains is causing cpu_busy to increase during some measurement intervals. I need to do more work on this. The data shown by sysstat -M is also in the CM as PCs, so it's better to use 'stats show' in the node shell to look at it.
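For what it's worth, the matching exercise can be sketched in a few lines of Python, using the cpu_busy formula quoted further down in this mail. This is a rough sketch only: the function name is invented, the per-domain numbers are made-up sample data, and "domain_b" and "domain_c" are placeholders rather than real domain names. How you get the per-domain utilisation out of the system for each interval is left out entirely.

```python
def cpu_busy(avg_processor_busy, serial_domain_util):
    """Return (value, driver) for one measurement interval.

    Hypothetical helper: value is the MAX of avg_processor_busy and the
    busiest single-threaded domain; driver names whichever term won.
    serial_domain_util maps domain name -> utilisation in percent.
    """
    # Find the busiest serial domain in this interval.
    top_domain, top_util = max(serial_domain_util.items(), key=lambda kv: kv[1])
    if avg_processor_busy >= top_util:
        return avg_processor_busy, "avg_processor_busy"
    return top_util, top_domain


# Made-up interval: a domain other than Kahuna pushes cpu_busy up.
sample = {"kahuna": 35.0, "domain_b": 62.0, "domain_c": 12.0}
value, driver = cpu_busy(48.0, sample)
print(value, driver)
```

Running something like this per interval would show directly when cpu_busy rises above avg_processor_busy and which serial domain is responsible.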
Hope this helps, /M
On 2016-04-21 21:11, Michael Bergman wrote:
If by "monitoring utilisation" Paul means this PC:
system:system:cpu_busy
(N.B. the formula that calculates this inside the Counter Mgr changed in 8.2.1, in both 7-mode & c-mode)
...then yes, it includes the highest utilised *single threaded* kernel domain. Serial domains are all except *_exempt, wafl_xcleaner (2 threads), hostOS. That is for a recent/modern ONTAP; don't trust this if you're still on some old version!
The formula for calculating it is like this:
MAX(system:system:avg_processor_busy,
    MAX(util_of(s-threaded domain1, s-threaded domain2, ... domain10)))
and it has been since 8.2.1 and still is in all 8.3 rels to this date. [...]