Hello list,

First post here. My role with NetApp (NTAP) for the last three years has been with the OnCommand Performance Manager (OPM) program.

For your first question: OPM 1.0 had many limitations. OPM 1.1 is the GA version and has many improvements in the troubleshooting analysis page. I can help you figure out 1.0, but we’ve fixed many of its confusing issues in the 1.1 release. We also have OPM 2.0 in the works, due out within weeks. I suggest you switch to 2.0 once it ships. It’s a much improved product.

OPM 2.0 calculates an Aggregate and Node utilization metric, shows it front and center in the dashboard, and allows you to set alert thresholds. All metrics in OPM 2.0 are hyperlinked with an explorer interface.

As for figuring out what is pushing an aggregate: again, there are improvements in 1.1 to help with that, and 2.0 is MUCH BETTER (and yes, I know I’m speaking loudly).

All the best,

Joseph "Yossi" Weihs
Sr. Manager, Product Management

Manageability Products Group

NetApp
“Simplicity is complexity resolved” - Constantin Brâncuși

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Basil
Sent: Thursday, June 25, 2015 8:51 AM
To: toasters@teaparty.net
Subject: OnCommand Performance Manager events

I'm using 1.0.0 r2 with a CDOT cluster running 8.2.3. When I look at an event, it says things like "3 victim volumes slow due to 5 bully volumes causing contention on aggr_SAS". When I drill in, I can see a list of volumes with their latency at that period, but no indication whether they're being considered victims or bullies. Has anyone had any experience with this software?

Another question: is there a way to look directly at an aggregate's utilization? If I drill into a volume, I can see the aggregate with "break down data by - components - disk operations", but that doesn't show me the aggregate itself, just this volume's contribution to it. When I look into an event, the aggregate utilization is listed, but I can't get to it on its own page with a controllable graph.

Lastly, if I know an aggregate is being pushed to 100% but I can't see any volumes pushing harder, is there a way I can check whether system work is responsible? My first thought was that it was dedupe, but that's scheduled to kick off with the lowest possible priority about two hours after this event occurred.

Cheers!

Basil