Fletcher Cocquyt wrote:
We are still seeing physical disk IO (95% reads) spikes without any volume-level IO. I'm trying to determine if it's related to large file deletions or something else. I might have to go digging in the perfstats myself. Are there any tools available to us to pick apart and analyze perfstats?
There is a NetApp tool that helps with interpreting perfstat output, which often amounts to hundreds of MB of text: the LatX web app. It is by no means an expert tool that somehow magically tells you what's going on. It parses, divides up and visualizes the perfstat output, which makes it much easier to get an overview of a system.
AFAIK, to get access to LatX you need an account at support.netapp.com, either as a NetApp employee or with NetApp partner status. I don't know of any way that you, as an "ordinary" customer, can gain access to LatX yourself and use it at will.
Of course, correctly interpreting perfstat output, with or without LatX, takes a great deal of knowledge about the inner workings of ONTAP, much of which has no easily accessible documentation.
NetApp normally doesn't even mention ONTAP kernel domains (serial Kahuna, parallel Kahuna a.k.a. wafl_exempt, etc.) to customers, but the fact remains that they are crucial to understanding the bottlenecks inside a 7-Mode filer under heavy load. When reading perfstat output you really must understand these things, or you'll stand little chance of drawing any useful conclusions.
/M