Do you have mixed aggregates on this box? Both 64-bit and 32-bit, and how large is each one? Or 64-bit only, and how large?
Which shelves do you have, and what disk IOPS are you seeing? Send me the info privately.
From: Nicholas Bernstein [mailto:nick@nicholasbernstein.com]
Sent: Friday, January 25, 2013 8:33 PM
To: Fletcher Cocquyt
Cc: Uddhav Regmi; toasters@teaparty.net
Subject: Re: Aggregate Disk Busy 100% with volume IOPS low
Try doing a 'ps -c 1' or a wafltop show (double check the syntax) while you're getting the spike; those will probably help you narrow down the processes that are using your disks. Both are priv set advanced/diag commands.
Nick
On Jan 25, 2013, at 4:47 PM, Fletcher Cocquyt <fcocquyt@stanford.edu> wrote:
We are still seeing spikes in physical disk IO (95% reads) without any corresponding volume-level IO.
I'm trying to determine if it's related to large file deletions or something else - I might have to go digging in the perfstats myself.
Are there any tools available to us to pick apart and analyze perfstats?
<sataspike.jpg>
thanks
On Jan 24, 2013, at 4:24 AM, "Uddhav Regmi" <uregmi111@gmail.com> wrote:
That is normal.
I see those too.
Just make sure there is not heavy inbound network data at that time.
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Fletcher Cocquyt
Sent: Thursday, January 24, 2013 12:28 AM
To: toasters@teaparty.net Lists
Subject: Aggregate Disk Busy 100% with volume IOPS low
3270 cluster, ONTAP 8.1 7-Mode
We are investigating a SATA aggregate showing repeated 5am spikes to 100% disk busy without its volumes showing any corresponding IOPS spike as reported by NetApp Management Console (NMC).
The 5am disk busy spikes correlate with very high latency on volumes on a different SAS aggregate. These volumes host VMs, which then time out, some needing reboots.
Today I heard back from NetApp support: after reviewing my perfstat, the engineer reported this is expected, since NVRAM buffers are shared between aggregates.
But when I dig further into the NMC stats I see the SATA aggregate disk busy actually corresponds to a DROP in IOPS on the 3 volumes hosted on the SATA aggregate - almost like some internal aggregate operations are starving out the external volume ops.
I checked the snapshots (vol and aggr), SnapMirror, and dedup, and none of the usual suspects were running.
When I look at the NMC throughput graphs and switch on the legend - it shows a 5am READ blocks/sec spike corresponding perfectly to the disk busy.
Where are these AGGR level READ operations coming from that are missing from the constituent volume IOPS, and in fact seem to be starving out volume level IO?
I don't see much in the messages log, but I will check the rest of the logs for internal-type ops.
thanks for any insight
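
One way to look for the pattern described above (aggregate disk busy near 100% while volume IOPS drop) is to export both series and flag the intervals where they diverge. Below is a minimal Python sketch, assuming the samples have already been exported as (timestamp, disk busy %, total volume IOPS) tuples. The function name, the thresholds, and the sample values are hypothetical illustrations, not measurements from this filer and not part of any NetApp tool.

```python
from statistics import median


def find_unexplained_busy(samples, busy_threshold=90.0, iops_fraction=0.5):
    """Return timestamps where disk busy is high but volume IOPS are low.

    samples: list of (timestamp, disk_busy_pct, volume_iops) tuples.
    "Low" means below iops_fraction of the median volume IOPS
    across all samples (an assumed, adjustable baseline).
    """
    if not samples:
        return []
    baseline = median(iops for _, _, iops in samples)
    return [
        ts
        for ts, busy, iops in samples
        if busy >= busy_threshold and iops < iops_fraction * baseline
    ]


# Made-up example values, for illustration only.
samples = [
    ("04:50", 35.0, 1200),
    ("04:55", 40.0, 1150),
    ("05:00", 100.0, 300),
    ("05:05", 98.0, 280),
    ("05:10", 45.0, 1100),
]

flagged = find_unexplained_busy(samples)
print(flagged)
```

Intervals flagged this way are the ones worth lining up against scheduled internal activity, since the volume-level counters alone do not account for the disk load.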
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters