No NDMP in use here. I'm at a loss to explain this level of aggregate disk busy with no corresponding volume-level IO.
It feels like some internal operation is hitting a bug.
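
One way to sanity-check the pattern is a minimal sketch like the following, assuming the NMC counters have been copied out by hand as (time, aggregate disk busy %, summed volume IOPS) samples. The Sample type, the function name and both thresholds are illustrative assumptions only, not anything NMC or Data ONTAP provides.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One hand-exported data point (hypothetical layout)."""
    timestamp: str        # e.g. "05:00"
    disk_busy_pct: float  # aggregate disk busy, in percent
    volume_iops: float    # IOPS summed over the aggregate's volumes


def unexplained_busy(samples: List[Sample],
                     busy_threshold: float = 90.0,
                     iops_ratio: float = 0.5) -> List[Sample]:
    """Return samples where the disks are busy but volume IOPS are low.

    A sample is flagged when disk busy is at or above busy_threshold
    while volume IOPS fall below iops_ratio times the median IOPS of
    the whole series. Both thresholds are arbitrary starting points.
    """
    if not samples:
        return []
    baseline = statistics.median(s.volume_iops for s in samples)
    return [
        s for s in samples
        if s.disk_busy_pct >= busy_threshold
        and s.volume_iops < iops_ratio * baseline
    ]
```

Any interval it flags is one where the disk activity is not accounted for by volume IO, which is the window worth lining up against the logs.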
thanks
On Jan 23, 2013, at 10:21 PM, "Klise, Steve" <klises(a)sutterhealth.org> wrote:
> A stab in the dark, but what about NDMP jobs?
>
> From: Fletcher Cocquyt [mailto:fcocquyt@stanford.edu]
> Sent: Wednesday, January 23, 2013 09:27 PM
> To: toasters(a)teaparty.net Lists <toasters(a)teaparty.net>
> Subject: Aggregate Disk Busy 100% with volume IOPS low
>
> 3270 cluster, Data ONTAP 8.1 7-Mode
>
> We are investigating a SATA aggregate that shows repeated 5am spikes to 100% disk busy, without its volumes showing any corresponding IOPS spike, as reported by NetApp Management Console (NMC).
> The 5am disk busy spikes correlate with very high latency on volumes on a different SAS aggregate. These volumes host VMs, which then time out, some needing reboots.
> Today NetApp support got back to me after reviewing my perfstat; the engineer reported this is expected, since NVRAM buffers are shared between aggregates.
>
> But when I dig further into the NMC stats I see the SATA aggregate disk busy actually corresponds to a DROP in IOPS on the 3 volumes hosted on the SATA aggregate - almost like some internal aggregate operations are starving out the external volume ops.
> I checked the snapshots (vol and aggr), SnapMirror and dedup, and none of the usual suspects were running.
>
> When I look at the NMC throughput graphs and switch on the legend, it shows a 5am spike in READ blocks/sec that corresponds exactly to the disk busy spike.
>
> Where are these aggregate-level READ operations coming from, given that they are missing from the constituent volumes' IOPS and in fact seem to be starving out volume-level IO?
>
> I don't see much in the messages log, but I will check the rest of the logs for internal operations.
>
> thanks for any insight