There might be a way to track this by checking the sis object in the NetApp API.
https://communities.netapp.com/servlet/JiveServlet/previewBody/1044-102-2-75...
Tracking those stats might give you, and the rest of us, an idea of the impact of sis jobs. Kinda useless after the fact, but it may help in the future.
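If going through the API is a hassle, roughly the same information is exposed by the 7-Mode CLI; a minimal sketch (the filer prompt and volume name are placeholders, and exact output fields vary by release):

na04> sis status -l /vol/oravol   # state, schedule, and last-operation begin/end/size
na04> sis config /vol/oravol      # the configured dedup schedule

Logging that periodically around the job window would at least show when the sis runs actually start and finish.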
-Blake
On Mon, Feb 4, 2013 at 8:40 AM, Fletcher Cocquyt fcocquyt@stanford.edu wrote:
Once we disabled the reallocate measure and dedup jobs, the spikes disappeared. The dedup job was scheduled quite a bit earlier than when the IO spikes showed up. Also, we did not notice an issue until we added just a bit more IO with the app (Oracle) or a reallocate measure. None of the tools (external or internal) could tell us directly what the source of the IO was.
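In case it helps anyone else, turning those off was along these lines (volume name changed; double-check the syntax on your release):

na04> sis config -s - /vol/oravol   # clear the dedup schedule on the volume
na04> reallocate stop /vol/oravol   # stop and delete any reallocate job on the volume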
thanks
On Jan 26, 2013, at 5:42 PM, Nicholas Bernstein nick@nicholasbernstein.com wrote:
Usually ps will indirectly show you the process that's using the IO, since it's also probably using some CPU. Disk scrub, media scrub, and reallocate_measure are just a few things off the top of my head that could cause read IO.
Stats explain should be able to give you more info on that counter. Sorry this isn't a more useful response; I'm on my phone and sick in bed. :/
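From memory, it's roughly this (object and counter names here are examples, not verified against your release):

na04> stats explain counters disk disk_busy   # prints a description of the counter
na04> stats show disk:*:disk_busy             # sample the counter across all disks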
-- Sent from my mobile device
On Jan 26, 2013, at 10:15 AM, Fletcher Cocquyt fcocquyt@stanford.edu wrote:
On Nick's advice I set up a job to log both wafltop and ps -c 1 once per minute, and we had a sustained sata0 disk busy from 5am-7am as reported by NMC. The first question I have from wafltop show: what is the first row (sata0::file i/o) reporting? What could be the source of these 28907 non-volume-specific read IOs?
Application          MB Total  MB Read(STD)  MB Write(STD)  Read IOs(STD)  Write IOs(STD)
-----------          --------  ------------  -------------  -------------  --------------
sata0::file i/o:         5860          5830             30          28907               0
sata0:backup:nfsv3:       608             0            608             31               0
I'm just starting to go through the data
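For reference, the logging job is essentially a loop like this run from an admin host (hostname and log path are placeholders; the ssh/priv set chaining may need adjusting for your setup):

#!/bin/sh
# capture wafltop and ps -c 1 from the filer once per minute
while true; do
    date                                      >> /tmp/na04-io.log
    ssh na04 'priv set -q diag; wafltop show' >> /tmp/na04-io.log
    ssh na04 'priv set -q diag; ps -c 1'      >> /tmp/na04-io.log
    sleep 60
done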
na04*> aggr status
           Aggr State           Status            Options
          sata0 online          raid_dp, aggr     nosnap=on, raidsize=12
                                64-bit
          aggr2 online          raid_dp, aggr     nosnap=on, raidsize=19
                                64-bit
          aggr1 online          raid_dp, aggr     root, nosnap=on, raidsize=14
                                32-bit
na04*> df -Ah
Aggregate               total       used      avail   capacity
aggr1                    13TB       11TB     1431GB        89%
aggr2                    19TB       14TB     5305GB        74%
sata0                    27TB       19TB     8027GB        72%
<sataIOPSJan26.jpeg>
thanks
On Jan 25, 2013, at 5:33 PM, Nicholas Bernstein nick@nicholasbernstein.com wrote:
Try doing a 'ps -c 1' or a 'wafltop show' (double-check the syntax) while you're getting the spike; those will probably help you narrow down the processes that are using your disks. Both are priv set advanced/diag commands.
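Something like this (the prompt is a placeholder; on the releases I've seen, wafltop needs diag privilege):

na04> priv set diag      # wafltop and ps live at the diag privilege level
na04*> wafltop show      # top consumers of WAFL read/write activity
na04*> ps -c 1           # process listing with CPU usage
na04*> priv set admin    # drop back to admin when done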
Nick