Re: Aggregate Disk Busy 100% with volume IOPS low - toasters

27 Jan 2013


Indeed,
We are considering replacing our premium support with next-day support and using the savings to buy some professional services - we've heard other groups see better support ROI with this combination.
On the wafl block reclamation - are you talking about options wafl.trunc.throttle.vol.max etc? We had to tune that back in 2010, in the 7.3.x days:
http://www.vmadmin.info/2010/11/vfiler-migrate-netapp-lockup.html
But I'm not sure whether this is still a hidden option in 8.1.x.
I read references to a tool called perfviewer - anyone still using that?
thanks
On Jan 26, 2013, at 1:45 PM, Isaiah zoratu@gmail.com wrote:
...
If I were you, I would purchase some incident-based support from Berkeley Communications ("Berkcom").  They're the only reseller of both used and new NetApp gear in the world. They know more about NetApp than NetApp. I've been a customer for nine years and they're the first number I call--because they're all experts. No escalation yadda yadda. Completely worth the modest fees to support gear not purchased through them.
The last time your situation happened to me I ended up having to tune the wafl block reclamation aggressiveness. There were so many snapshots happening on the system with lots of gradual changes that the disk utilization was high just reclaiming blocks that used to be dirty.
- Isaiah


On Jan 26, 2013, at 10:15, Fletcher Cocquyt fcocquyt@stanford.edu wrote:
...
On Nick's advice I set up a job to log both wafltop and ps -c 1 once per minute - and we had a sustained sata0 disk busy from 5am-7am as reported by NMC.
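For anyone wanting to set up a similar once-per-minute job, here is a minimal Python sketch, not the job actually used here. Assumptions: the filer is reachable over ssh with key-based login, the host name and log path are placeholders, and the command strings are copied from this thread. How you reach the required privilege level non-interactively is not shown; check that on your own system.

```python
import subprocess
import time
from datetime import datetime, timezone

# Placeholder values, not taken from any real configuration.
FILER = "na04"
LOG_PATH = "filer-perf.log"
# Command strings as quoted in the thread; both need an elevated
# privilege level on the filer, which this sketch does not handle.
COMMANDS = ["wafltop show", "ps -c 1"]


def ssh_runner(host, timeout=50):
    """Return a function that runs one command on `host` via ssh."""
    def run(command):
        try:
            result = subprocess.run(
                ["ssh", host, command],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return "(command timed out)\n"
        return result.stdout + result.stderr
    return run


def log_once(run, commands, log, now=None):
    """Write one timestamped block per command to the open file `log`."""
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    for command in commands:
        log.write(f"=== {stamp} {command} ===\n")
        log.write(run(command).rstrip("\n") + "\n")


def log_forever(run, commands, path, interval=60):
    """Append a sample every `interval` seconds until interrupted."""
    while True:
        started = time.monotonic()
        with open(path, "a") as log:
            log_once(run, commands, log)
        # Sleep for whatever is left of the interval.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))


if __name__ == "__main__":
    log_forever(ssh_runner(FILER), COMMANDS, LOG_PATH)
```

The "=== timestamp command ===" header lines make it easy to pull out just the samples that fall inside a busy window afterwards.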
The first question I have from wafltop show is: what is the first row (sata0::file i/o) reporting? What could be the source of these 28907 non-volume-specific Read IOs?
        Application MB Total MB Read(STD) MB Write(STD) Read IOs(STD) Write IOs(STD)
------------------- -------- ------------ ------------- ------------- --------------
   sata0::file i/o:     5860         5830            30         28907              0
sata0:backup:nfsv3:      608            0           608            31              0
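One quick thing to do with rows like these is to work out the average size per IO. Below is a rough Python sketch that parses the two rows above and does that arithmetic. Assumptions: the column order is exactly as pasted, "MB" means MiB, and every data row ends its application name with a colon followed by five integers. The sampling interval behind these counters is not stated in the output, so the result is a ratio, not a rate.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    mb_total: int
    mb_read: int
    mb_write: int
    read_ios: int
    write_ios: int


# Application name (may contain spaces and colons), a colon, five integers.
ROW_RE = re.compile(r"^\s*(.+?):\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

# The two data rows as pasted in this thread.
SAMPLE = """\
  sata0::file i/o:       5860         5830            30         28907              0
sata0:backup:nfsv3:        608            0           608            31              0
"""


def parse_wafltop(text):
    """Return a Row for each line that looks like a data row."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows.append(Row(match.group(1), *map(int, match.groups()[1:])))
    return rows


def kb_per_io(mb, ios):
    """Average KiB per IO, or None when the IO count is zero."""
    return mb * 1024 / ios if ios else None


if __name__ == "__main__":
    for row in parse_wafltop(SAMPLE):
        print(row.name,
              "read KB/IO:", kb_per_io(row.mb_read, row.read_ios),
              "write KB/IO:", kb_per_io(row.mb_write, row.write_ios))
```

For the sata0::file i/o row this comes to 5830 * 1024 / 28907, or about 206.5 KB per read IO. The backup row shows 608 MB written against 0 write IOs, so a per-IO write size cannot be computed from it.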
I'm just starting to go through the data.
aggr status                 
           Aggr State           Status            Options
          sata0 online          raid_dp, aggr     nosnap=on, raidsize=12
                                64-bit            
          aggr2 online          raid_dp, aggr     nosnap=on, raidsize=19
                                64-bit            
          aggr1 online          raid_dp, aggr     root, nosnap=on, raidsize=14
                                32-bit            
na04*> df -Ah                      
Aggregate                total       used      avail capacity  
aggr1                     13TB       11TB     1431GB      89%  
aggr2                     19TB       14TB     5305GB      74%  
sata0                     27TB       19TB     8027GB      72%
<sataIOPSJan26.jpeg>
thanks
On Jan 25, 2013, at 5:33 PM, Nicholas Bernstein nick@nicholasbernstein.com wrote:
...
Try doing a 'ps -c 1' or a wafltop show (double check the syntax) while you're getting the spike; those will probably help you narrow down the processes that are using your disks. Both are priv set advanced/diag commands.
Nick

Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters