RE: Netapp FAS610 8.1.3P1 stalling

22 Oct 2013


It sounds like you experienced "The Dead Cat Bounce". I believe VMware suggests lowering the queue depth on your ESXi host to 64.
-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Milazzo Giacomo
Sent: Tuesday, October 22, 2013 10:22 AM
To: Martin; toasters@teaparty.net
Subject: R: Netapp FAS610 8.1.3P1 stalling
It reminds me of something that happened a few weeks ago to a customer of mine.
It looks like you've hit a bug, but your version seems to be one that already contains the fix.
https://forums.netapp.com/thread/19352
-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Martin
Sent: Tuesday, October 22, 2013 16:06
To: toasters@teaparty.net
Subject: Netapp FAS610 8.1.3P1 stalling
One of our 6210 Filers running 8.1.3P1 stalled I/O for approx 1 minute (11:08:27 - 11:09:24).
We saw this as latency of 5 seconds on the VMware hosts attached to the Filer via NFS and eventually "All Paths Down" messages on the ESX hosts.
I also saw warnings in the messages file:
Tue Oct 22 11:09:24 BST [TOASTER1: NwkThd_01:warning]: NFS response to client x.x.x.x for volume 0x5834a5c(vol004) was slow, op was v3 write, 69 > 60 (in seconds)
...
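For anyone wanting to line these warnings up against the DFM graphs, here is a minimal Python sketch that pulls the fields out of a "was slow" line. The regular expression is modeled only on the single warning quoted above, so treat the pattern as an assumption; other Data ONTAP releases may word the message differently. The function name parse_slow_nfs and the field names are made up for illustration.

```python
import re

# Assumption: the pattern mirrors the one warning line quoted in this thread.
SLOW_NFS = re.compile(
    r"(?P<when>\w{3} \w{3}\s+\d+ \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<filer>[^:]+): [^\]]*\]: "
    r"NFS response to client (?P<client>\S+) "
    r"for volume \S+?\((?P<volume>[^)]+)\) was slow, "
    r"op was (?P<op>.+?), (?P<took>\d+) > (?P<limit>\d+) \(in seconds\)"
)


def parse_slow_nfs(line):
    """Return a dict of fields for a slow-NFS warning, or None if no match."""
    m = SLOW_NFS.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Convert the observed and threshold durations to integers (seconds).
    rec["took"] = int(rec["took"])
    rec["limit"] = int(rec["limit"])
    return rec


if __name__ == "__main__":
    sample = (
        "Tue Oct 22 11:09:24 BST [TOASTER1: NwkThd_01:warning]: "
        "NFS response to client x.x.x.x for volume 0x5834a5c(vol004) "
        "was slow, op was v3 write, 69 > 60 (in seconds)"
    )
    print(parse_slow_nfs(sample))
```

Run over the whole messages file, the per-volume and per-client counts would show whether the slow responses were confined to one volume or spread across the Filer.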
From looking in DFM Filer Summary view I see a "Z" shape in the graph for most of the counters on the Filer e.g. CPU, Network Throughput, All Protocol Ops.  The counters dip low then rapidly increase and tail off again. (see attached JPG) http://network-appliance-toasters.10978.n7.nabble.com/file/n25314/Filer-Z-shape.jpg
During this time all of the ESX hosts saw timeouts to the NFS datastores.
I checked the disk_busy on the only aggregate (90 x 15K SAS 450GB) on the Filer and it only shows the disks as 30-40% busy and the disk_busy drops during the time the Filer stalled. It seems odd as the overall load on the Filer didn't increase to precipitate this.
http://network-appliance-toasters.10978.n7.nabble.com/file/n25314/Filer-Z-shape-disks.jpg
...
From previous experience of performance cases with Netapp, we're usually asked to gather a perfstat next time it happens but this isn't possible as it's unpredictable and short lived.
The thing that concerns me is from previous experience this is usually a precursor to the Filer stalling for much longer periods causing much more impact.  In the past we've found the only option is to upgrade the Filer head.
I would appreciate any pointers on how to identify the root cause of this.
--
View this message in context: http://network-appliance-toasters.10978.n7.nabble.com/Netapp-FAS610-8-1-3P1-...
Sent from the Network Appliance - Toasters mailing list archive at Nabble.com.
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
