Netapp 3040 Ontap 7.3.3 latency spikes - crashes vmware - toasters

25 Aug 2011


Originally attempted to post June 22, 2010! Thanks for fixing the list.
We have since identified the issue by deconstructing the IOPS behind the
latency spikes and resolved it per:
http://www.vmadmin.info/2010/07/vmware-and-netapp-deconstructing.html
Hope it proves useful for anyone else with similar issues.
On 6/22/10 10:19 AM, "Fletcher Cocquyt" fcocquyt@stanford.edu wrote:
Hi, 
We have a 3040 cluster hosting 11 vSphere hosts with 200 VMs on NFS
datastores.
We see latency spikes 3-4 times a month as reported by Operations Manager.
We hoped our upgrade from 7.3.1.1 to 7.3.3 last week would help, but we've
had many spikes of up to 1 second take out an NFS mount and several of the
VMs since going to 7.3.3.
We previously identified the high- and medium-IO VMs and either aligned them
or migrated them to local disk. This has NOT helped; we are still getting the spikes.
I have another case open with NetApp.
Following the notes in this latency spike related thread,
http://communities.netapp.com/message/30657
I ran wafl_susp -w to check pw.over_limit.
It turns out ours is ZERO (is it relevant to NFS?).
I suspect an internal NetApp process (dedup?) is responsible for these spikes.
We had dedup disabled on 7.3.1.1. 7.3.3 was supposed to fix this, and we
re-enabled dedup after the upgrade.
The latency spike outages are back.
I will share any info from the case.
thanks for any tips,
-- 
Fletcher Cocquyt
Principal Engineer
Information Resources and Technology (IRT)
Stanford University School of Medicine

http://vmadmin.info