I originally attempted to post this on June 22, 2010 (thanks for fixing the list). We have since identified the issue by deconstructing the IOPS behind the latency spikes, and resolved it per:
http://www.vmadmin.info/2010/07/vmware-and-netapp-deconstructing.html
Hope it proves useful to anyone else with similar issues.
On 6/22/10 10:19 AM, "Fletcher Cocquyt" fcocquyt@stanford.edu wrote:
Hi, We have a 3040 cluster hosting 11 vSphere hosts with 200 VMs on NFS datastores. We see latency spikes 3-4 times a month as reported by Operations Manager.
We hoped our upgrade last week from 7.3.1.1 to 7.3.3 would help, but since going to 7.3.3 we've had many spikes of up to 1 second that take out an NFS mount and several of the VMs.
We previously identified the high- and medium-I/O VMs and either aligned them or migrated them to local disk. This has NOT helped; we are still getting the spikes.
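For anyone unfamiliar with the alignment check mentioned above, here is a minimal sketch of the arithmetic. It assumes 512-byte sectors and a 4 KB storage block size; the function name and the sample start sectors are illustrative only, not taken from our environment or from any NetApp tool.

```python
# Minimal sketch of a partition alignment check.
# Assumptions (not from the original post): 512-byte sectors, 4 KB blocks.
SECTOR_BYTES = 512
BLOCK_BYTES = 4096


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES, block_bytes=BLOCK_BYTES):
    """Return True if the partition's starting byte offset falls on a block boundary."""
    return (start_sector * sector_bytes) % block_bytes == 0


if __name__ == "__main__":
    # Hypothetical start sectors: 63 is the classic legacy MBR default.
    for start in (63, 64, 2048):
        state = "aligned" if is_aligned(start) else "MISALIGNED"
        print(f"start sector {start}: offset {start * SECTOR_BYTES} bytes, {state}")
```

A guest partition that starts at sector 63 begins 32256 bytes into the disk, which is not a multiple of 4096, so each guest block straddles two storage blocks and costs extra I/O.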
I have opened another case with NetApp.
Following the notes in this latency-spike thread, http://communities.netapp.com/message/30657, I ran wafl_susp -w to check pw.over_limit.
It turns out ours is ZERO (is it relevant to NFS?).
I suspect an internal NetApp process is responsible for these spikes (dedup?). We had dedup disabled on 7.3.1.1, and 7.3.3 was supposed to fix this, so we re-enabled it after the upgrade.
Now the latency spike outages are back.
I will share any info from the case.
Thanks for any tips,