Hello,
This may be more of a VMware question. From within Data Fabric Manager (DFM), I am seeing a spike in latency on a NetApp volume used as an NFS datastore. DFM shows a spike in NFS latency of 50 ms on the volume.
However, a virtual machine's performance counters in vSphere show a spike in Virtual Disk latency upwards of 700 ms for the same "latency event".
Any idea what might account for this difference in latency reporting between the NetApp (DFM) and ESXi? I expected some difference for obvious reasons, but I didn't expect to see a difference of this magnitude.
Thanks, Phil
* May want to check the alignment of your VMs. If you have Virtual Storage Console installed, it's under monitor/host config, and tools. Check it out.
* Could be a reallocate issue if you have added a bunch of disks, but that is doubtful.
DFM is used mostly for reporting, among other things; I use it primarily for reporting, Protection Manager, etc. The NetApp Management Console is your friend. That is the live view, or a more "accurate" view, of what is currently going on, with the ability to go back in time. DFM is just going to show an average, and is not (IMHO) a good place to look for performance numbers. Perfstat is the real ticket; that will show you console activity.
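To illustrate the point about averages, here is a minimal Python sketch of how a short latency spike can shrink once samples are averaged over a reporting window. The per-second samples, the 4-second spike, and the 60-second window are all made-up values for illustration; nothing here reflects how DFM actually samples or aggregates.

```python
from statistics import mean


def window_averages(samples_ms, window):
    """Average consecutive, non-overlapping windows of latency samples."""
    return [
        mean(samples_ms[i:i + window])
        for i in range(0, len(samples_ms), window)
    ]


# Hypothetical per-second latency samples (ms): a steady 5 ms
# with a 4-second burst at 700 ms in the middle of the minute.
samples = [5.0] * 60
samples[30:34] = [700.0] * 4

peak = max(samples)
averaged = window_averages(samples, window=60)

print(f"peak sample:       {peak:.0f} ms")
print(f"60-second average: {averaged[0]:.1f} ms")
```

With these invented numbers the peak is 700 ms while the one-minute average comes out at roughly 51 ms, so a tool that reports averages and a tool that reports peaks can disagree by an order of magnitude about the same event.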
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Philbert Rupkins Sent: Tuesday, May 21, 2013 2:19 PM To: toasters@teaparty.net Subject: VM Virtual Disk Latency Versus NFS Latency on Filer
Thanks for the response. I am actually using the NetApp Management Console. I should not have said DFM.
So to rephrase my original statement, the NetApp Management Console is reporting VERY different numbers than the vSphere console's counters for the virtual machine's disks. Just trying to account for the difference in accounting between the NetApp Management Console and the vSphere disk counters for the virtual machine.
-Phil
On Tue, May 21, 2013 at 4:26 PM, Klise, Steve klises@sutterhealth.org wrote:
* May want to check the alignment of your VMs. If you have Virtual Storage Console installed, it's under monitor/host config, and tools. Check it out.
* Could be a reallocate issue if you have added a bunch of disks, but that is doubtful.
DFM is used mostly for reporting, among other things; I use it primarily for reporting, Protection Manager, etc. The NetApp Management Console is your friend. That is the live view, or a more “accurate” view, of what is currently going on, with the ability to go back in time. DFM is just going to show an average, and is not (IMHO) a good place to look for performance numbers. Perfstat is the real ticket; that will show you console activity.
I've often wondered what exactly is being measured with the protocol latency metrics either in OnCommand Core or System Manager. Presumably it's measuring the time to service a protocol-sourced IO request from the time it's received to the time a response is transmitted and some sort of ACK is received on the network layer? (maybe not the latter piece).
On the VMware side maybe it's slightly different, for example if an I/O request has to be broken up into multiple NFS packets due to alignment issues.
Would be interesting if someone knows the true background on what these metrics mean.
On Tue, May 21, 2013 at 04:49:44PM -0500, Philbert Rupkins wrote:
Thanks for the response. I am actually using the NetApp Management Console. I should not have said DFM.
So to rephrase my original statement, the NetApp Management Console is reporting VERY different numbers than the vSphere console's counters for the virtual machine's disks. Just trying to account for the difference in accounting between the NetApp Management Console and the vSphere disk counters for the virtual machine.
-Phil
Sorry for replying to my own post. Want to amend my theory:
OnCommand Core's protocol latency -- my thinking is this is measuring the time from when an NFS (or CIFS) packet is received, passes through the protocol layers, is serviced by the I/O subsystem and passes through the protocol layer again and is passed off to the network layer for transmission. I'm guessing it doesn't take into account any sort of acknowledgement or response from the "client".
Ray
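The theory above can be sketched as a set of timestamps for a single request, with each layer measuring its own slice. This is only a model of the guess in this thread, not documented behavior of OnCommand Core or vSphere; the `RequestTimeline` class, its field names, and the example timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestTimeline:
    """Timestamps (ms) for one NFS request; every value is hypothetical."""
    guest_issue: float     # guest OS issues the I/O
    host_send: float       # ESXi host puts the NFS request on the wire
    filer_receive: float   # filer receives the request
    filer_reply: float     # filer hands the response to its network layer
    host_receive: float    # ESXi host receives the response
    guest_complete: float  # guest OS sees the I/O complete

    def filer_latency(self) -> float:
        # What the filer could measure under the theory above:
        # receive -> response handed off, with no client acknowledgement.
        return self.filer_reply - self.filer_receive

    def host_latency(self) -> float:
        # What the host sees: filer time plus the network round trip.
        return self.host_receive - self.host_send

    def guest_latency(self) -> float:
        # What the guest sees: host time plus any time spent on the host
        # before the request is sent and after the reply arrives.
        return self.guest_complete - self.guest_issue


# Invented example: the request waits 40 ms on the host before being sent.
example = RequestTimeline(
    guest_issue=0.0, host_send=40.0, filer_receive=41.0,
    filer_reply=91.0, host_receive=92.0, guest_complete=95.0,
)
print(example.filer_latency(), example.host_latency(), example.guest_latency())
```

Under this model each measurement nests inside the next one out, so the filer's number can never exceed the guest's, and any delay that happens before the request reaches the filer is invisible on the filer side.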
On Tue, May 21, 2013 at 03:09:16PM -0700, Ray Van Dolson wrote:
I've often wondered what exactly is being measured with the protocol latency metrics either in OnCommand Core or System Manager. Presumably it's measuring the time to service a protocol-sourced IO request from the time it's received to the time a response is transmitted and some sort of ACK is received on the network layer? (maybe not the latter piece).
On the VMware side maybe it's slightly different, for example if an I/O request has to be broken up into multiple NFS packets due to alignment issues.
Would be interesting if someone knows the true background on what these metrics mean.
I think the same, Ray.
I also found an interesting VMware article that distinguishes between the following items in ESXTOP:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd...
* DAVG/cmd - average response time in ms per command being sent to the device
* KAVG/cmd - amount of time the command spends in the VMkernel
* GAVG/cmd - response time as perceived by the guest OS (DAVG + KAVG = GAVG)
Armed with this knowledge, I started looking at the ESX host's account of datastore latency. The ESX host's account of datastore latency aligns more closely with what I see in the NetApp Management Console. However, I still see much higher guest latency when looking at a guest machine's Virtual Disk counters. If I am understanding this correctly, that means the command may be spending a lot of time in the VMkernel (high KAVG).
It's a pretty interesting VMware article, so it may be worth checking out. Now to examine our KAVG counters in ESXTOP.
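As a rough worked example of the relation DAVG + KAVG = GAVG, the Python sketch below backs out the implied kernel time from the two numbers in this thread. Treating the 700 ms Virtual Disk latency as GAVG and the 50 ms filer-side latency as DAVG is an assumption made purely for illustration, and the function names are hypothetical; the real values should be read from ESXTOP.

```python
def kernel_latency_ms(gavg_ms: float, davg_ms: float) -> float:
    """Return the KAVG implied by GAVG = DAVG + KAVG."""
    if gavg_ms < davg_ms:
        raise ValueError("GAVG cannot be smaller than DAVG")
    return gavg_ms - davg_ms


def dominant_component(gavg_ms: float, davg_ms: float) -> str:
    """Name the component that accounts for most of the guest latency."""
    kavg_ms = kernel_latency_ms(gavg_ms, davg_ms)
    return "KAVG" if kavg_ms > davg_ms else "DAVG"


# Assumed mapping of the thread's numbers onto the counters (illustrative only).
gavg = 700.0  # guest-observed Virtual Disk latency, ms
davg = 50.0   # device-side latency, ms

kavg = kernel_latency_ms(gavg, davg)
print(f"implied KAVG: {kavg:.0f} ms ({dominant_component(gavg, davg)} dominates)")
```

If that mapping holds, about 650 ms of the 700 ms would be spent in the VMkernel rather than on the filer, which is consistent with the suspicion of a high KAVG.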
On Tue, May 21, 2013 at 5:16 PM, Ray Van Dolson rvandolson@esri.com wrote:
Sorry for replying to my own post. Want to amend my theory:
OnCommand Core's protocol latency -- my thinking is this is measuring the time from when an NFS (or CIFS) packet is received, passes through the protocol layers, is serviced by the I/O subsystem and passes through the protocol layer again and is passed off to the network layer for transmission. I'm guessing it doesn't take into account any sort of acknowledgement or response from the "client".
Ray