VM Virtual Disk Latency Versus NFS Latency on Filer

Philbert Rupkins

21 May 2013

10:19 p.m.

Hello,

This may be more of a VMware question. From within Data Fabric Manager (DFM), I am seeing a spike in latency on a NetApp volume used as an NFS datastore. DFM shows NFS latency on the volume spiking to 50 ms.

However, a virtual machine's performance counters in vSphere show Virtual Disk latency spiking to upwards of 700 ms for the same "latency event".

Any idea what might account for this difference in latency reporting between the NetApp (DFM) and ESXi? I expected there to be some difference for obvious reasons, but I didn't expect to see a difference of this magnitude.

Thanks, Phil

Klise, Steve

21 May 2013

10:26 p.m.

* You may want to check the alignment of your VMs. If you have Virtual Storage Console installed, it's under monitor/host config, and tools. Check it out.

* Could be a reallocate issue if you have added a bunch of disks, but that is doubtful.

DFM is used mostly for reporting, among other things; I use it primarily for reporting, Protection Manager, etc. The NetApp Management Console is your friend. That is the live view, or a more "accurate" view of what is currently going on, with the ability to go back in time. DFM is just going to show an average, and is not (IMHO) a good place to look for performance numbers. Perfstat is the real ticket; that will show you console activity.
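To illustrate the point about averages, here is a minimal Python sketch of how a coarse roll-up can flatten a short spike. The sample values, the one-second sampling rate, and the 60-second window are invented for illustration; they are not taken from DFM or from the environment discussed in this thread.

```python
def rollup(samples_ms, window):
    """Average consecutive latency samples in fixed-size windows."""
    return [
        sum(samples_ms[i:i + window]) / len(samples_ms[i:i + window])
        for i in range(0, len(samples_ms), window)
    ]

# Hypothetical per-second latency samples (ms): 57 s at 5 ms, then a 3 s spike at 700 ms.
samples = [5] * 57 + [700] * 3

peak_ms = max(samples)             # what a fine-grained view would show
averaged_ms = rollup(samples, 60)  # what a one-minute average would show

print(peak_ms)      # 700
print(averaged_ms)  # [39.75]
```

The same data reads as 700 ms or as roughly 40 ms depending only on the reporting interval, so the sampling granularity of each tool is worth checking before comparing their numbers.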

Philbert Rupkins

10:49 p.m.

Thanks for the response. I am actually using the NetApp Management Console. I should not have said DFM.

So to rephrase my original statement, the NetApp Management Console is reporting VERY different numbers than the vSphere console's counters for the virtual machine's disks. I'm just trying to account for the difference between what the NetApp Management Console reports and what the vSphere disk counters report for the virtual machine.

-Phil

Ray Van Dolson

11:09 p.m.

I've often wondered what exactly is being measured with the protocol latency metrics in either OnCommand Core or System Manager. Presumably it's measuring the time to service a protocol-sourced I/O request, from the time it's received to the time a response is transmitted and some sort of ACK is received on the network layer? (Maybe not the latter piece.)

On the VMware side maybe it's slightly different: for example, an I/O request might have to be broken up into multiple NFS packets due to alignment issues.

It would be interesting if someone knows the true background on what these metrics mean.
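As a rough illustration of the alignment idea, the Python sketch below counts how many storage blocks a single guest I/O touches. It assumes a 4 KiB block size on the storage side and uses a partition starting at sector 63 as the misaligned case; both values are illustrative assumptions rather than measurements from this environment, and `blocks_touched` is a hypothetical helper, not part of any NetApp or VMware tool.

```python
BLOCK_SIZE = 4096  # assumed storage block size in bytes

def blocks_touched(offset, length, block=BLOCK_SIZE):
    """Return how many storage blocks the byte range [offset, offset + length) spans."""
    first = offset // block
    last = (offset + length - 1) // block
    return last - first + 1

# A 4 KiB guest write starting on a block boundary (1 MiB offset).
aligned = blocks_touched(offset=1024 * 1024, length=4096)

# The same write inside a partition that starts at sector 63 (63 * 512 bytes).
misaligned = blocks_touched(offset=63 * 512, length=4096)

print(aligned)     # 1
print(misaligned)  # 2
```

Under these assumptions a misaligned guest I/O touches twice as many blocks as an aligned one, which is one way the work done per guest request could differ from what the guest believes it issued.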

Ray Van Dolson

11:16 p.m.

Sorry for replying to my own post. Want to amend my theory:

OnCommand Core's protocol latency: my thinking is that this measures the time from when an NFS (or CIFS) packet is received, through the protocol layers, the servicing of the request by the I/O subsystem, and the trip back through the protocol layer, up to the point where the response is passed off to the network layer for transmission. I'm guessing it doesn't take into account any sort of acknowledgement or response from the "client".

Ray

Philbert Rupkins

22 May 2013

1:45 p.m.

I think the same, Ray.

I also found an interesting VMware article that distinguishes between the following items in ESXTOP:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd...

* DAVG/cmd - average response time in ms per command being sent to the device

* KAVG/cmd - amount of time the command spends in the VMkernel

* GAVG/cmd - response time as perceived by the guest OS (DAVG + KAVG = GAVG)

Armed with this knowledge, I started looking at the ESX host's account of datastore latency. The ESX host's view of datastore latency aligns more closely with what I see in the NetApp Management Console. However, I still see much higher latency when looking at a guest machine's Virtual Disk counters. If I am understanding this correctly, that means the command may be spending a lot of time in the VMkernel (high KAVG).
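The relationship quoted from the article (DAVG + KAVG = GAVG) can be rearranged to estimate the kernel-side share of latency. The Python sketch below does only that arithmetic. It plugs in the two figures from this thread (about 700 ms seen by the guest, about 50 ms reported on the storage side) purely as stand-ins for GAVG and DAVG; they are not actual ESXTOP readings, and `kernel_latency_ms` is a hypothetical helper name.

```python
def kernel_latency_ms(gavg_ms, davg_ms):
    """Return the KAVG implied by GAVG = DAVG + KAVG."""
    return gavg_ms - davg_ms

# Stand-in values from the thread, not real ESXTOP output.
gavg_ms = 700.0  # latency as perceived by the guest
davg_ms = 50.0   # latency attributed to the device

kavg_ms = kernel_latency_ms(gavg_ms, davg_ms)
kernel_share = kavg_ms / gavg_ms

print(kavg_ms)                 # 650.0
print(round(kernel_share, 2))  # 0.93
```

If the real counters looked anything like this, nearly all of the guest-visible latency would be accumulating in the VMkernel rather than on the filer, which is what the KAVG counters should confirm or rule out.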

It's a pretty interesting VMware article, so it may be worth checking out. Now to examine our KAVG counters in ESXTOP.
