We have stability issues with vmware esx (3.0.2) on nfs whenever there is a cluster failover. The datastores come back online after the failover is complete (30-40 seconds), but some of the virtual servers (solaris/windows) can crash while the failover takes place.
There are probably some timeout settings I need to configure on all esx hosts, but the NetApp best practice guide on vmware/nfs does not mention changes other than the locking parameter. So I hope someone can share if they encountered the same issue and what they changed to improve it.
Do you have a case open with NetApp? If not, you may need to open one. I believe a setting may be your culprit here.
Cheers,
Vaughn Stewart | Virtualization Evangelist
From: Pascal Dukers pascal.dukers@asml.com Date: Tue, 18 Mar 2008 10:27:30 -0700 (PDT) To: toasters@mathworks.com Subject: vmware on nfs stability issues
Hi,
Even when you are on VMware, you still need to set the disk timeout parameter on all your virtual machines. The disk timeout value has to be long enough for the guest to survive a cluster takeover.
You can find the exact parameter in the Host utilities for ESX admin guide.
Grtz, Tom
I'm pretty sure the newer version of VMware Tools does this, at least for Windows boxes.
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of De Wit Tom (Consultant) Sent: Wednesday, March 19, 2008 3:04 AM To: Pascal Dukers; toasters@mathworks.com Subject: RE: vmware on nfs stability issues
I thought I sent this the other day, but check out this link for the cluster failover issue when running ESX 3.0.2: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002974
Also, in my experience VMware Tools doesn't update disk timeout values. In Win2K3, the registry key you need to change is:
HKLM\SYSTEM\CurrentControlSet\Services\Disk, where TimeOutValue should be set to 190 (seconds). The value of 190 is from the SAN documentation, but I believe it would be the same for NFS as well.
--Carl
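For anyone who has to apply the registry change above to many Windows guests, here is a minimal sketch that renders a .reg file for it. The key path, value name, and the 190 figure are the ones from this message; the helper name `render_disk_timeout_reg` is made up for illustration, and the sketch assumes TimeOutValue is stored as a REG_DWORD in seconds. It only generates text, it does not touch any registry.

```python
# Sketch: render a .reg file that sets the Windows guest disk timeout.
# Key path and value name come from the thread; the helper name is hypothetical.

DISK_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk"


def render_disk_timeout_reg(timeout_seconds: int = 190) -> str:
    """Return .reg file text that sets TimeOutValue (assumed REG_DWORD, seconds)."""
    if not 0 < timeout_seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must fit in a positive 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{DISK_KEY}]",
        # .reg files write DWORD values as 8 hex digits, e.g. 190 -> 000000be
        f'"TimeOutValue"=dword:{timeout_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(render_disk_timeout_reg(190))
```

The output can be saved to a file and imported on a test guest first; whether 190 is the right value for NFS should still be checked against the NetApp documentation.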
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Page, Jeremy Sent: Thursday, March 20, 2008 11:50 AM To: toasters@mathworks.com Subject: RE: vmware on nfs stability issues
Thank you all for your help. I will share a few answers I received that have not been posted here:
A new NetApp article from last week on how to set timeouts for the different guest OSes I have:
https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb37986
The following parameters on the ESX servers can also be tuned:
o NFS.HeartbeatFrequency
o NFS.HeartbeatTimeout
o NFS.HeartbeatMaxFailures
I have been told that with the default settings the timeout seems to be 30 seconds.
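To reason about how the three heartbeat parameters interact, a rough sketch follows. The formula (frequency times max failures, plus timeout) is an assumption: it is a commonly cited approximation and is not stated anywhere in this thread, so verify it against the VMware and NetApp documentation. The sample values are illustrative only, not recommendations; the first set was picked because it lands near the 30 seconds mentioned above, and the 40-second failover is the upper end of the 30-40 seconds reported in the original post.

```python
# Sketch: estimate how long an ESX host tolerates an unresponsive NFS datastore.
# ASSUMPTION: tolerance ~= HeartbeatFrequency * HeartbeatMaxFailures + HeartbeatTimeout.
# This approximation is not from the thread; confirm it before relying on it.


def nfs_outage_tolerance(frequency_s: int, timeout_s: int, max_failures: int) -> int:
    """Approximate seconds before the datastore is treated as unavailable."""
    if min(frequency_s, timeout_s, max_failures) <= 0:
        raise ValueError("all heartbeat settings must be positive")
    return frequency_s * max_failures + timeout_s


def survives_failover(frequency_s: int, timeout_s: int,
                      max_failures: int, failover_s: int) -> bool:
    """True if the estimated tolerance is longer than the failover window."""
    return nfs_outage_tolerance(frequency_s, timeout_s, max_failures) > failover_s


if __name__ == "__main__":
    failover_s = 40  # upper end of the 30-40 s failover reported in the thread
    # Illustrative (frequency, timeout, max failures) tuples, not recommendations.
    for settings in [(9, 5, 3), (12, 5, 10)]:
        tolerance = nfs_outage_tolerance(*settings)
        ok = survives_failover(*settings, failover_s)
        print(settings, "->", tolerance, "s, survives failover:", ok)
```

If the approximation holds, a tolerance of about 30 seconds is shorter than a 40-second failover, which would be consistent with the crashes described at the start of the thread.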
Hi toasters,
I don't know what experience you have had with the NetApp way of dealing with timeouts under Linux. We had the problem with SLES9 SP3 systems running on ESX with NetApp storage. Every time we had a cluster takeover on a pair of filers hosting storage with VMs on it (e.g. during ONTAP updates), the SLES9 systems ended up with read-only disks.
I did it the VMware way and installed a new mpt-scsi driver (mptscsi-gosd-3.02.62-2vmw.i386.rpm), which can be downloaded from VMware. After the installation you only have to re-establish the initrd link so that it points to the new one (the old one is still present, of course) and reboot the machine.
Works like a charm and seems smoother than doing the udev thing.
Best Regards
Jochen
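Whichever route is taken on Linux guests (new driver or udev), it can help to check what disk timeout the guest actually ends up with. Below is a minimal sketch that assumes the guest kernel exposes the per-device SCSI timeout at /sys/block/<dev>/device/timeout, which may not be true for every kernel version. The function names are hypothetical, and the threshold of 190 in the usage part is simply borrowed from the Windows figure earlier in the thread as a placeholder.

```python
# Sketch: list SCSI disk timeouts inside a Linux guest.
# ASSUMPTION: the kernel exposes /sys/block/<dev>/device/timeout (in seconds).
from pathlib import Path


def read_scsi_disk_timeouts(sys_block: str = "/sys/block") -> dict:
    """Map each block device that exposes device/timeout to its value in seconds."""
    root = Path(sys_block)
    timeouts = {}
    if not root.is_dir():
        return timeouts
    for dev in sorted(root.iterdir()):
        timeout_file = dev / "device" / "timeout"
        if timeout_file.is_file():
            timeouts[dev.name] = int(timeout_file.read_text().strip())
    return timeouts


def below_threshold(timeouts: dict, minimum_s: int) -> list:
    """Return the device names whose timeout is shorter than minimum_s."""
    return sorted(name for name, value in timeouts.items() if value < minimum_s)


if __name__ == "__main__":
    found = read_scsi_disk_timeouts()
    for name, value in found.items():
        print(f"{name}: {value} s")
    # 190 is a placeholder threshold taken from the Windows value in the thread.
    print("below threshold:", below_threshold(found, 190))
```

Devices without a device/timeout entry (loop devices, for example) are skipped, so an empty result means either no SCSI disks or a kernel that does not expose the attribute.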
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Pascal Dukers Sent: Saturday, March 22, 2008 11:52 AM To: toasters@mathworks.com Subject: RE: vmware on nfs stability issues