I thought I sent this the other day, but here is the VMware KB article on the cluster failover issue with ESX 3.0.2: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002974
Also, in my experience VMware Tools doesn't update the disk timeout value. On Win2K3, the registry key you need to change is:
HKLM\SYSTEM\CurrentControlSet\Services\Disk, where TimeOutValue should be set to 190 (decimal, in seconds). The value of 190 comes from the SAN documentation, but I believe the same value applies to NFS as well.
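If you have a lot of Windows guests to touch, a .reg file is easier to push around than editing each one by hand. Here is a minimal sketch that only builds the file text; it does not touch any registry. The helper name build_disk_timeout_reg is made up for illustration, and it assumes TimeOutValue is a REG_DWORD holding seconds.

```python
# Hypothetical helper: builds the text of a .reg file that sets the guest
# disk timeout discussed above. Nothing here modifies a registry.

DISK_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk"


def build_disk_timeout_reg(timeout_seconds: int = 190) -> str:
    """Return .reg text setting TimeOutValue (assumed REG_DWORD, in seconds)."""
    if not 0 < timeout_seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must be a positive 32-bit DWORD value")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{DISK_KEY}]",
        # .reg files write DWORDs as 8 hex digits, so 190 becomes 000000be.
        f'"TimeOutValue"=dword:{timeout_seconds:08x}',
        "",
    ]
    # Windows tools expect CRLF line endings in .reg files.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_disk_timeout_reg(190))
```

Save the output as a .reg file and import it on each guest. I believe the guest needs a reboot before the new timeout takes effect, so test on one VM first.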
--Carl
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Page, Jeremy
Sent: Thursday, March 20, 2008 11:50 AM
To: toasters@mathworks.com
Subject: RE: vmware on nfs stability issues
I'm pretty sure the newer version of VMware Tools does this, at least for Windows boxes.
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of De Wit Tom (Consultant)
Sent: Wednesday, March 19, 2008 3:04 AM
To: Pascal Dukers; toasters@mathworks.com
Subject: RE: vmware on nfs stability issues
Hi,
Even when you are on VMware, you still need to set the disk timeout parameter on all your virtual machines. The timeout has to be long enough for the guests to survive a cluster takeover.
You can find the exact parameter in the Host Utilities for ESX admin guide.
Grtz, Tom
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Pascal Dukers
Sent: Tuesday, March 18, 2008 6:28 PM
To: toasters@mathworks.com
Subject: vmware on nfs stability issues
We have stability issues with VMware ESX (3.0.2) on NFS whenever there is a cluster failover. The datastores come back online once the failover is complete (30-40 seconds), but some of the virtual servers (Solaris/Windows) crash while the failover is in progress.
There are probably some timeout settings I need to configure on all ESX hosts, but the NetApp best practice guide for VMware on NFS does not mention any changes other than the locking parameter. I hope someone who has encountered the same issue can share what they changed to improve it.