VMware's "SAN Configuration Guide" (p. 118) says you should set Disk.UseDeviceReset=1. Has this changed?
Also from this discussion http://www.vmware.com/community/thread.jspa?messageID=749388 it appears that this might be a known issue that can't be fixed?
--Carl
-----Original Message-----
From: Glenn Dekhayser [mailto:gdekhayser@voyantinc.com]
Sent: Thursday, September 20, 2007 8:24 AM
To: Carl Howell
Cc: toasters@mathworks.com
Subject: RE: ESX FC Give & Take
I have a client with the exact same problem. VMware's response:
" Please try the following changes:
On your Netapp system, enter the following commands
"fcp config 0c down" and wait a few minutes.
Enable this port manually by entering the "fcp config 0c up" command.
Alternatively, executing an "fcp stop" and "fcp start" in short succession (no waiting time required) also resolves the issue.
To ensure proper operation, you must reset the FC connection after entering the "cf giveback" command. This can be done on the VMWare ESX host by entering the following commands:
esxcfg-module -s qlport_down_retry=60 <HBA-name>
esxcfg-advcfg -s 0 /Disk/UseLunReset
esxcfg-advcfg -s 0 /Disk/UseDeviceReset
These settings will result in the following type message on the console after a giveback:
Thu Oct 12 09:26:24 CEST [lk-san1a: scsitarget.ispfct.targetReset:CRITICAL]: FCP Target: Target Reset (from port 210000e08b0e922a), aborting all SCSI commands
Once the connection is reset, it should work properly.
Last Updated: 13 OCT 2006 "
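To confirm which ESX HBA port issued the reset, the initiator WWPN can be pulled out of that filer console message. The following is only a minimal sketch, not part of VMware's response: the function name `parse_target_reset` is hypothetical, and the pattern assumes the message has exactly the layout quoted above.

```python
import re

# Assumed layout, taken from the single console message quoted above:
# [<filer>: scsitarget.ispfct.targetReset:<SEVERITY>]: FCP Target: Target Reset (from port <16 hex digits>)
TARGET_RESET = re.compile(
    r"\[(?P<filer>[^:\]]+): scsitarget\.ispfct\.targetReset:(?P<severity>\w+)\]: "
    r"FCP Target: Target Reset \(from port (?P<wwpn>[0-9a-fA-F]{16})\)"
)

def parse_target_reset(line):
    """Return filer name, severity and colon-separated initiator WWPN, or None."""
    m = TARGET_RESET.search(line)
    if m is None:
        return None
    wwpn = m.group("wwpn").lower()
    # Format the WWPN as colon-separated byte pairs for easier comparison with HBA listings.
    pretty = ":".join(wwpn[i:i + 2] for i in range(0, 16, 2))
    return {"filer": m.group("filer"), "severity": m.group("severity"), "wwpn": pretty}
```

The returned WWPN can then be compared by hand against the port names of the HBAs in each ESX host.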
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Carl Howell
Sent: Wednesday, September 19, 2007 5:52 PM
To: toasters@mathworks.com
Subject: ESX FC Give & Take
I'm troubleshooting an issue we're having with three ESX 3.01 hosts connected to a FAS3050c via Fibre Channel.
After a takeover of the filer node that has active VMs/LUNs running on it, the following appears in the /var/log/vmkernel file of the ESX host that owns the active VMs:
Device vmhba2:0:2 has disappeared but is currently in use and could not be removed.
Device vmhba2:0:3 has disappeared but is currently in use and could not be removed.
On the other two ESX hosts, this appears in the /var/log/vmkernel file:
Device vmhba2:0:2 has disappeared and has been removed.
Device vmhba2:0:3 has disappeared and has been removed.
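For comparing hosts after a takeover, the two kinds of vmkernel message shown above can be sorted per device. This is a rough sketch only: the function name `classify` and the labels `in_use` / `removed` are made up for illustration, and the pattern assumes the message wording is exactly as quoted in this post.

```python
import re

# Matches the two "has disappeared" messages quoted in this post.
DISAPPEARED = re.compile(
    r"Device (?P<device>vmhba\d+:\d+:\d+) has disappeared "
    r"(?P<outcome>but is currently in use and could not be removed|and has been removed)"
)

def classify(log_text):
    """Map each device path to 'in_use' (could not be removed) or 'removed'."""
    result = {}
    for m in DISAPPEARED.finditer(log_text):
        if m.group("outcome") == "and has been removed":
            state = "removed"
        else:
            state = "in_use"
        # A later message for the same device overwrites an earlier one.
        result[m.group("device")] = state
    return result
```

Running it over the contents of /var/log/vmkernel from each host should show at a glance which host still holds the devices open.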
Now, the VMs survive the takeover and are still up and running, but if I attempt a giveback, all three hosts lose access to these LUNs and the VMs go down.
It appears to me that the ESX host that owns the active VMs has a SCSI reservation on these LUNs, and a takeover is not enough to cause it to reset and remove that reservation (even though the paths fail over properly).
This type of behavior is mentioned in http://www.vmware.com/community/thread.jspa?messageID=752442.
All of the timeout values have been verified, etc.
Thanks in advance for the help,
--Carl