I'm troubleshooting an issue we're having with three ESX 3.01 hosts connected to a FAS3050c via Fibre Channel.
After a takeover of the filer node that has active VMs/LUNs running on it, the following appears in the /var/log/vmkernel file of the ESX host that owns the active VMs:
Device vmhba2:0:2 has disappeared but is currently in use and could not be removed.
Device vmhba2:0:3 has disappeared but is currently in use and could not be removed.
On the other two ESX hosts, this appears in the /var/log/vmkernel file:
Device vmhba2:0:2 has disappeared and has been removed.
Device vmhba2:0:3 has disappeared and has been removed.
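For anyone who wants to compare the hosts quickly, the two message forms above can be sorted with a short script. This is only a rough sketch: the regular expression is built from the two messages quoted above and assumes that exact wording, and the `classify` function name is made up for illustration.

```python
import re

# Pattern assumed from the two message forms quoted above. Real vmkernel
# lines carry a timestamp and prefix, which the unanchored search skips.
PATTERN = re.compile(
    r"Device (vmhba\d+:\d+:\d+) has disappeared "
    r"(but is currently in use and could not be removed|and has been removed)"
)

def classify(lines):
    """Group device names by how the host reported their disappearance.

    Returns a dict with two sets:
      "stuck"   - device disappeared but was in use and could not be removed
      "removed" - device disappeared and was removed cleanly
    """
    result = {"stuck": set(), "removed": set()}
    for line in lines:
        # findall handles several messages appearing on one line.
        for device, outcome in PATTERN.findall(line):
            key = "stuck" if outcome.startswith("but") else "removed"
            result[key].add(device)
    return result
```

Passing the lines of /var/log/vmkernel from each host to `classify` should show which host still holds the devices ("stuck") and which hosts dropped them ("removed").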
Now, the VMs survive the takeover and are still up and running, but if I attempt a giveback, all three hosts lose access to these LUNs and the VMs go down.
It appears to me that the ESX host that owns the active VMs holds a SCSI reservation on these LUNs, and a takeover is not enough to cause it to reset and release that reservation (even though the paths fail over properly).
This type of behavior is mentioned in http://www.vmware.com/community/thread.jspa?messageID=752442.
All of the timeout values and related settings have been verified.
Thanks in advance for the help,
--Carl