How do I get off this fun list?
-----Original Message----- From: "Carl Howell" chowell@uwf.edu
Date: Thu, 20 Sep 2007 10:51:56 To:"Wilkinson, Brent" BWilkinson@CoBank.com, "Glenn Dekhayser" gdekhayser@voyantinc.com Cc:toasters@mathworks.com Subject: RE: ESX FC Give & Take
Thanks for the feedback, Brent. Is anyone else seeing this as well? Does anyone know whether NetApp already has a BURT(?) open for this?
--Carl
-----Original Message----- From: Wilkinson, Brent [mailto:BWilkinson@CoBank.com] Sent: Thursday, September 20, 2007 10:48 AM To: Carl Howell; Glenn Dekhayser Cc: toasters@mathworks.com Subject: RE: ESX FC Give & Take
We are also experiencing this issue with a NetApp 920c filer configuration running Data ONTAP 7.2.3 in SSI mode. We currently have tickets open with VMware and NetApp. It is good to know that others are seeing this too; it means I can stop trying to figure out whether there is a configuration error on my end.
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Carl Howell Sent: Thursday, September 20, 2007 8:20 AM To: Glenn Dekhayser Cc: toasters@mathworks.com Subject: RE: ESX FC Give & Take
VMware's "SAN Configuration Guide" (p. 118) says you should set Disk.UseDeviceReset=1. Has this changed?
Also, this discussion (http://www.vmware.com/community/thread.jspa?messageID=749388) suggests that this might be a known issue that can't be fixed. Is that the case?
--Carl
-----Original Message----- From: Glenn Dekhayser [mailto:gdekhayser@voyantinc.com] Sent: Thursday, September 20, 2007 8:24 AM To: Carl Howell Cc: toasters@mathworks.com Subject: RE: ESX FC Give & Take
I have a client with the exact same problem. VMware's response:
" Please try the following changes:
On your NetApp system, enter the following command and wait a few minutes:
"fcp config 0c down"
Then re-enable the port manually by entering the "fcp config 0c up" command.
Alternatively, executing an "fcp stop" and "fcp start" in short succession (no waiting time required) also resolves the issue.
To ensure proper operation, you must reset the FC connection after entering the "cf giveback" command. This can be done on the VMware ESX host by entering the following commands:
esxcfg-module -s qlport_down_retry=60 <HBA-name>
esxcfg-advcfg -s 0 /Disk/UseLunReset
esxcfg-advcfg -s 0 /Disk/UseDeviceReset
These settings will result in the following type of message on the console after a giveback:
Thu Oct 12 09:26:24 CEST [lk-san1a: scsitarget.ispfct.targetReset:CRITICAL]: FCP Target: Target Reset (from port 210000e08b0e922a), aborting all SCSI commands
Once the connection is reset, it should work properly.
Last Updated: 13 OCT 2006 "
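For anyone trying VMware's suggestion, the steps above can be collected into a small dry-run script. This is a minimal sketch, not something VMware or NetApp supplied. Assumptions not taken from the thread: filer commands are sent over ssh, and the filer name, adapter (0c in the quote), wait time, and HBA driver module (left as <HBA-name> in the quote) are passed in by the caller. With DRY_RUN=1, the default, every command is printed instead of executed. The last helper pulls the initiator WWPN out of "Target Reset" messages like the one shown above, so it can be matched against the WWPNs of the ESX hosts' HBAs.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the workaround quoted above.
# ASSUMPTIONS (not from the thread): filer commands go over ssh, and the
# filer name, FC adapter, wait time and HBA driver module come from the
# caller. With DRY_RUN=1 (the default) commands are printed, not run.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Filer side, option 1: take the FC target adapter down, wait, bring it up.
bounce_fcp_port() {
  local filer="$1" adapter="$2" wait_secs="${3:-180}"
  run ssh "$filer" fcp config "$adapter" down
  sleep "$wait_secs"   # "wait a few minutes"
  run ssh "$filer" fcp config "$adapter" up
}

# Filer side, option 2: stop and restart the FCP service back to back.
restart_fcp_service() {
  local filer="$1"
  run ssh "$filer" fcp stop
  run ssh "$filer" fcp start
}

# ESX side: the three settings from VMware's response, then read back the
# two advanced options. Intended for the ESX service console.
apply_esx_settings() {
  local module="$1"
  if [ -z "$module" ]; then
    echo "usage: apply_esx_settings <HBA-driver-module>" >&2
    return 2
  fi
  run esxcfg-module -s qlport_down_retry=60 "$module"
  run esxcfg-advcfg -s 0 /Disk/UseLunReset
  run esxcfg-advcfg -s 0 /Disk/UseDeviceReset
  run esxcfg-advcfg -g /Disk/UseLunReset
  run esxcfg-advcfg -g /Disk/UseDeviceReset
}

# Filer log (stdin): print the initiator WWPN from each "Target Reset"
# message, with colons inserted for easier comparison with HBA WWPNs.
reset_initiators() {
  sed -n 's/.*Target Reset (from port \([0-9a-fA-F]\{16\}\)).*/\1/p' |
    sed 's/../&:/g; s/:$//'
}
```

For example, `bounce_fcp_port lk-san1a 0c` only prints the two `fcp config` commands; prefixing it with `DRY_RUN=0` would actually send them. Both filer-side options interrupt FC access through the affected ports, so they belong in a maintenance window. Note that `apply_esx_settings` sets Disk.UseDeviceReset to 0, the opposite of the Disk.UseDeviceReset=1 value from the SAN Configuration Guide, and that driver module options set with `esxcfg-module -s` typically take effect only after the driver is reloaded, usually by rebooting the host.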
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Carl Howell Sent: Wednesday, September 19, 2007 5:52 PM To: toasters@mathworks.com Subject: ESX FC Give & Take
I'm troubleshooting an issue we're having with three ESX 3.01 hosts connected to a FAS3050c via Fibre Channel.
After a takeover of the filer node that has active VMs/LUNs running on it, the following appears in the /var/log/vmkernel file of the ESX host that owns the active VMs:
Device vmhba2:0:2 has disappeared but is currently in use and could not be removed.
Device vmhba2:0:3 has disappeared but is currently in use and could not be removed.
On the other two ESX hosts, this appears in the /var/log/vmkernel file:
Device vmhba2:0:2 has disappeared and has been removed.
Device vmhba2:0:3 has disappeared and has been removed.
Now, the VMs survive the takeover and are still up and running, but if I attempt a giveback, all three hosts lose access to these LUNs and the VMs go down.
It appears to me that the ESX host that owns the active VMs holds a SCSI reservation on these LUNs, and a takeover is not enough to make it reset and release that reservation (even though the paths fail over properly).
This type of behavior is mentioned in http://www.vmware.com/community/thread.jspa?messageID=752442.
All of the timeout values have been verified, etc.
Thanks in advance for the help,
--Carl