Hi,
I can add a "me too" to this post. I've had more or less the same experience at two customer sites (albeit on physical machines), where I ran into issues with the MSSQL servers and their iSCSI disks.
I can't say I've experienced the same sort of problems with, e.g., Exchange setups. Generally, when things are set up correctly with respect to disk timeouts, everything works fine.
The SQL setups I had issues with run more recent versions of the MS iSCSI initiator (around 2.05/2.06 iirc), and I've also considered upgrading to a newer version. One thing I came across while investigating is that Windows can have a very long ARP cache timeout: during one test, the Windows SQL box did not learn the filer's new MAC address until long after the filer had booted. I think Windows 2000 and 2003 can cache an ARP entry for up to 10 minutes, so I really don't see how a disk timeout of 190 seconds can, even in theory, be sufficient for NetApp cluster failovers.
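To make the timing argument concrete, here is a rough Python sketch. It assumes, as a simplification rather than a description of how Windows or Data ONTAP actually behave, that I/O to the LUN is stalled for the failover itself plus however long the host keeps using the stale MAC address afterwards. The 600 s figure is the 10-minute ARP lifetime mentioned above and 190 s is the disk timeout; the 60 s failover duration and the function name `io_outlasts_disk_timeout` are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope check: can a stale ARP entry outlast the disk timeout?
# All numbers are illustrative assumptions, not measured values.

ARP_REFERENCED_ENTRY_MAX_S = 600  # ~10 minutes: ARP entry lifetime suggested above for Windows 2000/2003
DISK_TIMEOUT_S = 190              # disk timeout discussed in this thread
FAILOVER_S = 60                   # hypothetical time for the filer to come back


def io_outlasts_disk_timeout(failover_s: float, arp_stale_s: float, disk_timeout_s: float) -> bool:
    """Return True if the modelled I/O stall is longer than the disk timeout.

    Hypothetical model: I/O is blocked while the filer fails over, then for
    as long as the host keeps sending to the old (stale) MAC address.
    """
    worst_case_stall_s = failover_s + arp_stale_s
    return worst_case_stall_s > disk_timeout_s


# Scenarios: the stale ARP entry is refreshed immediately, after 2 minutes,
# or survives for the full 10 minutes.
for arp_stale_s in (0, 120, ARP_REFERENCED_ENTRY_MAX_S):
    stall_s = FAILOVER_S + arp_stale_s
    too_long = io_outlasts_disk_timeout(FAILOVER_S, arp_stale_s, DISK_TIMEOUT_S)
    verdict = "exceeds" if too_long else "fits within"
    print(f"failover {FAILOVER_S}s + stale ARP {arp_stale_s}s = {stall_s}s, {verdict} the {DISK_TIMEOUT_S}s timeout")
```

With these assumed numbers, only the worst case (an ARP entry that stays in use for the full 10 minutes) pushes the stall past the 190 s timeout, which is exactly the gap described above.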
So I'd like to know whether anyone else has seen the same sort of thing, particularly with MS SQL servers and iSCSI.
Regards, Filip
On Mon, Aug 31, 2009 at 2:05 AM, Raj Patel <phigmov@gmail.com> wrote:
Hi.
We've had a couple of cluster-failover events on our FAS270c (watchdog errors every time) running Data ONTAP 7.2.5.1.
The failover itself is fine (AFAIK) when one of the nodes reboots. During the giveback, however, the SQL server logs a couple of initiator error events, and although the drives are still visible (and I/O to them works) and the SQL services are still running, any SQL-dependent applications just don't work after the giveback. As soon as I stop and start the SQL services (or reboot the box), it's all back to normal.
The server is a 32-bit Windows 2003 SP2 VM on ESX 3.5. Its iSCSI traffic goes through a dedicated iSCSI NIC (on a virtual switch that also carries the ESX iSCSI LUNs). SnapDrive is 6.01 and the iSCSI initiator is 2.03.
Oddly, Exchange didn't miss a beat (those are physical 64-bit Windows 2008 servers), but SQL was definitely unhappy, even though the SQL service itself carried on (i.e. it didn't stop).
Any ideas? I note there's a newer iSCSI initiator (2.08) available from Microsoft. I'm pretty sure we didn't have this giveback issue with our old SnapDrive 4.2.1 setup on the same server.
Thanks in advance, Raj.