The ARP cache issue wouldn't really explain why Exchange reacts better.
However, I suppose you could verify that theory by attempting a failover on a cluster that is not on the same subnet as the iSCSI client, or by decreasing the ARP timeout (an entry under [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters], IIRC).
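For anyone wanting to try the ARP timeout test, here is a rough Python sketch that only builds the `reg add` commands and does not touch the registry itself. The value names `ArpCacheLife` and `ArpCacheMinReferencedLife` are my assumption for Windows 2000/2003 and were not given above, so check them against Microsoft's TCP/IP registry documentation first. The 60-second figure is purely illustrative, and a reboot is probably needed before a change takes effect.

```python
# Sketch only: prints reg.exe commands for review; nothing is written to the registry.
# ASSUMPTION: ArpCacheLife / ArpCacheMinReferencedLife are the relevant value names
# on Windows 2000/2003. Verify against Microsoft documentation before use.

TCPIP_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def arp_timeout_commands(cache_life_s, min_referenced_life_s):
    """Return reg.exe command lines that set both ARP cache lifetimes (in seconds)."""
    values = {
        "ArpCacheLife": cache_life_s,
        "ArpCacheMinReferencedLife": min_referenced_life_s,
    }
    return [
        f'reg add "{TCPIP_PARAMS}" /v {name} /t REG_DWORD /d {seconds} /f'
        for name, seconds in values.items()
    ]


if __name__ == "__main__":
    # 60 seconds is an illustrative value, not a recommendation.
    for command in arp_timeout_commands(60, 60):
        print(command)
```

Run it on any machine, review the printed commands, and paste them into a command prompt on the test server if they look right.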
Darren
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Filip Sneppe
Sent: 31 August 2009 17:26
To: Raj Patel
Cc: toasters@mathworks.com
Subject: Re: SQL 2005 reacts badly to a cluster giveback ?
Hi,
I can add a "me too" to this post. I've had more or less the same experience at two customer sites (albeit on physical machines), where I ran into issues with the MSSQL servers and their iSCSI disks.
I can't say that I've experienced the same sort of problems with, e.g., Exchange setups. Generally, when things are set up correctly with respect to disk timeouts, everything works fine.
The SQL setups I had issues with have more recent versions of the MS iSCSI initiator (around 2.05/2.06, IIRC), and I've also thought about upgrading to a newer version. One thing I came across when investigating is that Windows can have a very long ARP cache timeout: during one test, the Windows SQL box did not learn the new MAC address from the network until long after the filer had booted. I think Windows 2000 and 2003 can cache an ARP entry for up to 10 minutes, so I really don't know how a disk timeout of 190 seconds can be sufficient, even in theory, for NetApp cluster failovers.
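To put numbers on that concern, here is a back-of-the-envelope Python sketch using only the two figures above (10 minutes and 190 seconds). It assumes the worst case, in which the stale ARP entry is never refreshed early (for example by a gratuitous ARP from the filer), so treat the result as an upper bound rather than a measurement.

```python
# Worst-case comparison of ARP cache lifetime vs. disk timeout.
# ASSUMPTION: the stale ARP entry is not updated early (e.g. by a gratuitous ARP),
# so the client may keep using the old MAC address for the full cache lifetime.

ARP_CACHE_LIFE_S = 10 * 60   # "up to 10 minutes", per the post (recalled, not verified)
DISK_TIMEOUT_S = 190         # disk timeout quoted in the post


def worst_case_gap_s(arp_cache_life_s, disk_timeout_s):
    """Seconds by which a stale ARP entry could outlive the disk timeout (0 if none)."""
    return max(0, arp_cache_life_s - disk_timeout_s)


if __name__ == "__main__":
    gap = worst_case_gap_s(ARP_CACHE_LIFE_S, DISK_TIMEOUT_S)
    print(f"Stale ARP entry could outlast the disk timeout by up to {gap} s")
```

If the gap is greater than zero, I/O could in principle time out before the client learns the new MAC address, which is the scenario I'm worried about.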
So I would like to know if anyone has experienced the same sort of thing, in particular with MS SQL servers and iSCSI.
Regards, Filip
On Mon, Aug 31, 2009 at 2:05 AM, Raj Patel <phigmov@gmail.com> wrote:
Hi.
We've had a couple of cluster-failover events on our FAS270c (watchdog errors every time) on 7.2.5.1.
The failover is fine (AFAIK) when one of the nodes reboots. However, in the giveback it appears that the SQL server has a couple of initiator error events logged, and although the drives are visible (and working in terms of I/O) and the SQL services are still running, any SQL-dependent applications just don't work after the giveback. As soon as I stop/start the SQL services (or reboot the box), it's all back to normal.
Server is Windows 2003 SP2; it's a VM on ESX 3.5. The iSCSI interface goes through a dedicated iSCSI NIC (a virtual switch which also carries the ESX iSCSI LUNs). SnapDrive is 6.01, the iSCSI initiator is 2.03, and it's a 32-bit VM.
Oddly, Exchange didn't miss a beat (they're physical Windows 2008 64-bit servers), but SQL was definitely unhappy (even though the SQL service itself carried on, i.e. it didn't stop).
Any ideas? I note there's a newer iSCSI initiator available (2.08) from Microsoft. I'm pretty sure we haven't had this giveback issue with our old SnapDrive 4.2.1 setup on the same server.
Thanks in advance, Raj.