Hi.
We've had a couple of cluster-failover events on our FAS270c (watchdog errors every time) running Data ONTAP 7.2.5.1.
The failover itself is fine (AFAIK) when one of the nodes reboots. During the giveback, however, the SQL server logs a couple of initiator error events. The drives stay visible (and working in terms of I/O) and the SQL services keep running, but any SQL-dependent applications just don't work after the giveback. As soon as I stop/start the SQL services (or reboot the box), it's all back to normal.
Server is Windows 2003 SP2; it's a VM on ESX 3.5. The iSCSI interface goes through a dedicated iSCSI NIC (a virtual switch which also carries the ESX iSCSI LUNs). SnapDrive is 6.01, the iSCSI initiator is 2.03, and it's a 32-bit VM.
Oddly, Exchange didn't miss a beat (they're physical Windows 2008 64-bit servers), but SQL was definitely unhappy (even though the SQL service itself carried on, i.e. it didn't stop).
Any ideas? I note there's a newer iSCSI initiator available (2.08) from Microsoft. I'm pretty sure we haven't had this giveback issue with our old SnapDrive 4.2.1 setup on the same server.
Thanks in advance, Raj.
Hi,
I can add a "me too" message to this post. I've had more or less the same experience at two customer sites (albeit on physical machines), where I ran into issues with the MSSQL servers and their iSCSI disks.
I can't say that I've experienced the same sort of problems with, e.g., Exchange setups. Generally, when things are set up correctly with respect to disk timeouts, everything works fine.
The SQL setups I had issues with have more recent versions of the MS iSCSI initiator (around 2.05/2.06, IIRC), and I've also thought about upgrading to a more recent version. One thing I came across when investigating is that Windows can have a very large ARP cache timeout: during one test, the Windows SQL box did not learn the new MAC address from the network until long after the filer had booted. I think Windows 2000 and 2003 can cache an ARP entry for up to 10 minutes, so I really don't know how a disk timeout of 190 seconds is theoretically sufficient for NetApp cluster failovers.
So I would like to know if anyone has experienced the same sort of things, in particular with MS SQL servers and iSCSI.
Regards, Filip
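One way to check the stale-ARP theory described above is to capture `arp -a` on the SQL host before and after a giveback and compare the MAC address recorded for the filer's iSCSI IP. The following is only a minimal sketch: the IP and MAC addresses are hypothetical sample values, and the parser assumes the usual Windows `arp -a` column layout (Internet Address, Physical Address, Type).

```python
import re

# Matches one entry line of Windows `arp -a` output, for example:
#   "  192.168.10.20         00-a0-98-11-22-33     dynamic"
_ENTRY = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})\s+(\w+)\s*$"
)


def parse_arp_table(arp_output):
    """Return {ip: mac} for every entry line found in `arp -a` output."""
    table = {}
    for line in arp_output.splitlines():
        match = _ENTRY.match(line)
        if match:
            table[match.group(1)] = match.group(2).lower()
    return table


def mac_changed(before, after, ip):
    """True if `ip` is present in both captures with different MACs."""
    old = parse_arp_table(before).get(ip)
    new = parse_arp_table(after).get(ip)
    return old is not None and new is not None and old != new


# Hypothetical captures taken on the SQL host around a giveback.
BEFORE = """
Interface: 192.168.10.5 --- 0x2
  Internet Address      Physical Address      Type
  192.168.10.20         00-a0-98-11-22-33     dynamic
"""
AFTER = """
Interface: 192.168.10.5 --- 0x2
  Internet Address      Physical Address      Type
  192.168.10.20         00-a0-98-44-55-66     dynamic
"""

if __name__ == "__main__":
    print(mac_changed(BEFORE, AFTER, "192.168.10.20"))
```

If the MAC in the "after" capture only changes several minutes after the giveback completes, that would support the ARP explanation; if it changes promptly, the cause is probably elsewhere.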
The ARP cache issue wouldn't really explain why Exchange reacts better.
However, I suppose you could verify that theory by attempting a failover on a cluster that is not on the same subnet as the iSCSI client, or by decreasing the ARP timeout (an entry under [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters], IIRC).
Darren
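For anyone wanting to try the shorter ARP timeout suggested above, here is a rough sketch that renders a `.reg` fragment for the Tcpip\Parameters key. The thread does not name the values; `ArpCacheLife` and `ArpCacheMinReferencedLife` are assumed here from the Windows 2000/2003 TCP/IP parameter documentation, and the 60-second figure is an arbitrary test value, not a recommendation. Verify both against Microsoft's documentation before importing anything.

```python
TCPIP_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
)


def render_reg(key, dwords):
    """Render a .reg fragment that sets REG_DWORD values under `key`."""
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key + "]"]
    for name, value in dwords.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD: {value}")
        # .reg files store DWORDs as eight lowercase hex digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


# Assumed value names (not given in the thread); 60 s is an arbitrary test value.
ARP_SETTINGS = {"ArpCacheLife": 60, "ArpCacheMinReferencedLife": 60}

if __name__ == "__main__":
    print(render_reg(TCPIP_KEY, ARP_SETTINGS))
```

Generating the fragment rather than writing the registry directly keeps the change reviewable and easy to revert on a test host.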
We are about to implement a new SQL 2005 (x64) on NetApp, so I will follow this thread pretty closely. We have built a few virtual-virtual active-active clusters and virtual-physical active-active clusters for in-house developed software with no issues.
I assume that the iSCSI NIC is on the same segment as the storage? I assume you are not using jumbo frames? I assume your NetApp cluster is configured for single image mode?
Hi,
Yes, all hosts are on the same subnet, no jumbo frames are involved, and for iSCSI, single_image mode isn't really relevant...
Best regards, Filip
Did you set the preferred filer IP address in the SnapDrive config? Did you set up your iSCSI target using hostnames or IPs? Could there be name resolution issues?
Hi,
Yes, the preferred IP address was set (using hostname/IP address pairs). I have no indication of name resolution issues, i.e. the LAN interfaces of the filers are static DNS entries, and the iSCSI IP addresses are set using the filer preferred IP addresses in SnapDrive.
Best regards, Filip
Hi,
The hostname/IP address pairs in PreferredIPAddresses in SnapDrive are NOT for the iSCSI traffic. They are meant for the management (RPC) traffic; they need to be reverse-resolvable and must be normal network ports on the filer that are reachable from the guest OS network.
iSCSI traffic is only determined by the setting in the initiator.
For the failover: make sure you use Host Utilities for ESX 5.0R2 or 5.1; normally config_hba should be run during the install. A reboot is then required!
-- Olaf Leimann
On top of what has already been reported, there's one more thing: the 270 (and 20X0) are "slow" filers when they perform a takeover or a giveback, and iSCSI hosts (both physical and VMs) are faster. For this reason there are a couple of Windows registry keys to increase the timeouts of the iSCSI "hba".
This is well documented on NOW.
Regards, Giacomo
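Before raising any timeouts it helps to know what the host is currently using. The sketch below reads the Windows disk timeout and compares it with the 190-second figure quoted earlier in this thread. Note the assumptions: the thread does not name the registry keys, so the choice of `TimeOutValue` under Services\Disk is mine, and the iSCSI initiator has its own separate timeout parameters that this does not touch. `winreg` is in the Python standard library but only on Windows, so the read is guarded.

```python
import sys

DISK_KEY = r"SYSTEM\CurrentControlSet\Services\Disk"
REQUIRED_TIMEOUT_S = 190  # figure quoted earlier in this thread


def timeout_is_sufficient(current_s, required_s=REQUIRED_TIMEOUT_S):
    """True if the configured disk timeout is at least the required one.

    A missing value (None) counts as insufficient.
    """
    return current_s is not None and current_s >= required_s


def read_disk_timeout():
    """Return TimeOutValue in seconds, or None if it is not set (Windows only)."""
    import winreg  # standard library, available only on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TimeOutValue")
            return int(value)
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    if sys.platform == "win32":
        current = read_disk_timeout()
        print("TimeOutValue:", current)
        print("sufficient:", timeout_is_sufficient(current))
    else:
        print("Not on Windows; nothing to read.")
```

The comparison is kept separate from the registry read so the threshold can be changed to whatever NOW recommends for a given filer model.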
Thanks for all the advice - the key takeaways would appear to be -
* try the host utilities - they make registry tweaks to key timeout values
* failover / giveback can be a little slow on the low-end FAS models
As suggested I had a bit of a poke around NOW and came across some interesting stuff
* http://now.netapp.com/NOW/knowledge/docs/mpio/win/reldsm31/html/software/ins...
We don't use MPIO but I'll try the registry tweak.
I'm still unsure whether this was a problem or not under SD 4.2.1 - I don't think it was, but these reboots are so infrequent it's hard to recall.
We're replacing our 270C with a 2050HA, so hopefully failover speed will improve.
Do people install the Host Utilities as standard? The notes seem to relate more to FC than to iSCSI setups, so I've held off using them.
Cheers, Raj.
Increase those Windows registry keys... believe me :) You'll need them with the new 20X0 filer as well, which is not that much faster than the 270 in giveback ;-)
Host Utilities are not useful for iSCSI. They can do some good in an FC environment, but in my experience they are not something to install as standard; really, I've probably used them just a couple of times, when the customer asked for them! ;-) As for MPIO on iSCSI, paying for it is absolutely wasted money, because the one that comes with the MS initiator works very well... and it's free! FC is a different story: the MPIO that comes with the NetApp software is a really good choice, even if it can be expensive.
Bye