I'm currently dealing with a problem on several of our ESX IBM LS21 blades when trying to attach to iSCSI LUNs on our NetApp FAS 3020s. Our NetApp currently connects to two separate physical networks to deliver iSCSI connectivity. The ESX support folks tell us that the NetApp presents both iSCSI interfaces to the server. Initially the ESX box connects on the correct interface, but after a few hours it tries the other IP address, fails, and disconnects the entire VM host from the NetApp, even though the network never went down. We have several Windows 2003 servers with the iSCSI initiator, on identical hardware and chassis, that don't have this problem.
I assume that either ESX's iSCSI initiator is badly designed, or MS has broken some industry-standard spec. Rearchitecting our storage network would take significant investment on our part, and we'd rather fix this either by pushing VMware to fix the initiator or by finding a way to have the NetApp send only one IP address back to the initiator. Is there any way to resolve this from the NetApp perspective?
Thanks in advance.
-Justin
Unfortunately, I have to serve iSCSI on both interfaces (1 port to one network, 3 ports aggregated to the other). The problem only occurs with ESX, because ESX is trying to connect to both networks even though it's only physically attached to one.
From: Vaughn Stewart [mailto:mvstew@gmail.com] Sent: Wednesday, March 14, 2007 2:29 PM To: Justin Brodley Cc: toasters@mathworks.com Subject: Re: ISCSI Issue and VMWare ESX 3.0.1
By default, NetApp enables iSCSI on all Ethernet interfaces. You should disable iSCSI on the interfaces you do not want to use for iSCSI connections.
Vaughn
The filer comes with four ports.
We use 1 for the management/legacy iSCSI network; the other 3 are aggregated into our "iSCSI network".
We have plans to finally retire the legacy iSCSI network, but that is still several months away, which is why we can't resolve this problem by just disabling iSCSI on that port.
The ESX server is attempting to use both networks regardless of whether a failover has occurred. I'm not entirely sure why, but this is what the vendor has told us.
Justin
From: Vaughn Stewart [mailto:mvstew@gmail.com] Sent: Wednesday, March 14, 2007 2:38 PM To: Justin Brodley Cc: toasters@mathworks.com Subject: Re: ISCSI Issue and VMWare ESX 3.0.1
For clarification: the filer has three interfaces? One standalone and the other three trunked (VIF'd) for redundancy/aggregated throughput?
Justin, can you clarify the Layer 3 routing and network topology? If the second network, the one that is not physically presented to the ESX server, is not routable, then it seems logical that this is caused by the iSCSI initiator on the ESX server discovering the other network when it enumerates the NetApp filer. To confirm this, SSH into the ESX server and run the following:
/usr/sbin/vmkiscsi-ls
Post the output minus any "sensitive" data.
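The reasoning above can be sketched as a quick offline check: given the portal addresses the filer advertises during discovery and the subnets the ESX host is directly attached to, list the portals the host has no local path to. This is only an illustration in Python. The function name and every address below are made up, and it assumes the two iSCSI networks are not routed to each other.

```python
import ipaddress


def unreachable_portals(advertised_portals, attached_networks):
    """Return advertised portal IPs that are on none of the host's attached subnets."""
    networks = [ipaddress.ip_network(n) for n in attached_networks]
    return [
        portal
        for portal in advertised_portals
        if not any(ipaddress.ip_address(portal) in net for net in networks)
    ]


# Hypothetical values: the filer advertises one portal per iSCSI network,
# but the ESX host is attached only to the first network.
portals = ["192.168.10.5", "10.20.30.5"]
host_networks = ["192.168.10.0/24"]

print(unreachable_portals(portals, host_networks))  # ['10.20.30.5']
```

Any address this reports is one the initiator may still try, because the target advertised it, even though a login there cannot succeed from this host.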
Are these other network interfaces available for failover, or are they for other networks altogether? Is there a way to get the ESX system to log when it chooses to go over another network path?
-Blake
Why would the ESX server attempt to connect to the other interface unless it's a failover attempt? Is that what it's doing, trying to keep things going by going through this secondary network?
I'll freely admit Windows isn't my forte.
-Blake
ESX isn't Windows. Part of it is Red Hat (the service console), but the VMkernel is almost as proprietary as ONTAP. The other 3 versions of VMware (Player, WS, and Server) all have versions that run on Windows, but ESX is really its own OS (with a lot of help from Linux).
ESX iSCSI does the authentication through the service console's IP/network but the actual I/O through the VMkernel. So, on the filer, you'll see two logins, one right after the other, from both IPs.
By default, VMkernel networking is not configured, and you need to set it up for iSCSI, NFS and VMotion. If you create a back end network for iSCSI, it will prompt you if the service console is not set up on that network. However, I've seen people change the networking (remove the service console from the back end after it was all up and running).
Here's what bit me: When you remove the SC from the back-end networking, existing iSCSI connections continue to work, until you reboot or otherwise disconnect. I installed a bunch of patches on a pair of ESX servers configured by somebody else. After I rebooted, iSCSI and all the VMs broke, and I spent several hours trying to figure out what I did to break it. When I disabled and re-enabled iSCSI, it complained about the svc console not being on the back end, and I spent the next 5 minutes smacking myself in the head for not thinking of it sooner.
So, one possibility here is that if they initially set it up with the SC on the back end, then removed it, any reconnect it does will be through the network the SC can see. That's a slightly educated guess.
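One way to picture the requirement described above is a small check that both the service console and the VMkernel interface have an address on the iSCSI back-end network. The sketch below is Python and purely illustrative: the interface labels, the addresses, and the helper name are hypothetical, and it does not query a real ESX host.

```python
import ipaddress


def interfaces_missing_from(network, interfaces):
    """Return the names of interfaces whose address is not inside `network`."""
    net = ipaddress.ip_network(network)
    return [
        name
        for name, addr in interfaces.items()
        if ipaddress.ip_address(addr) not in net
    ]


# Hypothetical addressing: the VMkernel port is on the iSCSI back end,
# but the service console was moved off it after the initial setup.
iscsi_network = "192.168.10.0/24"
esx_interfaces = {
    "service_console": "172.16.1.20",
    "vmkernel": "192.168.10.20",
}

print(interfaces_missing_from(iscsi_network, esx_interfaces))  # ['service_console']
```

An empty list would mean both halves of the iSCSI path (authentication and I/O) sit on the back-end network.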
Enjoy!
Peter
-----Original Message----- From: Blake Golliher [mailto:thelastman@gmail.com] Sent: Wednesday, March 14, 2007 2:38 PM To: Justin Brodley Cc: toasters@mathworks.com Subject: Re: ISCSI Issue and VMWare ESX 3.0.1
Why would the ESX server attempt to connect to the other interface unless it's a failover attempt? Is that what it's doing, trying to keep things going by going through this secondary network?
I'll freely admit Windows isn't my forte.
-Blake
We have the VMkernel configured to use the same network as iSCSI and still have this problem. I confirmed this last night with our VMware folks.
Justin
-----Original Message----- From: Learmonth, Peter [mailto:Peter.Learmonth@netapp.com] Sent: Wednesday, March 14, 2007 4:01 PM To: Blake Golliher; Justin Brodley Cc: toasters@mathworks.com Subject: RE: ISCSI Issue and VMWare ESX 3.0.1
ESX isn't Windows. Part of it is Red Hat (the service console), but the VMkernel is almost as proprietary as ONTAP. The other 3 versions of VMware (Player, WS, and Server) all have versions that run on Windows, but ESX is really its own OS (with a lot of help from Linux).
ESX iSCSI does the authentication through the service console's IP/network but the actual I/O through the VMkernel. So, on the filer, you'll see two logins, one right after the other, from both IPs.
By default, VMkernel networking is not configured, and you need to set it up for iSCSI, NFS and VMotion. If you create a back end network for iSCSI, it will prompt you if the service console is not set up on that network. However, I've seen people change the networking (remove the service console from the back end after it was all up and running).
Here's what bit me: When you remove the SC from the back-end networking, existing iSCSI connections continue to work, until you reboot or otherwise disconnect. I installed a bunch of patches on a pair of ESX servers configured by somebody else. After I rebooted, iSCSI and all the VMs broke, and I spent several hours trying to figure out what I did to break it. When I disabled and re-enabled iSCSI, it complained about the svc console not being on the back end, and I spent the next 5 minutes smacking myself in the head for not thinking of it sooner.
So, one possibility here is that if they initially set it up with the SC on the back end, then removed it, any reconnect it does will be through the network the SC can see. That's a slightly educated guess.
Enjoy!
Peter
Hi Justin,
Do you have a network connection for the ESX service console on the back end network (where you want the traffic to go)?
Although it's not supposed to work at all if the service console doesn't have access to the back end network, I've seen it sorta work with weirdness similar to what you're seeing.
Second, and probably more useful, newer versions of ONTAP introduce the concept of portsets. I'm not sure exactly where this came in, but it's not in 7.0.5 and it is in 7.1.1. You create a portset, then assign it in the igroup settings, and that igroup will only see LUNs through the designated ports of the portset. You can bind to a portset when you create the igroup, or you can bind to it later with the "igroup bind" command. Portsets are created and managed with the "portset" command.
bandit> igroup bind
usage:
igroup bind <initiator_group> <portset>
  - binds the igroup to the portset
The initiator group must not be currently bound to any portset
If the initiator group is bound, use the 'igroup unbind' command to first unbind the initiator group before attempting to bind to another portset.
For more information, try 'man na_igroup'
bandit> portset
The following commands are available; for more information type "portset help <command>"
add destroy remove show create help
bandit> portset create
portset create: -f or -i must be specified
usage:
portset create { -f | -i } <portset> [ <filer:port1 filer:port2 ...> ]
portset create { -f | -i } <portset> [ <port1 port2 ...> ]
  - creates a new portset
A portset is a collection of ports. The type is specified with the -f (FCP) or the -i (iSCSI) options (Note only FCP is currently supported). Ports can optionally be supplied, and will be added to the group.
FCP ports are specified by the name of the filer and the port slot letter name separated by a ':' (example filer:4a).
This command also allows the ports to simply be specified by the port slot letter name. Ports specified in this style will add that port from both the local and partner filers at the same time.
A non-empty portset will not be created in a cluster setup if the interconnect between the two filers is down
For more information, try 'man na_portset'
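To make the binding rules in the help text concrete, here is a toy Python model of them. It is not ONTAP code and talks to no filer. The class, method, and igroup names are invented for illustration, the port name filer:4a is borrowed from the help text's own example, and the model assumes that an igroup with no portset bound sees its LUNs on every port.

```python
class PortsetModel:
    """Toy model of the igroup/portset binding rules in the help text above."""

    def __init__(self):
        self.portsets = {}  # portset name -> set of port names
        self.bindings = {}  # igroup name -> portset name

    def create_portset(self, name, ports=()):
        self.portsets[name] = set(ports)

    def bind(self, igroup, portset):
        if portset not in self.portsets:
            raise KeyError(f"no such portset: {portset}")
        if igroup in self.bindings:
            # Mirrors the help text: an igroup that is already bound
            # must be unbound before it can be bound to another portset.
            raise ValueError(f"{igroup} is already bound to {self.bindings[igroup]}")
        self.bindings[igroup] = portset

    def unbind(self, igroup):
        self.bindings.pop(igroup, None)

    def visible_ports(self, igroup, all_ports):
        """Return the ports through which the igroup sees its LUNs."""
        if igroup not in self.bindings:
            # Assumption: an unbound igroup is not restricted to any ports.
            return set(all_ports)
        return self.portsets[self.bindings[igroup]] & set(all_ports)


# Hypothetical names: one portset holding a single port, bound to one igroup.
model = PortsetModel()
model.create_portset("esx_ports", ["filer:4a"])
model.bind("esx_hosts", "esx_ports")

print(sorted(model.visible_ports("esx_hosts", ["filer:4a", "filer:4b"])))  # ['filer:4a']
```

Binding "esx_hosts" a second time without calling unbind first raises an error, which matches the restriction stated in the igroup bind usage message.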
Definitely check out the docs and try it with some non-production servers before trying it live!
Let us know how it goes!
Peter
________________________________
From: Justin Brodley [mailto:jbrodley@sumtotalsystems.com] Sent: Wednesday, March 14, 2007 2:03 PM To: toasters@mathworks.com Subject: ISCSI Issue and VMWare ESX 3.0.1
I'm currently dealing with a problem on several of our ESX IBM LS21 blades when trying to attach to iSCSI LUNs on our NetApp FAS 3020s. Our NetApp currently connects to two separate physical networks to deliver iSCSI connectivity. The ESX support folks tell us that the NetApp presents both iSCSI interfaces to the server. Initially the ESX box connects on the correct interface, but after a few hours it tries the other IP address, fails, and disconnects the entire VM host from the NetApp, even though the network never went down. We have several Windows 2003 servers with the iSCSI initiator, on identical hardware and chassis, that don't have this problem.
I assume that either ESX's iSCSI initiator is badly designed, or MS has broken some industry-standard spec. Rearchitecting our storage network would take significant investment on our part, and we'd rather fix this either by pushing VMware to fix the initiator or by finding a way to have the NetApp send only one IP address back to the initiator. Is there any way to resolve this from the NetApp perspective?
Thanks in advance.
-Justin