ISCSI Issue and VMWare ESX 3.0.1

Justin Brodley

14 Mar 2007

9:03 p.m.

I'm currently dealing with a problem on several of our ESX hosts (IBM LS21 blades) when attaching to iSCSI LUNs on our NetApp FAS3020s. Our NetApp currently connects to two separate physical networks to deliver iSCSI connectivity. The ESX support folks tell us that the NetApp presents both iSCSI interfaces to the server. Initially the ESX box connects on the correct interface, but after a few hours it tries the other IP address, fails, and disconnects the entire VM host from the NetApp, even though the network never went down. We have several Windows 2003 servers using the iSCSI initiator on identical hardware and chassis that don't have this problem.

I assume that either ESX's iSCSI initiator is badly designed or MS has broken some industry-standard spec. Rearchitecting our storage network would take significant investment on our part, so we'd rather fix this either by pushing the ESX folks to fix the initiator or by finding a way to have the NetApp send only one IP address back to the initiator. Is there any way to resolve this from the NetApp side?

Thanks in advance.

-Justin

Vaughn Stewart

14 Mar 2007

9:29 p.m.

By default, NetApp enables iSCSI on all Ethernet interfaces. You should disable iSCSI on the interfaces that you do not want to serve iSCSI connections.

Vaughn

Justin Brodley

9:33 p.m.

Unfortunately, I have to connect to iSCSI on both interfaces (1 port to one network, 3 ports aggregated to the other). The problem only occurs with ESX, because ESX is trying to connect to both networks even though it's only physically attached to one.


Vaughn Stewart

9:38 p.m.

For clarification: the filer has three interfaces? One standalone and the other three trunked (VIF'd) for redundancy/aggregated throughput?

Justin Brodley

9:42 p.m.

The filer comes with four ports.

We use 1 for the management/legacy iSCSI network; the other 3 are aggregated into our "iSCSI network".

We have plans to finally retire the legacy iSCSI network, but that is still several months away, which is why we can't resolve this problem by just disabling iSCSI on that port.

The ESX server is attempting to use both networks regardless of whether a failover has occurred. I'm not entirely sure why, but this is what the vendor has told us.

Justin

Shane Garoutte

22 Mar 2007

9:07 p.m.

Justin Brodley wrote:

We use 1 for the management/legacy iSCSI network; the other 3 are aggregated into our "iSCSI network".

Justin, can you clarify the Layer 3 routing and network topology? If the second network, the one not physically presented to the ESX server, is not routable from it, then the likely cause is that the ESX iSCSI initiator discovers the other network's address when it enumerates the NetApp filer. To confirm this, SSH into the ESX server and run the following:

/usr/sbin/vmkiscsi-ls

Post the output minus any "sensitive" data.
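
As a rough illustration of the reachability question above: if discovery returns a portal on a network the host is not attached to, it can be spotted by comparing each portal address against the host's own subnets. The sketch below is a minimal Python example, not a tool from this thread. The function name `split_portals` and all addresses are assumptions made up for illustration, and it works on portal IPs copied by hand rather than parsing `vmkiscsi-ls` output.

```python
import ipaddress


def split_portals(portals, local_networks):
    """Partition target portal IPs into on-link and off-link lists.

    portals: portal IP addresses as strings (e.g. copied from discovery output).
    local_networks: CIDR strings for the networks the host is attached to.
    """
    nets = [ipaddress.ip_network(n, strict=False) for n in local_networks]
    on_link, off_link = [], []
    for portal in portals:
        addr = ipaddress.ip_address(portal)
        if any(addr in net for net in nets):
            on_link.append(portal)
        else:
            off_link.append(portal)
    return on_link, off_link


if __name__ == "__main__":
    # Hypothetical addresses: one portal on the host's iSCSI network,
    # one on a legacy network the host is not connected to.
    portals = ["10.10.1.20", "192.168.50.20"]
    local = ["10.10.1.0/24"]
    on_link, off_link = split_portals(portals, local)
    print("on a directly attached network:", on_link)
    print("needs a route, or will fail:", off_link)
```

Any portal that lands in the second list is a candidate for the address the initiator later tries and fails on.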

Blake Golliher

14 Mar 2007

9:48 p.m.

Are these other network interfaces available for failover, or are they for other networks altogether? Is there a way to get the ESX system to log when it chooses to go over another network path?

-Blake

Blake Golliher

9:38 p.m.

Why would the ESX server attempt to connect to the other interface unless it's a failover attempt? Is that what it's doing, trying to keep things going by going through this secondary network?

I'll freely admit Windows isn't my forte.

-Blake

Learmonth, Peter

11:01 p.m.

ESX isn't Windows. Part of it is Red Hat (the service console), but the VMkernel is almost as proprietary as ONTAP. The other three VMware products (Player, Workstation, and Server) all have versions that run on Windows, but ESX is really its own OS (with a lot of help from Linux).

ESX iSCSI does the authentication through the service console's IP/network but the actual I/O through the VMkernel. So, on the filer, you'll see two logins, one right after the other, from both IPs.

By default, VMkernel networking is not configured, and you need to set it up for iSCSI, NFS and VMotion. If you create a back end network for iSCSI, it will prompt you if the service console is not set up on that network. However, I've seen people change the networking (remove the service console from the back end after it was all up and running).

Here's what bit me: When you remove the SC from the back-end networking, existing iSCSI connections continue to work, until you reboot or otherwise disconnect. I installed a bunch of patches on a pair of ESX servers configured by somebody else. After I rebooted, iSCSI and all the VMs broke, and I spent several hours trying to figure out what I did to break it. When I disabled and re-enabled iSCSI, it complained about the svc console not being on the back end, and I spent the next 5 minutes smacking myself in the head for not thinking of it sooner.

So, one possibility here is that if they initially set it up with the SC on the back end, then removed it, any reconnect it does will be through the network the SC can see. That's a slightly educated guess.
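
The requirement described above, that both the service console and the VMkernel have an address on the back-end iSCSI network, can be written down as a quick sanity check. This is a minimal Python sketch with made-up addresses. The function name `missing_from_backend` and the interface labels are hypothetical, and it does not query ESX itself; the addresses have to be filled in by hand.

```python
import ipaddress


def missing_from_backend(interfaces, backend_network):
    """Return the names of interfaces with no address in backend_network.

    interfaces: dict mapping an interface label to its IP address string.
    backend_network: CIDR string for the back-end iSCSI network.
    """
    net = ipaddress.ip_network(backend_network)
    return [
        name
        for name, ip in interfaces.items()
        if ipaddress.ip_address(ip) not in net
    ]


if __name__ == "__main__":
    # Hypothetical host: the VMkernel port is on the iSCSI network,
    # but the service console was moved off it.
    interfaces = {
        "service_console": "172.16.0.11",
        "vmkernel": "10.10.1.11",
    }
    print(missing_from_backend(interfaces, "10.10.1.0/24"))
```

An empty list means both interfaces sit on the back-end network; anything else names the interface to look at first.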

Enjoy!

Peter

Justin Brodley

15 Mar 2007

4:42 p.m.

We have the VMkernel configured to use the same network as iSCSI and still have this problem. I confirmed this last night with our VMware folks.

Justin

Learmonth, Peter

14 Mar 2007

10:07 p.m.

Hi Justin,

Do you have a network connection for the ESX service console on the back end network (where you want the traffic to go)?

Although it's not supposed to work at all if the service console doesn't have access to the back end network, I've seen it sorta work with weirdness similar to what you're seeing.

Second, and probably more useful, newer versions of ONTAP introduce the concept of portsets. I'm not sure exactly when this came in, but it's not in 7.0.5 and it is in 7.1.1. You create a portset and then assign it in the igroup settings; that igroup will then only see LUNs through the ports in the portset. You can bind to a portset when you create the igroup, or bind to it later with the "igroup bind" command. Portsets are created and managed with the "portset" command.

bandit> igroup bind
usage:
igroup bind <initiator_group> <portset>
  - binds the igroup to the portset

The initiator group must not be currently bound to any portset
If the initiator group is bound, use the 'igroup unbind' command to first unbind the initiator group before attempting to bind to another portset.

For more information, try 'man na_igroup'

bandit> portset
The following commands are available; for more information
type "portset help <command>"
add       destroy   remove    show
create    help

bandit> portset create
portset create: -f or -i must be specified
usage:
portset create { -f | -i } <portset> [ <filer:port1 filer:port2 ...> ]
portset create { -f | -i } <portset> [ <port1 port2 ...> ]
  - creates a new portset

A portset is a collection of ports. The type is specified with the -f (FCP) or the -i (iSCSI) options (Note only FCP is currently supported). Ports can optionally be supplied, and will be added to the group.

FCP ports are specified by the name of the filer and the port slot letter name separated by a ':' (example filer:4a).

This command also allows the ports to simply be specified by the port slot letter name. Ports specified in this style will add that port from both the local and partner filers at the same time.

A non-empty portset will not be created in a cluster setup if the interconnect between the two filers is down

For more information, try 'man na_portset'

Definitely check out the docs and try it with some non production servers before trying it live!

Let us know how it goes!

Peter

toasters@lists.teaparty.net
