We recently upgraded our SAN Ethernet switches and moved from a config where we had one vif with all the NIC ports on one switch to a config where we've split the NIC ports out across the switches. So we now have two vifs split across the switches. Both vifs are on the same subnet and they have separate IP addresses, as our switches don't support cross-switch etherchannelling. We currently have everything going through vif1 and this is working just fine. The issue starts when I try to access the second vif, vif2. I can ping vif2 fine and ssh into vif2 fine, but when I try to NFS mount an export on vif2 it hangs, waiting, waiting... then mounts it after about 30 seconds (on Linux; Solaris hosts don't seem to ever mount it). But this only happens from some hosts. Other hosts can mount their exports on vif2 just fine.
I took a tcpdump of the process, and what appears to be happening is that vif1 responds to requests sent to vif2 for a while, and then the filer eventually responds out of vif2 and the Linux server is then able to mount the export.
I've tried disabling fastpath and this didn't help.
I then started thinking that I possibly need to have the vifs on separate subnets, but NetApp support has verified that it should work fine on the same subnet. I'm not getting anywhere fast with NetApp support, so I thought I'd see if anyone else has any ideas.
Thanks,
It should work fine, but as long as you have two active links within the same subnet, it's totally random which path will be used, since the destination is visible over multiple links.
If your client is expecting packets from server IP 10.0.0.1 and receives them from IP 10.0.0.2, it may drop those packets, resulting in hanging mount requests.
If possible, please use different ip subnets or switch to a failover (single vif) configuration.
-Stefan
From: Romeo Theriault [mailto:romeotheriault@gmail.com] Sent: Thursday, 1 April 2010 09:59 To: toasters@mathworks.com Subject: routing issue with 2 vifs on same subnet?
Hi, Thanks for the response.
On Thu, Apr 1, 2010 at 6:21 PM, Funke, Stefan Stefan.Funke@netapp.com wrote:
It should work fine, but as long as you have two active links within the same subnet, it’s totally random which path will be used, since the destination is visible over multiple links.
You may be correct, but I didn't think this was the case, especially since I read this in the 7.3.2 Network Management Guide:
Balance NFS traffic on network interfaces
You can attach multiple interfaces on your storage system to the same physical network to balance network traffic among different interfaces. For example, if two Ethernet interfaces on a storage system named toaster are attached to the same network where four NFS clients reside, specify in the /etc/fstab file on client1 and client2 that these clients mount from toaster-0:/home. Specify in the /etc/fstab file on client3 and client4 that these clients mount from toaster-1:/home. This scheme can balance the traffic among interfaces if all clients generate about the same amount of traffic. *Your storage system always responds to an NFS request by sending a reply using the interface over which the request was received.*
Thoughts?
Romeo
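The balancing scheme in the quoted guide can be sketched in a few lines of Python. This is only an illustration: the names toaster-0, toaster-1, client1 to client4 and the /home export come from the guide text above, while the mount point, the "defaults" option string and the helper names are assumptions made for the example.

```python
def assign_interfaces(clients, server_names):
    """Split clients into contiguous groups, one group per filer interface name."""
    total = len(clients)
    return {
        client: server_names[index * len(server_names) // total]
        for index, client in enumerate(clients)
    }


def fstab_line(server_name, export="/home", mount_point="/home", options="defaults"):
    """Build one NFS entry for /etc/fstab (mount point and options are assumptions)."""
    return f"{server_name}:{export} {mount_point} nfs {options} 0 0"


clients = ["client1", "client2", "client3", "client4"]
assignment = assign_interfaces(clients, ["toaster-0", "toaster-1"])

# Show which fstab entry each client would carry under this scheme.
for client, server_name in assignment.items():
    print(f"{client}: {fstab_line(server_name)}")
```

This only spreads the clients over the two names; whether the filer actually replies on the interface a request arrived on is the open question in this thread.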
toaster-0 and toaster-1 are 2 different host names that will resolve to 2 different IP addresses. They can be in the same subnet or different subnets. If your VIFs are on the same filer and subnet and you wish to have your clients talk to them specifically, they have to resolve to 2 different names. This can be done by host files on your client or by DNS entries.
On Thu, Apr 1, 2010 at 7:30 PM, Holland, William L HollandWL@state.gov wrote:
toaster-0 and toaster-1 are 2 different host names that will resolve to 2 different IP addresses. They can be in the same subnet or different subnets. If your VIFs are on the same filer and subnet and you wish to have your clients talk to them specifically, they have to resolve to 2 different names. This can be done by host files on your client or by DNS entries.
Yes, I have this set up in our DNS, and the two vifs have separate hostnames. The vif1 IP (10.10.2.10) has a hostname of na2 and the vif2 IP (10.10.2.12) has a hostname of na2-vif2. On the hosts I can 'host' and 'nslookup' the hostnames and get the correct responses from DNS. I've also tried mounting the exports via the vif2 IP address and get the same results.
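The name check described above can also be scripted. The sketch below is a minimal example, assuming the hostnames and addresses given in this message (na2, na2-vif2, 10.10.2.10, 10.10.2.12); the helper names are hypothetical, and a dictionary stands in for DNS so the example runs anywhere. On a real client, leaving out the `resolver` argument makes it use `socket.gethostbyname`.

```python
import socket


def resolve_all(names, resolver=socket.gethostbyname):
    """Resolve each name to an IPv4 address string using the given resolver."""
    return {name: resolver(name) for name in names}


def shared_addresses(resolved):
    """Return only the addresses that more than one name resolves to."""
    seen = {}
    for name, address in resolved.items():
        seen.setdefault(address, []).append(name)
    return {address: names for address, names in seen.items() if len(names) > 1}


# Stand-in for DNS, using the names and addresses from the message above.
fake_dns = {"na2": "10.10.2.10", "na2-vif2": "10.10.2.12"}

resolved = resolve_all(["na2", "na2-vif2"], resolver=fake_dns.__getitem__)
print(resolved)
print(shared_addresses(resolved))  # an empty dict means every name has its own address
```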
Just for kicks, have you tried:
route -f
on your filer? If a default route is set up, it may be the culprit. You may not need the default route on the filer if all your services are on the same subnet.
--tmac Tim McCarthy Principal Consultant RedHat Certified Engineer 804006984323821 (RHEL4) 805007643429572 (RHEL5)
On Thu, Apr 1, 2010 at 8:22 PM, tmac tmacmd@gmail.com wrote:
Just for kicks, have you tried:
route -f
on your filer? If a default route is set up, it may be the culprit. You may not need the default route on the filer if all your services are on the same subnet.
Nope, haven't tried this. Since we have some clients that connect to the filers from other subnets I'm not sure that this would be a good idea for us.
I suspect if:
you turn ip.fastpath (back) on and you clear the default route, you may likely be OK.
The filer should only need the default route for its own services, like sending autosupport, and if the mail server is on the same subnet, then the default route is really not needed.
When clients come in to a particular interface, ip.fastpath should allow the request to go back out the same way they came in.
might be worth a try to see if that in fact does it. --tmac Tim McCarthy Principal Consultant
RedHat Certified Engineer 804006984323821 (RHEL4) 805007643429572 (RHEL5)
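The difference between a plain route lookup and the behaviour tmac expects from ip.fastpath can be pictured with a toy model. This is not how Data ONTAP is implemented; it is a sketch under stated assumptions: the vif addresses are the ones given earlier in the thread, while the /24 prefix length, the client address 10.10.2.50 and all function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    address: str
    network: str  # connected network in CIDR form (prefix length is an assumption)


def reply_interface_by_route(interfaces, client_ip):
    """Plain route lookup: first interface whose connected network holds the client."""
    client = ipaddress.ip_address(client_ip)
    for iface in interfaces:
        if client in ipaddress.ip_network(iface.network):
            return iface
    return None


def reply_interface_fastpath(ingress):
    """Fastpath-style choice: reply out the interface the request came in on."""
    return ingress


vif1 = Interface("vif1", "10.10.2.10", "10.10.2.0/24")
vif2 = Interface("vif2", "10.10.2.12", "10.10.2.0/24")

# A hypothetical client on the same subnet sends its request to vif2.
by_route = reply_interface_by_route([vif1, vif2], "10.10.2.50")
by_fastpath = reply_interface_fastpath(vif2)
print(by_route.name, by_fastpath.name)
```

With both vifs on one subnet, the route lookup in this model always lands on the first matching interface, which is one way replies to vif2 could end up leaving via vif1.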
The easiest solution (if possible):
1. Don't have all these subnets routable to and from each other; isolate them. Best practices recommend that they do not route.
2. If you need bandwidth across separate physical NICs, put them into a large multimode VIF and then use VLAN tagging to separate the subnets.
The issue at hand is called "asymmetric routing" (Google it). I actually didn't realize there was a name for it until a couple of years ago ;-) When you Google it, the write-ups make pretty interesting reading. It has its place, but I don't think this is one of them.
Good luck.
Stetson Webster Professional Services Consultant Virtualization and Consolidation NCIE-SAN, NCIE-B&R, SCSN-E, VCP
NetApp 919.250.0052 Direct Phone stetson@netapp.com Learn more: http://www.imaginevirtuallyanything.com
Request For Enhancement:
Cisco routers have the ability to specify a "source interface" for things like SNMP polling responses, traps, CDP etc. and routing protocol announcements. I think it would help if you could do this on a filer, for management protocols and even for particular mounts (all lockd, quotad, mount etc. responses for export X go out interface Y for netgroup Q). This would allow admins to work around this problem when it crops up.
I obviously don't see this asymmetric problem with the actual NFS traffic, but sometimes the supporting protocols like mountd and lockd get confused. Yes, I understand that it shouldn't be an issue because the RPC Layer doesn't really need to care about the IP addresses, but if you're in a shop that runs host based firewalls this leads to interesting and subtle breakage.
Not all network setups are going to be flat, and fast path routing mechanisms do weird things with gateway redundancy protocols like HSRP. Mounting things over routed links is a perfectly legitimate activity and in fact is a good idea until we complete the transition to IPv6, where we can run things on huge flat networks without worrying about broadcast storms and the like. Besides, it's been a long time since routers were software boxes that added anything resembling significant latency or overhead.
~Max
On Thu, Apr 1, 2010 at 9:27 PM, tmac tmacmd@gmail.com wrote:
I suspect if:
you turn ip.fastpath (back) on and you clear the default route, you may likely be OK.
The filer should only need the default route for its own services, like sending autosupport, and if the mail server is on the same subnet, then the default route is really not needed.
When clients come in to a particular interface, ip.fastpath should allow the request to go back out the same way they came in.
might be worth a try to see if that in fact does it.
Thanks, might give this a try during a maintenance window.
Do not do a route -f unless you are prepared to break things.
This suggestion is not right: the filer still needs a routing table if its clients are not all directly connected. It is not enough to reply out the same VIF that a request came in on. (I'm no filer expert, but this is standard TCP/IP networking.)
That's why I would tend to agree that whoever said the docs are wrong must be right: the guarantee can only be made for the return IP address, not for the VIF, though it could be that for many networks the VIF can also be maintained (but not guaranteed).
In my mind, if this is a routing issue it means someone somewhere has the wrong routing table or netmask. Or a routing table is incomplete. Any of these boxes or sub-nets newly configured?
Good luck,
Gerald Justice
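Gerald's point about needing a routing table for clients that are not directly connected can be shown with Python's `ipaddress` module. This is a minimal sketch, not filer behaviour: the 10.10.2.0/24 network, the gateway 10.10.2.1, both client addresses and the function name are assumptions chosen for illustration.

```python
import ipaddress


def next_hop(client_ip, connected_networks, default_gateway=None):
    """Return 'direct' for on-link clients, else the default gateway (or None)."""
    client = ipaddress.ip_address(client_ip)
    for network in connected_networks:
        if client in ipaddress.ip_network(network):
            return "direct"
    # Off-link client: only reachable if a default route exists.
    return default_gateway


connected = ["10.10.2.0/24"]  # assumed prefix for the filer's subnet

print(next_hop("10.10.2.50", connected, "10.10.2.1"))  # client on the filer's subnet
print(next_hop("10.20.0.5", connected, "10.10.2.1"))   # client behind a router
print(next_hop("10.20.0.5", connected))                # same client, default route flushed
```

In this model the last call returns None, which corresponds to clients on other subnets losing access once the default route is gone.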
On Apr 1, 2010, at 3:30 AM, Holland, William L wrote:
toaster-0 and toaster-1 are 2 different host names that will resolve to 2 different IP addresses. They can be in the same subnet or different subnets. If your VIFs are on the same filer and subnet and you wish to have your clients talk to them specifically, they have to resolve to 2 different names. This can be done by host files on your client or by DNS entries.
While this is true for NFS, I've seen lockd and some other UDP-based protocols respond from another interface on the same subnet (quotad and SNMP are good examples). I can currently reproduce this with SNMP.
Example
192.168.1.1 - VIF 1 IP
192.168.1.2 - VIF 2 IP
A mount command against VIF 1 will see a response from the VIF 2 IP for some protocols, like mount and SNMP. The actual *NFS* traffic (port 2049) will always come from VIF 1. I don't necessarily see an issue with this, as the RPC mechanisms are higher level than the IP stack, but some OSes might not like this asymmetry.
I can currently reproduce this with SNMP on 7.3.1.1P8.
I polled VIF 1, but the return traffic was coming from VIF 2. The host-based firewall did not like this, so I just configured the polling box to use VIF 2 and it was happy. I would speculate that it'll happen even if the two interfaces aren't vifs, but I haven't tested.
~Max
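The asymmetry Max describes is easy to spot once a capture has been reduced to request/reply pairs. The sketch below assumes such a reduction has already been done by hand or by another tool; the three records are made-up examples modelled on his description (VIF 1 at 192.168.1.1, VIF 2 at 192.168.1.2), not real capture data, and the function name is hypothetical.

```python
def asymmetric_replies(exchanges):
    """Keep exchanges whose reply source differs from the address the request went to.

    Each exchange is a (service, request_dst, reply_src) tuple.
    """
    return [
        (service, request_dst, reply_src)
        for service, request_dst, reply_src in exchanges
        if request_dst != reply_src
    ]


VIF1, VIF2 = "192.168.1.1", "192.168.1.2"

# Illustrative records only: helper protocols answered from VIF 2, NFS from VIF 1.
observed = [
    ("mount", VIF1, VIF2),
    ("snmp", VIF1, VIF2),
    ("nfs", VIF1, VIF1),
]

flagged = asymmetric_replies(observed)
for service, request_dst, reply_src in flagged:
    print(f"{service}: sent to {request_dst}, answered from {reply_src}")
```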
Yes, this is exactly what is happening to me. When I try an NFS mount to vif2 with "proto=tcp", the filer's RPC UDP mount/port responses come back over vif1, causing about a 30-second hang on mounts on my SUSE box, and the Solaris boxes never seem to mount it. Though (on SUSE) after the export is mounted, all traffic goes through vif2.
Thanks for the RFE, I second it.
I guess putting the vifs on separate VLANs may be the only way to get around this issue...
Note: Disabling the firewall on my Linux box didn't seem to help.
On 1 Apr 2010, at 11:46, Romeo Theriault wrote:
It should work fine, but as long as you have two active links within the same subnet, it’s totally random which path will be used, since the destination is visible over multiple links.
You may be correct, but I didn't think this was the case, especially since I read this in the 7.3.2 Network Management Guide:
Balance NFS traffic on network interfaces You can attach multiple interfaces on your storage system to the same physical network to balance network traffic among different interfaces. For example, if two Ethernet interfaces on a storage system named toaster are attached to the same network where four NFS clients reside, specify in the /etc/fstab file on client1 and client2 that these clients mount from toaster-0:/home. Specify in the /etc/fstab file on client3 and client4 that these clients mount from toaster-1:/home. This scheme can balance the traffic among interfaces if all clients generate about the same amount of traffic. Your storage system always responds to an NFS request by sending a reply using the interface over which the request was received.
Thoughts?
That documentation is likely wrong. It should probably read 'IP address' instead of 'interface'.
Thought I'd follow up on this and let folks know that I finally got around to putting vif2 on another subnet, and the problem persisted. NetApp opened a bug on the issue. There isn't anything there yet, but here's the link to the bug report:
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=237430.
-- Romeo Theriault