I have a call open with NetApp about this, but I thought I would ask the list whether anyone else has had similar issues.
We have a cluster which is connected to our core via TOE-10G Ethernet (T320E-XFP). The problem we see is that sometimes the NetApp appears to stop receiving any packets. I haven't done a packet trace while the problem is in effect: I was a bit of a Muppet when I noticed the problem, as I had forgotten the RLM user name. I have made notes, so next time it happens I will do a trace.
This has happened on both units in our cluster. We have replaced one of the 10gig cards and that has made no difference.
Our networking people show no errors on the line cards. Power cycling the units brings the network card back.
We have turned TOE off on the cards and that appears not to make any difference to the reliability of the system.
This has so far occurred 3 times, which is really very poor.
sysconfig -v 4
  slot 4: Dual TOE-10G Ethernet Controller (T320E-XFP)
    Device Type: CT-31-1
    Version Number: T3-SRAM1.1.0-BR1040-20-C0-FW4.6.0-DR03
    Serial Number: PT3807035
    e4a MAC Address: 00:07:43:05:13:ac (auto-10g_sr-fd-up)
    e4b MAC Address: 00:07:43:05:13:ad (auto-10g_sr-fd-cfg_down)

ifconfig e4a
  e4a: flags=948043<UP,BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
    inet 172.17.66.26 netmask 0xffffff00 broadcast 172.17.66.255
    partner inet 172.17.66.27 (not in use)
    ether 00:07:43:05:13:ac (auto-10g_sr-fd-up) flowcontrol full
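When the problem next occurs, it may help to record each port's link state from output like the above. What follows is only a minimal sketch: it assumes the parenthesised status string always has the hyphen-separated layout negotiation-media-duplex-state seen in these two samples (inferred from this output, not from NetApp documentation), and `parse_port_status` is a hypothetical helper name.

```python
import re

# Matches port lines such as:
#   e4a MAC Address: 00:07:43:05:13:ac (auto-10g_sr-fd-up)
PORT_LINE = re.compile(
    r"(?P<port>e\d+[a-z])\s+MAC Address:\s+"
    r"(?P<mac>[0-9a-f:]{17})\s+\((?P<status>[^)]+)\)"
)


def parse_port_status(text):
    """Return {port: details} for each port line found in sysconfig-style text."""
    ports = {}
    for m in PORT_LINE.finditer(text):
        # Assumed layout: negotiation-media-duplex-state, split on hyphens.
        # A status string with fewer fields raises ValueError.
        negotiation, media, duplex, state = m.group("status").split("-", 3)
        ports[m.group("port")] = {
            "mac": m.group("mac"),
            "negotiation": negotiation,
            "media": media,
            "duplex": duplex,
            "state": state,
        }
    return ports


# Sample text copied from the sysconfig output in this message.
sample = (
    "e4a MAC Address: 00:07:43:05:13:ac (auto-10g_sr-fd-up) "
    "e4b MAC Address: 00:07:43:05:13:ad (auto-10g_sr-fd-cfg_down)"
)

for port, info in parse_port_status(sample).items():
    print(port, info["state"])
```

Run against the sample, this reports e4a as up and e4b as cfg_down; logging the same fields periodically would show whether the link state changes when the filer stops receiving packets.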
I do not know if the problem still stands, but the older single port card had a problem with the number of simultaneous connections/MAC-address on the card.
Also, what kind of switch? The Cisco 6509 I use currently supports an 8-port blade that is 2:1 blocking. I have run a command to administratively disable 4 ports so that the remaining ports get full bandwidth.
--tmac
RedHat Certified Engineer #804006984323821 (RHEL4) RedHat Certified Engineer #805007643429572 (RHEL5)
Principal Consultant
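The 2:1 blocking point above comes down to simple arithmetic. This is a minimal sketch, assuming 10 Gb/s ports and a fabric capacity of exactly half the sum of the port speeds (which is what "2:1 blocking" implies). The function name and the figures are illustrative and are not taken from Cisco documentation.

```python
def per_port_gbps(active_ports, port_speed_gbps=10.0, total_ports=8,
                  blocking_ratio=2.0):
    """Worst-case bandwidth per active port when every active port is busy."""
    # Fabric capacity implied by the blocking ratio with all ports populated.
    fabric_gbps = total_ports * port_speed_gbps / blocking_ratio
    # A port can never exceed its own line rate.
    return min(port_speed_gbps, fabric_gbps / active_ports)


print(per_port_gbps(8))  # all 8 ports in use -> 5.0
print(per_port_gbps(4))  # 4 ports administratively disabled -> 10.0
```

On real hardware the oversubscription may apply per port group rather than across the whole blade, in which case which ports are disabled matters; check the documentation for the specific blade.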
On Tue, Aug 26, 2008 at 8:02 AM, James Beal james_@catbus.co.uk wrote:
[...]
We're using the dual-port 10GbE TOE cards with no issues - that said, TOE is not enabled because we're using VIF (Filer turns TOE off with VIF - not desired by us, but no choice right now as we cannot have SPOF).
Perhaps disabling TOE would get you more reliability?
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of James Beal Sent: Tuesday, August 26, 2008 8:02 AM To: toasters@mathworks.com Subject: issues with 10G Ethernet.
[...]
We are doing the same for our NFS mounted VMs (a couple of hundred) with no real issues. It would be nice if the TOE functionality worked with VIFs though. We are on 7.2.4 here. Are the bugs significant enough to warrant an upgrade if we've been stable for months?
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Glenn Walker Sent: Wednesday, August 27, 2008 8:48 AM To: James Beal; toasters@mathworks.com Subject: RE: issues with 10G Ethernet.
[...]
This message (including any attachments) contains confidential and/or proprietary information intended only for the addressee. Any unauthorized disclosure, copying, distribution or reliance on the contents of this information is strictly prohibited and may constitute a violation of law. If you are not the intended recipient, please notify the sender immediately by responding to this e-mail, and delete the message from your system. If you have any questions about this e-mail please notify the sender immediately.
We're targeting 7.3.X as it becomes more stable to get the NFS performance benefits. That said, I'm not going to upgrade for a while and we're happy with 7.2.4 so far.
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Page, Jeremy Sent: Wednesday, August 27, 2008 9:26 AM To: toasters@mathworks.com Subject: RE: issues with 10G Ethernet.
[...]
James Beal wrote:
[...]
Thanks to Trey Layton, Matt Brown and tmac (and the other Matt who didn't give a last name).
1) The most important thing to do is to ensure you are running 7.2.5.1 or later, as there are a large number of bugs in the "drivers" for the 10G Ethernet card.
2) If you use XENPAKs to connect your system to your switch then there are some interesting potential issues.
We have just upgraded this system from 7.2.4 to 7.2.5.1 and I will post again if we have any more issues.
On 27 Aug 2008, at 14:09, James Beal wrote:
[...]
We did have a repeat of the problem while running 7.2.5.1. However, we have since disabled TCP offload and the systems have been running stably for 22/23 days.
-- james
What an unfortunate fix :(
I'd very much like to use TOE on my 10gig VIF. Does anyone know if this is ever going to be supported, or is there an architectural reason why it won't work?
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of James Beal Sent: Monday, September 29, 2008 7:00 AM To: James Beal Cc: toasters@mathworks.com Subject: Re: issues with 10G Ethernet.
[...]