Will a cluster failover occur if a NIC card fails in one of the clustered filers? I have not been able to find an answer from level 1 tech support at NetApp or on the NOW site.

Thank you,
Mike Ball
In the immortal words of Mike Ball (mball@rtp.opensystems.com):
> Will a cluster failover occur if a NIC card fails in one of the clustered filers? I have not been able to find an answer from level 1 tech support at NetApp or on the NOW site.
From empirical testing: no.
-n
------------------------------------------------------------memory@blank.org
"Cyberterrorists may be difficult to capture in the act, but from what I know about people who are highly skilled with computers, they should be easy to beat up." (--The Onion)
http://www.blank.org/memory/------------------------------------------------
> Will a cluster failover occur if a NIC card fails in one of the clustered filers?
No. Generally speaking, cluster failovers occur on "catastrophic" failure events involving the system's core electronics, most of which are located on the system motherboard (processors, memory, core supporting circuitry, that type of thing). Individual I/O adapters such as NIC cards are not really "in the core", but they can be protected from failure events via other means, even means that do not require you to go to the additional expense of implementing a full cluster.
For example, "simple" NIC failover capability was introduced in the Data ONTAP 5.3 release ("simple" in that you don't even need to do the full EtherChannel trunking thing anymore to get it! :-)). See:
http://now.netapp.com/knowledge/docs/ontap/rel531/html/sag/net14.htm#1186750
for specific details or, more generally, all the sections on interface configuration in chapter 4 of the 5.3 System Administrator's Guide.
Keith
"Keith" == Keith Brown keith@netapp.com writes:
>> Will a cluster failover occur if a NIC card fails in one of the clustered filers?

Keith> No. Generally speaking, cluster failovers occur on "catastrophic" failure events involving the system's core electronics, most of which are located on the system motherboard (processors, memory, core supporting circuitry, that type of thing). Individual I/O adapters such as NIC cards are not really "in the core", but they can be protected from failure events via other means, even means that do not require you to go to the additional expense of implementing a full cluster.

Keith> For example, "simple" NIC failover capability was introduced in the Data ONTAP 5.3 release ("simple" in that you don't even need to do the full EtherChannel trunking thing anymore to get it! :-)). See:
Now I'm curious.
If you have two filers clustered and with Gigabit NICs, then each filer has (at least) two Gigabit cards, with one card active and the second card being used only if a takeover occurs.
In this situation, what would you need to do to have redundant NICs? Would you need a third NIC in each filer?
j. -- Jay Soffian jay@cimedia.com UNIX Systems Administrator 404.572.1941 Cox Interactive Media
> Now I'm curious.
>
> If you have two filers clustered and with Gigabit NICs, then each filer has (at least) two Gigabit cards, with one card active and the second card being used only if a takeover occurs.
>
> In this situation, what would you need to do to have redundant NICs? Would you need a third NIC in each filer?
That.... is a very interesting fringe case! :-)
Candidly, I don't know for a *fact* whether there is anything in our software to stop you from trying to "re-use" the secondary gigabit interface in one cluster partner (the one that would take over for the partner's primary gigabit interface if said partner should fail) as the secondary interface of an in-system single-mode trunk. However, I very much hope that there is, and I have no reason to believe that this case wasn't thought of by our network development team (some real smart folk).
Why do I hope that there is a barrier there? Well, obviously there is a rather-far-out-on-the-fringe case that could stop a cluster from failing over if you attempted such a stunt. Picture a gigabit card failing in one of the systems in a cluster, so the secondary interface of the single-mode trunk (in the same system) takes over from it. Now the partner server fails.... Where is its gigabit interface going to go on the survivor (which is down to its last NIC)? Oh dear....
So in summary... don't do it. Not a good situation! ;-)
Keith
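As an illustrative aside, the fringe case described above can be sketched as a toy Python model. Everything in it is a hypothetical assumption made for illustration (the `Filer` class, the interface names `e0`, `e1`, `e2`, and the rule that an address simply moves to the first free healthy NIC); it does not describe how Data ONTAP actually implements trunks or takeover.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Filer:
    """Hypothetical model of one cluster partner's healthy NICs."""
    name: str
    nics: List[str]                                         # healthy interfaces
    serving: Dict[str, str] = field(default_factory=dict)   # NIC -> identity it carries

    def _place(self, identity: str) -> bool:
        """Move an identity onto the first free healthy NIC, if there is one."""
        free = [n for n in self.nics if n not in self.serving]
        if not free:
            return False
        self.serving[free[0]] = identity
        return True

    def fail_nic(self, nic: str) -> bool:
        """A NIC dies; like a single-mode trunk, shift its identity to a spare."""
        self.nics.remove(nic)
        identity = self.serving.pop(nic, None)
        return True if identity is None else self._place(identity)

    def take_over(self, partner_identity: str) -> bool:
        """The partner head fails; its identity needs a free NIC on the survivor."""
        return self._place(partner_identity)


def survives_nic_failure_then_takeover(nic_count: int) -> bool:
    """Fail the active NIC first, then attempt a takeover of partner 'B'."""
    nics = [f"e{i}" for i in range(nic_count)]
    survivor = Filer("A", nics, {nics[0]: "A"})
    trunk_ok = survivor.fail_nic(nics[0])
    takeover_ok = survivor.take_over("B")
    return trunk_ok and takeover_ok


print(survives_nic_failure_then_takeover(2))  # False: the last NIC is already in use
print(survives_nic_failure_then_takeover(3))  # True: a third NIC is still free
```

Under these assumptions, two NICs per filer cannot cover both a local NIC failure and a later takeover, while a third NIC per filer can.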
"Keith" == Keith Brown keith@netapp.com writes:
>> Now I'm curious.
>>
>> If you have two filers clustered and with Gigabit NICs, then each filer has (at least) two Gigabit cards, with one card active and the second card being used only if a takeover occurs.
>>
>> In this situation, what would you need to do to have redundant NICs? Would you need a third NIC in each filer?

Keith> That.... is a very interesting fringe case! :-)

Keith> Candidly, I don't know for a *fact* whether there is anything in our software to stop you from trying to "re-use" the secondary gigabit interface in one cluster partner (the one that would take over for the partner's primary gigabit interface if said partner should fail) as the secondary interface of an in-system single-mode trunk. However, I very much hope that there is, and I have no reason to believe that this case wasn't thought of by our network development team (some real smart folk).

Keith> Why do I hope that there is a barrier there? Well, obviously there is a rather-far-out-on-the-fringe case that could stop a cluster from failing over if you attempted such a stunt. Picture a gigabit card failing in one of the systems in a cluster, so the secondary interface of the single-mode trunk (in the same system) takes over from it. Now the partner server fails.... Where is its gigabit interface going to go on the survivor (which is down to its last NIC)? Oh dear....
Right. Which is why I assumed you would need a third NIC in each filer.
Keith> So in summary... don't do it. Not a good situation! ;-)
Well, I got to thinking about what you'd need for 100% redundancy. The filers are already highly redundant, but there are certain things that can fail that will cause downtime. Clustering two netapps eliminates the head (as a unit) as a single point of failure. If a NIC failed, you could always do a manual takeover, but I'd like to see something automatic. So that's how I got to wondering about redundant NICs in each unit in the cluster.
But now that we're off on this tangent... With a clustered netapp pair, what remains as an SPF? Do clustered shelves have any non-redundant components?
j. -- Jay Soffian jay@cimedia.com UNIX Systems Administrator 404.572.1941 Cox Interactive Media
In the immortal words of Jay Soffian (jay@cimedia.com):
> But now that we're off on this tangent... With a clustered netapp pair, what remains as an SPF? Do clustered shelves have any non-redundant components?
Just one, as far as I've been able to tell. Unfortunately, it's the power switch, which means that operator error or high-velocity janitors can still take out your entire Filer. :/
-n
------------------------------------------------------------memory@blank.org
Indeed, to many of us homefolks, the single greatest irony of the Clinton presidency has been the export of bare-knuckle, eye-gouging Arkansas political mud-rassling to an unexpectedly gullible national press corps. And we thought we were the hayseeds. (--Gene Lyons)
http://www.blank.org/memory/------------------------------------------------
> Picture a gigabit card failing in one of the systems in a cluster, so the secondary interface of the single-mode trunk (in the same system) takes over from it. Now the partner server fails.... Where is its gigabit interface going to go on the survivor (which is down to its last NIC)? Oh dear....
I don't really see the big problem with that. You're basically saying that if a NIC group is running in degraded mode and has no spares, an additional failure would result in unavailability. Sounds a lot like another scenario that's already been addressed :)
All the clever things the toaster could do to accommodate NIC failures in a clustered pair (doing things like splitting up vifs to accommodate a dead partner) are forcing me back to RTFM. I'd be interested to see what kinds of neat things people have implemented in this case. (My cluster is so bland and underutilized that I've got to find something more fun for it to do.)
..kg..
On Fri, 14 May 1999, Jay Soffian wrote:
> But now that we're off on this tangent... With a clustered netapp pair, what remains as an SPF? Do clustered shelves have any non-redundant components?
You could conceivably lose an entire shelf. I can't say for sure what might happen, not knowing the hardware internals of an FC-[78] disk shelf, but something between the FC-AL loops and the drives themselves (the backplane, perhaps?) could go wonky. Power, cooling and drives would all still be operational, but the drives would not be visible to either the A or B loops in a cluster. Another variation on the double/multi-drive failure. ;-)
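To tie the thread's candidates together, here is a minimal Python sketch that lists which components would count as single points of failure. The component names and redundancy counts are assumptions paraphrased from the posts above (and the backplane entry is explicitly speculative there); they are not vendor data.

```python
# Redundancy counts are illustrative assumptions taken from the thread,
# not from NetApp documentation.
components = {
    "filer head": 2,        # clustered pair
    "FC-AL loop": 2,        # the A and B loops
    "active NIC": 1,        # no automatic failover in the two-NIC layout
    "power switch": 1,      # mentioned as the remaining weak spot
    "shelf backplane": 1,   # speculative: "the backplane, perhaps?"
}


def single_points_of_failure(parts):
    """Return the names of components that have no redundant counterpart."""
    return sorted(name for name, count in parts.items() if count < 2)


print(single_points_of_failure(components))
# ['active NIC', 'power switch', 'shelf backplane']
```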