On 2013-4-15 15:06 , tmac wrote:
Totally unsupported.
For starters, the NVRAM is different in these models. Still, head swaps, as far as I know, are not supported during takeover/giveback...
I don't think you understand my intention. What I plan to do is:
- take over node 1 (fas 6080-1) with node 2 (fas 6080-2). This is a regular, supported operation.
- power down node 1 (fas 6080-1). This shouldn't have any impact, because node 2 (fas 6080-2) has taken over the service.
- remove node 1 (fas 6080-1) from the rack and put the replacement node 1 (fas 6290-1) in the rack, NOT powered on.
- Now, we shut down node 2 (fas 6080-2). At this point there is a service interruption. Proceed to power down node 2 (fas 6080-2).
Then we bring up the new node 1 (fas 6290-1), standalone (no connection to partner). Reassign disks, make sure interfaces and OS versions are aligned, etc. Then reboot node 1. This should bring back the services that are on node 1, now on new hardware (except that it's not HA yet, because the partner hardware isn't there).
If possible, we would now like to do a "takeover" of the as-yet non-existent node 2 on the new node 1, so services that are configured on node 2 will be available again.
We then proceed to remove the old node 2 (fas 6080-2) from the rack, which is off anyway, replace it with the new node 2 (fas 6290-2), make sure cables are properly connected, then boot it and do a giveback from node 1.
I am aware that you cannot replace one node during a takeover, connect the NVRAM cards of two different hardware types, and expect it to work. But that's not what we are trying to do. All we're trying to do is make use of the existing failover capability to reduce downtime. First there's a 6080->6080 takeover, then the system goes down completely, then there's a 6290<-6290 takeover, and then it's back to normal. Where's the unsupported bit?
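To make the sequence explicit, here is a rough sketch of the plan as a small Python model. It only tracks which head is serving each node's data at each step; the step names, the `STEPS` table, and both helper functions are made up for illustration and say nothing about what Data ONTAP actually supports. In particular, the "6290-1 takes over node 2" step is the part I'm asking about, so the model simply assumes it works.

```python
# Hypothetical model of the planned head swap; all names are illustrative.
# Each step maps a node's services to the head serving them (None = down).
STEPS = [
    ("initial state",                       {"node1": "6080-1", "node2": "6080-2"}),
    ("6080-2 takes over node 1",            {"node1": "6080-2", "node2": "6080-2"}),
    ("6080-1 powered off, 6290-1 racked",   {"node1": "6080-2", "node2": "6080-2"}),
    ("6080-2 halted",                       {"node1": None,     "node2": None}),
    ("6290-1 booted standalone",            {"node1": "6290-1", "node2": None}),
    # Assumption: this takeover is possible at all (the open question).
    ("6290-1 takes over node 2",            {"node1": "6290-1", "node2": "6290-1"}),
    ("6290-2 racked, giveback",             {"node1": "6290-1", "node2": "6290-2"}),
]


def outages(steps):
    """Return {service: [steps during which that service is down]}."""
    down = {}
    for desc, serving in steps:
        for svc, head in serving.items():
            if head is None:
                down.setdefault(svc, []).append(desc)
    return down


def mixed_model_steps(steps):
    """Return steps where heads of different hardware models serve data at once."""
    mixed = []
    for desc, serving in steps:
        # The model is the part of the head name before the dash, e.g. "6080".
        models = {head.split("-")[0] for head in serving.values() if head is not None}
        if len(models) > 1:
            mixed.append(desc)
    return mixed


if __name__ == "__main__":
    for svc, steps_down in outages(STEPS).items():
        print(svc, "is down during:", ", ".join(steps_down))
    print("steps with mixed hardware serving:", mixed_model_steps(STEPS))
```

In this model node 1's services are down for one step and node 2's for two, and at no step are a 6080 and a 6290 serving data at the same time, which is the point of the plan.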
Environment variables that are set may also be unique. It's been a while since I have checked.
You need to be careful with disk assignments (software ownership); network interfaces may not line up and will require correction.
I'm aware of these issues; they are addressed in the standard HA pair headswap guide. It just amazes me that the standard guide doesn't give the option to minimize downtime by doing the extra failovers. Maybe that's because most HA configurations these days use one enclosure, so the heads cannot be taken from the rack separately? (That's obviously not the case for MetroCluster configurations, where the heads are in different cabinets by design.)
Or is there some information written to disk during a takeover that would prevent new hardware from picking things up? I find that hard to believe, because in case node 1 goes up in flames, it'll have to be replaced with new hardware anyway... so there should be a way to recover from that.