I completely understand. It is *not* a supported operation.

There are *still* NVRAM handshakes that happen, and by changing the head you *WILL* have major problems.

You have head two that has taken over head one. -> YOU HAVE A TAKEOVER OPERATION.

If you shut down and replace the heads, they will likely be expecting NVRAM behavior from the FAS6080s that are no longer there. You cannot change heads.

Why not just do the following?

(I am sure this is very abbreviated compared to the upgrade guide: LOTS of missing steps!)

- cf disable (to kill takeover/giveback operations)
- halt -c on one head to shut it down without triggering a takeover
- replace the head you just shut down, start it back up and make it work correctly (ignore mismatch errors for now)
- halt -c on the other head
- replace the head you just shut down, start it back up and make it work correctly
- after you are sure everything is normal/OK -> cf enable to re-enable the cluster

Think about it: your downtime will be minimal. You are only taking down one head at a time.

Please follow the upgrade guides.

--tmac

Tim McCarthy

Principal Consultant

NCDA (Clustered ONTAP) ID: XK7R3GEKC1QQ2LVD, expires 08 November 2014

RHCE5 805007643429572, expires with the release of RHEL7

NCSIE (Clustered ONTAP) ID: C14QPHE21FR4YWD4, expires 08 November 2014

On Mon, Apr 15, 2013 at 10:14 AM, Jan-Pieter Cornet <johnpc@xs4all.nl> wrote:

On 2013-4-15 15:06, tmac wrote:
> Totally unsupported.
>
> For starters, the NVRAM is different in these models.
> Still, head swaps, as far as I know, are not supported during takeover/giveback...

I don't think you understand my intention. What I plan to do is:

- takeover node 1 (fas 6080-1) by node 2 (fas 6080-2). This is a regular, supported operation.
- power down node 1 (fas 6080-1). Shouldn't have any impact, because node 2 (fas 6080-2) is taking over the service.
- remove node 1 (fas 6080-1) from rack, put replacement node 1 (fas 6290-1) in rack, NOT powered on.
- Now, we halt node 2 (fas 6080-2). At this point there is a service interruption. Then power down node 2 (fas 6080-2).

Then we bring up the new node 1 (fas 6290-1) standalone (no connection to its partner). Reassign disks, make sure network interfaces and OS versions are aligned, etc. Then reboot node 1. This should bring back the services that live on node 1, now on new hardware (except that it is not HA yet, because the partner hardware isn't there).

If possible, we would now like the new node 1 to do a "takeover" of the as-yet non-existent node 2, so that services configured on node 2 become available again.

We then remove the old node 2 (fas 6080-2), which is powered off anyway, from the rack, replace it with the new node 2 (fas 6290-2), make sure the cables are properly connected, then boot it and do a giveback on node 1.

I am aware that you cannot replace one node during a takeover, and connect the NVRAM cards of two different hardware types and expect it to work. But that's not what we are trying to do. All we're trying is making use of the existing failover capability to reduce downtime. First there's a 6080->6080 takeover, then the system goes down completely, then there's a 6290<-6290 takeover, and then it's back to normal. Where's the unsupported bit?

> Environment variables are set that may be unique also. Been a while since I have checked.
>
> You need to be careful with disk assignments (software ownership)
> networking interfaces may not line up and will require correction.

I'm aware of these issues; they are addressed in the standard HA pair headswap guide. It just amazes me that the standard guide doesn't offer the option to minimize downtime by doing the extra failovers. Maybe that's because most HA configurations these days use a single enclosure, so the heads cannot be taken out of the rack separately? (That's obviously not the case for MetroCluster configurations, where the heads are in different cabinets by design.)

Or is there some information written to disk during a takeover that would prevent new hardware from picking things up? I find that hard to believe, because in case node 1 goes up in flames, it'll have to be replaced with new hardware anyway... so there should be a way to recover from that.

--
Jan-Pieter Cornet
"Most seasonal greetings are sent by spammers and phishers."