"It all depends on your management's comfort level. Can you show them
that failovers are pretty much transparent today to give them
confidence?"
I can, and the few unexpected failovers we've experienced have been
smooth. So that is a confidence boost. However, these particular
systems host some of the most critical workloads for the business so
there is always hesitation when it's time to upgrade or make a
significant change.
Regarding your rant (as you put it), we should absolutely get into the
habit of failure testing. I can give you 7,000 excuses as to why we
don't but it essentially boils down to priorities. I need to start
pushing this idea again.
Also, massive thanks for pointing me to your previous upgrade-related
questions. The guidance to upgrade disk/shelf/SP/BMC firmware ahead of
the ONTAP upgrade reinforces our current plans.
Speaking of which, is there a copy of the SP and BMC Firmware / ONTAP
Support Matrix that still lists ONTAP 9.2? We're upgrading from 9.2
to 9.3 and I'd like to upgrade the cluster BMC and SP firmware to the
versions bundled with ONTAP 9.3 ahead of time. However, the matrix
does not contain compatibility info for 9.2 so I'm left to guess.
Link below for reference.
https://mysupport.netapp.com/NOW/download/tools/serviceimage/support/ServiceProcessorSupportMatrix.shtml
ONTAP 9.2 is on limited support until the end of the month so it was a
bit surprising to see it missing from the firmware matrix.
On Fri, Jul 10, 2020 at 8:05 PM John Stoffel <john@stoffel.org> wrote:
>
> >>>>> "Philbert" == Philbert Rupkins <philbertrupkins@gmail.com> writes:
>
> Philbert> Thanks for the info. I'm familiar with the vetoed giveback
> Philbert> due to CIFS - we hit that during unplanned failover events.
> Philbert> Good to know I can expect that during upgrades as well.
>
> I did an upgrade (see my questions from Jan/Feb time) of 8.3 to 9.3
> going through 9.1 and it was smooth sailing from the CLI. Super nice
> and easy. I really liked how well the upgrade process works now as
> compared to the old 8.1 -> 8.3 cDOT upgrade I did, as well as other
> 7-mode upgrades in the past.
>
> I'm a CLI guy (heh, nearly wrote gui there) so I just do it from a
> screen session inside xterm and keep a lot of history. We did a big
> ESX hardware upgrade at the same time, so all my main production loads
> were shut down, but honestly, ONTAP is so rock solid for regular NFS
> and even CIFS loads that I'd be ballsy and just go for it.
>
> It all depends on your management's comfort level. Can you show them
> that failovers are pretty much transparent today to give them
> confidence?
>
> Which brings me to my big rant, which is failure testing. Too many
> sites/people are scared to do testing, or make any changes. If you
> have a robust system, which you expect to be HA, then you need to
> *test* it to be sure, and to make sure you know the right procedures
> in case of problems.
>
> Otherwise, you don't know and can't trust your setup. Which is why I
> really love the Netflix Simian Army stuff. I just wish I could get
> more of the team I work with to understand this idea. Test for
> failures under realistic conditions or you won't know.
>
>
> Philbert> Are you initiating the upgrade from the GUI? Also, when you
> Philbert> override the CIFS veto, do you then need to issue a "cluster
> Philbert> image resume-update" or resume from the GUI somewhere?
>
> Philbert> On Thu, Jul 9, 2020 at 3:37 PM Scott Eno <cse@hey.com> wrote:
> >>
> >> Really like the automated upgrade myself. So much better than the old 7-mode days.
> >>
> >> Only issue I repeatedly hit is on giveback, aggr giveback will get vetoed due to CIFS sessions. Never understood why it's fine to break CIFS sessions on takeover, but everything comes to a halt on giveback.
> >>
> >> Have to go to CLI and force aggr giveback with override-veto switch.
> >>
> >> Philbert Rupkins <philbertrupkins@gmail.com> wrote:
> >>
> >> Toasters,
> >>
> >> What's your preference for non-disruptively upgrading a switch-based
> >> ONTAP 9 cluster - automated NDU or manual (rolling) NDU?
> >>
> >> Happy to hear of both positive and negative experiences, if any.
> >>
> >> The cluster in question consists of 3 HA pairs so the automated
> >> upgrade will default to rolling. The general recommendation is to use
> >> the automated procedure but there are concerns about lack of control,
> >> especially in the event of issues. Each HA pair in the cluster hosts
> >> critical prod workloads.
> >>
> >> No access to a test cluster so there isn't much opportunity to build
> >> confidence in the automated procedure ahead of time. I am aware of
> >> the ability to pause the automated upgrade.
> >>
> >> Leaning toward manual at the moment due to lack of exposure to the
> >> automated process.
> >>
> >> Cheers,
> >> Phil
> >> _______________________________________________
> >> Toasters mailing list
> >> Toasters@teaparty.net
> >> https://www.teaparty.net/mailman/listinfo/toasters