Toasters,
What's your preference for non-disruptively upgrading a switch-based ONTAP 9 cluster - automated NDU or manual (rolling) NDU?
Happy to hear of both positive and negative experiences, if any.
The cluster in question consists of 3 HA pairs so the automated upgrade will default to rolling. The general recommendation is to use the automated procedure but there are concerns about lack of control, especially in the event of issues. Each HA pair in the cluster hosts critical prod workloads.
No access to a test cluster so there isn't much opportunity to build confidence in the automated procedure ahead of time. I am aware of the ability to pause the automated upgrade.
Leaning toward manual at the moment due to lack of exposure to the automated process.
Cheers, Phil
On Thu, Jul 9, 2020 at 3:37 PM Scott Eno cse@hey.com wrote:
Really like the automated myself. So much better than the old 7-mode days.
Only issue I repeatedly hit is on giveback, aggr giveback will get vetoed due to CIFS sessions. Never understood why it's fine to break CIFS sessions on takeover, but everything comes to a halt on giveback.
Have to go to CLI and force aggr giveback with override-veto switch.
On Thu, Jul 9, 2020 at 5:20 PM Philbert Rupkins philbertrupkins@gmail.com wrote:
Thanks for the info. I'm familiar with the vetoed giveback due to CIFS - we hit that during unplanned failover events. Good to know I can expect that during upgrades as well.
Are you initiating the upgrade from the GUI? Also, when you override the CIFS veto, do you then need to issue a "cluster image resume-update" or resume from the GUI somewhere?
tmac tmacmd@gmail.com wrote:
It depends. If you catch it early enough, the automated process seems to check continuously and pauses until the giveback completes. I missed it one time for an hour (went to lunch). It took between 3 and 10 minutes, but it detected the giveback and continued automagically.
These days, I try to kick it off from the GUI. It is just too easy not to. Not like the old days, with hours' worth of manual checks and an upgrade process that took 1-2 hours depending on node count.
I have not seen it, but I think there is a check in the GUI to continue if something odd happens. Cannot attest to what ODD may be as I have not personally seen anything.
--tmac
Tim McCarthy, Principal Consultant
Proud Member of the #NetAppATeam https://twitter.com/NetAppATeam
I Blog at TMACsRack https://tmacsrack.wordpress.com/
Right, the default wait time before the upgrade continues is 8 minutes. If the veto happens mid-upgrade and you catch it before the 8 minutes pass, it will continue. If you don't, the upgrade will pause and you'll need to resume it once the veto has been overridden.
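A minimal sketch of the timing rule described above, not ONTAP code: it assumes a single wait window (8 minutes by default, as stated in this message) that starts when the giveback is vetoed, and that overriding the veto inside that window lets the upgrade carry on by itself. The function name, its parameters, and the returned strings are hypothetical and chosen only for illustration.

```python
from datetime import timedelta

# Default wait window as quoted in the thread (8 minutes); treated here as an assumption.
DEFAULT_WAIT = timedelta(minutes=8)


def upgrade_state_after_veto(time_to_override, wait=DEFAULT_WAIT):
    """Toy model of the automated upgrade's behaviour after a giveback veto.

    time_to_override: how long after the veto the operator overrides it,
                      or None if nobody overrides it at all.
    Returns "continues" if the veto is cleared within the wait window,
    otherwise "paused" (the operator has to resume the update manually).
    Treating "exactly at the limit" as still in time is an arbitrary choice.
    """
    if time_to_override is not None and time_to_override <= wait:
        return "continues"
    return "paused"


if __name__ == "__main__":
    print(upgrade_state_after_veto(timedelta(minutes=5)))   # caught within the window
    print(upgrade_state_after_veto(timedelta(minutes=60)))  # e.g. out to lunch
```

In this model a 5-minute response returns "continues", while a 60-minute response (or no response at all) returns "paused", which is the case where a manual resume such as the "cluster image resume-update" mentioned earlier in the thread would be needed.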
I was always using the GUI, but found it quicker to use the CLI and bypass the repeated checks and "are you sure" messages. But, yes, the GUI is perfectly fine. If you kick off the upgrade from the CLI, you can still go to the GUI and follow the process, pause, resume, etc.
On Fri, Jul 10, 2020 at 8:05 PM John Stoffel john@stoffel.org wrote:
"Philbert" == Philbert Rupkins philbertrupkins@gmail.com writes:
Philbert> Thanks for the info. I'm familiar with the vetoed giveback
Philbert> due to CIFS - we hit that during unplanned failover events.
Philbert> Good to know I can expect that during upgrades as well.
I did an upgrade (see my questions from Jan/Feb time) of 8.3 to 9.3 going through 9.1 and it was smooth sailing from the CLI. Super nice and easy. I really liked how well the upgrade process works now as compared to the old 8.1 -> 8.3 cDOT upgrade I did, as well as other 7-mode upgrades in the past.
I'm a CLI guy (heh, nearly wrote gui there) so I just do it from a screen session inside xterm and keep a lot of history. We did a big ESX hardware upgrade at the same time, so all my main production loads were shut down, but honestly, ONTAP is so rock solid for regular NFS and even CIFS loads that I'd be ballsy and just go for it.
It all depends on your management's comfort level. Can you show them that failovers are pretty much transparent today to give them confidence?
Which brings me to my big rant, which is failure testing. Too many sites/people are scared to do testing, or make any changes. If you have a robust system, which you expect to be HA, then you need to *test* it to be sure, and to make sure you know the right procedures in case of problems.
Otherwise, you don't know and can't trust your setup. Which is why I really love the Netflix Simian Army stuff. I just wish I could get more of the team I work with to understand this idea. Test for failures under realistic conditions or you won't know.
On Sat, Jul 11, 2020 at 11:54 PM Philbert Rupkins philbertrupkins@gmail.com wrote:
"It all depends on your management's comfort level. Can you show them that failovers are pretty much transparent today to give them confidence?"
I can, and the few unexpected failovers we've experienced have been smooth. So that is a confidence boost. However, these particular systems host some of the most critical workloads for the business so there is always hesitation when it's time to upgrade or make a significant change.
Regarding your rant (as you put it), we should absolutely get into the habit of failure testing. I can give you 7,000 excuses as to why we don't but it essentially boils down to priorities. I need to start pushing this idea again.
Also, massive thanks for pointing me to your previous upgrade-related questions. The guidance to upgrade disk/shelf/SP/BMC ahead of the upgrade reinforces current plans.
Speaking of which, is there a copy of the SP and BMC Firmware / ONTAP Support Matrix that still lists ONTAP 9.2? We're upgrading from 9.2 to 9.3 and I'd like to upgrade the cluster BMC and SP firmware to the versions bundled with ONTAP 9.3 ahead of time. However, the matrix does not contain compatibility info for 9.2 so I'm left to guess. Link below for reference.
https://mysupport.netapp.com/NOW/download/tools/serviceimage/support/Service...
ONTAP 9.2 is on limited support until the end of the month so it was a bit surprising to see it missing from the firmware matrix.
On Sun, Jul 12, 2020 at 4:33 PM tmac tmacmd@gmail.com wrote:
Let the SP/BMC happen organically! Meaning: ONTAP includes the appropriate release for the version. You cannot "upgrade" the SP/BMC to a newer version if ONTAP does not support it. ONTAP will not let you. You may be able to "force" it, but why bother? It will be upgraded to the (usually) most compatible release after the upgrade.
Most of the time, I see this: the ONTAP upgrade on Node 1 is done and Node 2 is working on its upgrade. While that happens, the SP on Node 1 reboots due to the automatic upgrade. After Node 2 boots the new ONTAP, it will eventually upgrade its SP/BMC firmware also.
The best advice anyone can give here: REBOOT YOUR SP/BMC before any upgrade. It makes the FW upgrade A LOT easier!
Also, from what I can tell, if you upgrade to ONTAP 9.3P18 (or newer, as the case may be) you should get the most recent BMC/SP version automatically.
Disks/Shelves: Install the files AFTER the upgrade. There are a few edge cases these days (like an IOM12 firmware) where it will not upgrade until ONTAP has a certain patch installed. Plus, the upgrade should include the latest Disk/Shelf firmware files as of when the ONTAP release was built.
The one item that is not in any ONTAP release is the Disk Qualification package. You can install/update that any time. Just be sure to read any warnings on the download page for it!
--tmac
Tim McCarthy, Principal Consultant
Proud Member of the #NetAppATeam https://twitter.com/NetAppATeam
I Blog at TMACsRack https://tmacsrack.wordpress.com/
Aside from the possibility that Disk/Shelf FW won't auto-update until ONTAP is at a specific patch level, is there any harm in updating the Disk/Shelf FW beforehand? I like the idea of getting as much done ahead of the ONTAP upgrade to reduce moving parts and the length of the maintenance window.
Toasters mailing list Toasters@teaparty.net https://www.teaparty.net/mailman/listinfo/toasters