Douglas> Did you have a look at fastpath?
I did, and I think I'm all OK because most of my SVMs have just one network associated with them, and I've also moved all my management off to a completely separate subnet.
Our cluster is stupid simple, just 4 x 10Gb LACP trunks from each node (4 nodes total) with a number of VLANs running over those trunks. The SVMs are reasonably designed, though we did make some mistakes many years ago when we first set things up that I would do differently now.
Douglas> https://whyistheinternetbroken.wordpress.com/2018/02/16/ipfastpath-ontap92/
Douglas> It seems every time we put in a case for upgrades to 9.3
Douglas> netapp support tries to make sure we looked into this! So it
Douglas> must've bitten a lot of folks.
I think so. I hope we're all set.
Douglas> We did run into a pretty big bug on the upgrade from 9.1 to
Douglas> 9.3P15 -- we have a case/core in now. I've seen nfs stop
Douglas> serving from a node in at least 3 clusters roughly 2-5
Douglas> hours after the upgrade. We fix it by indicating the
Douglas> unresponsive node and either powering it down, or via NMI/SP.
Douglas> It will not respond to normal takeover commands. Preliminary
Douglas> core analysis (no full core analysis yet) points at at least
Douglas> 1 bug fixed in 9.3P17.
Yikes! That is a big bug to have to deal with.
Douglas> https://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1236722
Douglas> Typically when we roll updates it can take months given the
Douglas> number of nodes and clusters. So we stick with whatever P
Douglas> patch we rolled on the first set of nodes, and then by the
Douglas> end of upgrades 1-3 P patches are released.
You have a much bigger environment than we have! I think next time we might do smaller pairs of nodes, so we can just vMotion VMs back and forth and hopefully have enough space to be a bit more proactive on upgrades without disrupting everything with a full shutdown.
Douglas> With this experience, always use the latest P patch possible
Douglas> on the intermediary update especially if you are going to
Douglas> take a bit to roll it through your entire deployment. I also
Douglas> recommend taking a look at going to 9.5, it sounds nuts, but
Douglas> we've had better stability with this release. We moved to this
Douglas> release because of a specific feature that was needed
Douglas> (CIFS/SMB enhancements, and flexcache/flexgroups).
I had thought of going that far up, but just getting the downtime for the two jumps I need to do has been hard enough. But we're learning our lesson and trying to do upgrades more frequently. We only have this one cluster though and it runs everything.
Douglas> On Tue, Mar 3, 2020 at 4:41 PM John Stoffel john@stoffel.org wrote:
Douglas> Guys,
Douglas> We're getting ready to upgrade our 4 node 8060 cluster from 8.3.2P9 to
Douglas> 9.1P19 and then onto 9.3P17, all this weekend. My only real concern
Douglas> is the upgrade from 9.1 to 9.3, which lists a major warning for bug
Douglas> 1250500:
Douglas> Expired truststore security certificates causing upgrade and
Douglas> new installation failures.
Douglas> Unfortunately, I can't run an upgrade advisor report for my cluster
Douglas> going from 9.1P19 to 9.3P17, because I'm not yet running 9.1, and it
Douglas> can take up to a week for the autosupport data to get pushed to Upgrade
Douglas> Advisor. Sigh...
Douglas> Has anyone run into this issue when doing the 9.1 -> 9.3 upgrade with
Douglas> the expired certificate? Otherwise, it all looks good, my cluster
Douglas> switches are supported at their current version, etc.
Douglas> John