Thanks for that. 


On Jul 24, 2015, at 10:32 AM, Gelb, Scott <scott@redeight.com> wrote:

The hot shelf removal feature is great, but the documentation at https://library.netapp.com/ecmdocs/ECMP1367947/html/GUID-2B80FBD2-007D-4D19-8EB1-9CCEED211001.html is missing a check for an edge case: mroot was previously on those disks and was then moved to other disks/shelves before the hot removal.

If the disks on the shelf or shelves include mailbox disks, the node will crash right after you remove ownership from those disks (even though they are all zeroed spares). The HA partner will not be able to take over, and it will then also crash with an inconsistent mailbox. We had both nodes go down after removing ownership from a shelf that was being removed. The fix is easy (boot to maintenance mode and run "mailbox destroy local"), but add this check to your runbook for hot shelf removal. We didn’t have any data on the HA pair during the maintenance (the great thing about cDOT, and what averted any outage even during this event). We recovered by reassigning the disks and deleting the mailbox (we had to do that twice). Both nodes then came back up after a coredump, the mailbox was recreated, and storage failover could be enabled again.

To list the mailbox disks before removing ownership:

::> set adv

::*> stor failover mailbox-disk show
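The check can also be scripted as a pre-removal gate. This is a minimal Python sketch, not a NetApp tool. It assumes you have copied the mailbox disk names from the mailbox-disk show output, plus the names of the disks on the shelf being removed, into two lists by hand in the same name format. It only reports any overlap; the disk names in the example are hypothetical.

```python
def mailbox_disks_on_shelf(mailbox_disks, shelf_disks):
    """Return the mailbox disks that sit on the shelf being removed.

    Both arguments are iterables of disk names copied by hand from the
    cluster shell output (assumption: both lists use the same name format).
    The result is sorted so it reads the same on every run.
    """
    return sorted(set(mailbox_disks) & set(shelf_disks))


if __name__ == "__main__":
    # Hypothetical disk names in <stack>.<shelf>.<bay> form, for illustration only.
    mailbox = ["1.10.0", "1.10.1", "1.20.0", "1.20.1"]
    removing = [f"1.20.{bay}" for bay in range(24)]

    hits = mailbox_disks_on_shelf(mailbox, removing)
    if hits:
        # Do not remove ownership yet: mailbox disks still live on this shelf.
        print("STOP: mailbox disks on shelf:", ", ".join(hits))
    else:
        print("OK: no mailbox disks on shelf")
```

An empty result is what you want to see before removing ownership from the shelf's disks.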

One other note on the move mroot procedure (https://kb.netapp.com/support/index?page=content&id=3013873&locale=en_US): the article now includes turning on nvfail (I submitted that update a while ago), but some systems also had create_ucode enabled on vol0. So I check “vol options vol0” before the move, then match those settings after the move in addition to nvfail. Also, if you have cdpd.enable on and ip.fastpath.enable off, both of those need to be reset, since the mroot move sets them back to the defaults of cdpd off and fastpath on (8.3P1).
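The before/after comparison can be sketched in Python as well. This is a hypothetical helper, not part of ONTAP. It assumes the “vol options vol0” output is a comma-separated list of name=value pairs, possibly wrapped across lines, and treats bare words such as root as "on". It also assumes the node options (cdpd.enable, ip.fastpath.enable) are noted by hand. The function names and sample values are illustrative only.

```python
def parse_vol_options(text):
    """Parse captured 'vol options vol0' output into a dict.

    Assumption: the output is a comma-separated list of name=value pairs,
    possibly wrapped across lines; bare words (e.g. 'root') are stored as 'on'.
    """
    options = {}
    for item in text.replace("\n", " ").split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        options[name.strip()] = value.strip() if sep else "on"
    return options


def settings_to_restore(before, after, keys=None):
    """Return {name: (before_value, after_value)} for settings that changed.

    If keys is given, only those settings are compared; otherwise every
    setting seen before or after the move is compared. A setting missing on
    one side shows up as None.
    """
    names = keys if keys is not None else before.keys() | after.keys()
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(names)
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__":
    # Hypothetical captures; paste your own output in place of these strings.
    before = parse_vol_options("root, diskroot, nosnap=off, nvfail=on,\ncreate_ucode=on")
    after = parse_vol_options("root, diskroot, nosnap=off, nvfail=off,\ncreate_ucode=off")
    # Node-level options noted by hand before and after the mroot move.
    before.update({"cdpd.enable": "on", "ip.fastpath.enable": "off"})
    after.update({"cdpd.enable": "off", "ip.fastpath.enable": "on"})

    for name, (old, new) in settings_to_restore(before, after).items():
        print(f"{name}: was {old}, now {new}; reset to {old}")
```

Comparing everything, rather than a fixed list, means an option you did not think to check (like create_ucode) still shows up in the report.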

_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters