Hmmmmm,
Going through the steps in the KB, I would have done the epsilon and eligibility steps (Step 1 in the KB) right before the reboot (Step 8), *after* moving the aggregate and the LIFs away from the node to be worked on.
At that point it shouldn't disturb anything, since no user traffic should be passing through this node's interfaces or disks.
What do you think? (I'm a little unclear about the meaning of "NFS was restarted", but I have a feeling the above change in sequence should help.)
Also, if you look at the revert steps, the KB first restores eligibility and HA failover and only at the end reverts aggregates and LIFs.
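As a rough sketch of the ordering I mean (Python, purely illustrative): the step labels below are my own shorthand, not KB wording or ONTAP commands, and the KB order is only as described in this thread. Nothing here touches a cluster; it just checks that both data moves come before the eligibility change and prints a checklist.

```python
# Hypothetical step labels for illustration only; these are not ONTAP commands.
# KB order as described in this thread: epsilon/eligibility first, reboot last.
KB_ORDER_AS_DESCRIBED = [
    ("eligibility", "move epsilon and set the node ineligible"),
    ("aggregates", "relocate the aggregates away from the node"),
    ("lifs", "migrate the LIFs away from the node"),
    ("reboot", "reboot the node"),
]

# Proposed order: touch epsilon/eligibility only right before the reboot.
PROPOSED_ORDER = [
    ("aggregates", "relocate the aggregates away from the node"),
    ("lifs", "migrate the LIFs away from the node"),
    ("eligibility", "move epsilon and set the node ineligible"),
    ("reboot", "reboot the node"),
]


def data_moved_before_eligibility(steps):
    """True if aggregates and LIFs are moved before the eligibility change,
    and the eligibility change happens before the reboot."""
    position = {key: i for i, (key, _) in enumerate(steps)}
    return (
        position["aggregates"] < position["eligibility"]
        and position["lifs"] < position["eligibility"]
        and position["eligibility"] < position["reboot"]
    )


def checklist(steps):
    """Render the steps as numbered checklist lines."""
    return [f"{n}. {text}" for n, (_, text) in enumerate(steps, start=1)]


if __name__ == "__main__":
    for name, order in (("KB as described", KB_ORDER_AS_DESCRIBED),
                        ("proposed", PROPOSED_ORDER)):
        print(name, "-> data moved first:", data_moved_before_eligibility(order))
    print("\n".join(checklist(PROPOSED_ORDER)))
```

The check only encodes the sequence argument; whether it actually avoids a NAS interruption on 8.3.2 is still something NetApp support would have to confirm.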
Regards
Sebastian
On Mon, Mar 12, 2018, 18:39 Milazzo Giacomo G.Milazzo@sinergy.it wrote:
Hi everybody, last Friday, during an operation presented as non-disruptive (NDO), we had a service interruption on the NAS side. We had to move the root aggregate from some old disks to new ones, and we followed the procedure reported here to the letter (our cDOT is 8.3.2P9 on a 4-node cluster):
https://kb.netapp.com/app/answers/answer_view/a_id/1030179
Put very simply, it says:
A. Check for epsilon on the node you have to migrate and move it to another node.
A.1 There is a warning about SAN protocol interruptions, but we did NOT have SAN protocols running, only NFS/CIFS.
B. LIF migration after the aggregate relocation.
Well, NFS was restarted and all the servers and apps depending on it went down! I'll let you imagine the customer's reaction... Also, the console gave us a warning about SAN disruption after this command:
system node modify -node node01 -eligibility false
As I wrote, that did not matter to us.
Only after that did we find the following in the manual, but as usual the manuals are less up to date than the knowledge base, so they are the last place one would look for fresh information!
https://library.netapp.com/ecmdocs/ECMP1367947/html/GUID-AB52F821-3A25-4E02-...
Moving epsilon for certain manually initiated takeovers
Note: Although cluster formation voting can be modified by using the cluster modify -eligibility false command, you should avoid this except for situations such as restoring the node configuration or prolonged node maintenance. If you set a node to be ineligible, it stops serving SAN data until the node is reset to eligible and rebooted. NAS data access to the node might also be affected when the node is ineligible.
And what does "might be" mean? I translate that as "nobody knows, try it"...
Now the most important question (we must migrate the other three nodes!) is this: assuming we have understood correctly that the order should be 1. migrate the LIFs and only then 2. set epsilon to false, is there an official answer or document with updated information confirming that this is the right procedure to avoid interrupting the NAS protocols as well?
Thank you very much,
Dott. Giacomo Milazzo Senior Consultant & Technical Account Manager mobile: +39 340.6001045 @-mail: g.milazzo@sinergy.it Web: http://www.sinergy.it
SINERGY SpA Viale dei Santi Pietro e Paolo 50 00144 - Roma RM Tel. +39 06 44243674 Fax +39 06 44245272
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters