Hello Mike:
Glad to hear you got it working. It was hard to know exactly what you needed, but at least the earlier commands didn't cause any harm.
Definitely worth upgrading to DOT 8.2 if you can.
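The 8.1 command Duncan mentions below does this same RDB sync in one step. From memory (so treat the exact syntax as approximate and check the man page on your release), it's run from the clustershell at advanced privilege, naming the out-of-sync node:

   ::> set -privilege advanced
   ::*> system configuration recovery cluster sync -node <nodename>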
--April
On Tuesday, October 15, 2013 6:19 PM, Duncan Cummings dcummings@interactive.com.au wrote:
Mike,

You should upgrade if possible. In 8.1 there is a command, "system configuration recovery cluster sync -node node2", which is used for synchronizing a node with a cluster. Would have made life a lot simpler :)

Duncan

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Mike Thompson
Sent: Wednesday, 16 October 2013 10:18 AM
To: April Jenner
Cc: Scott Miller; toasters@teaparty.net Lists
Subject: Re: Cluster mode root volume recovery

Hi April,

This last step, removing /mroot/etc/cluster_config/monitor_mroot.nvfail and a reboot, did the trick!
All nodes/aggrs showing up properly, and syslog looks clean.
Thanks very very much!

On Tue, Oct 15, 2013 at 3:35 PM, April Jenner aprilogi@yahoo.com wrote:

Hello Mike:

You might also need to do the following to clear the flags that caused the node to enter root volume recovery mode. I would highly suggest an upgrade.

--April

1. Check whether the bootarg.init.boot_recovery bit is set. From the FreeBSD prompt of your node, type:

   kenv bootarg.init.boot_recovery

2. If a value is returned, and not "kenv: unable to get bootarg.init.boot_recovery", then clear the bit. From the FreeBSD prompt of your node, type:

   sudo sysctl kern.bootargs=--bootarg.init.boot_recovery

3. Check whether bootarg.rdb_corrupt.mgwd is set. From the FreeBSD prompt of your node, type:

   kenv bootarg.rdb_corrupt.mgwd

4. If "true" is returned, and not "kenv: unable to get bootarg.rdb_corrupt.mgwd", then clear the bit. From the FreeBSD prompt of your node, type:

   sudo kenv bootarg.rdb_corrupt.mgwd="false"

5. Check whether the monitor_mroot.nvfail file exists. From the FreeBSD prompt of your node, type:

   ls /mroot/etc/cluster_config/monitor_mroot.nvfail

6. If the file exists, and you don't get "No such file or directory", then remove it. From the FreeBSD prompt of your node, type:

   sudo rm /mroot/etc/cluster_config/monitor_mroot.nvfail

On Tuesday, October 15, 2013 2:52 PM, Mike Thompson mike.thompson@gmail.com wrote:

Hey Skottie!

Good to hear from you - been a while!

Unfortunately, though those commands are available in this ancient 8.0.2 release we are running on this cluster, they cannot be run in the state this particular node is in. It's in a state kind of like prior to joining or creating a cluster: there are no volume commands available to be run (even if I explicitly type them out). It's got the cluster join, cluster create, and ping-cluster commands available - it's like it's orphaned itself from the rest of the cluster, though if I ping-cluster it sees and can connect to all the other nodes fine.

I'm tempted to try to rejoin it to the cluster via 'cluster join' again, but don't want to possibly screw up the rest of the cluster.

Thanks for the help!

On Tue, Oct 15, 2013 at 2:09 PM, Scott Miller Scott.Miller@dreamworks.com wrote:
try:
set -priv diag
volume add-other-volumes
which claims it's used to import 7-mode volumes, but I've used it to re-sync the cluster volume database when a root volume went walk-about.
*and*
also in priv mode:
volume lost-found show

to see what the cluster volume DB thinks might be missing.
These commands are in cDOT 8.2P3; not sure about earlier versions.
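Putting the two together, the whole session would look roughly like this (from memory, so don't hold me to prompts or output; as I understand it, add-other-volumes scans the node's aggregates and registers what it finds back into the volume DB, and lost-found show lists what the DB knows it has lost track of):

   ::> set -priv diag
   ::*> volume add-other-volumes
   ::*> volume lost-found show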
-skottie
On 10/15/2013 01:47 PM, Mike Thompson wrote:

Hey all,
I've got an 8.0.2 c-mode cluster that recently had a single node joined to it, and a few empty aggregates created on it. We had an extended power outage that required a lot of gear in the data center to be shut down, and since this node in the cluster didn't have any live data or VIFs on it, it got shut down too.
A few days later, we're now powering it back up, but get this upon login:
"The contents of the root volume may have changed and the local management databases may be out of sync with the replicated databases due to corruption of NVLOG data during takeover. This node is not fully operational. Contact support personnel for the root volume recovery procedures."
The node comes up fine and can see all its aggregates, and the other nodes in the cluster can see it via the cluster network, but the node is indeed not fully operational or part of the cluster again. Its aggregates and other info are not visible from the other nodes in the cluster.
I did a wafl_check of the root aggr and vol0, and that came back clean. I seem to recall having been through this before, but can't find anything in my notes.
This particular cluster is not under support, due to some genius decisions by management, so I'm on my own with this.
There are a few empty aggregates on this node, and no volumes other than the root vol. Maybe I can force unjoin it from the cluster and rebuild it? I'd rather not try that. If there is a way to sync up the DBs on the root vol so it will come back into the cluster, that would be ideal.
Any ideas?
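April's step-by-step checks earlier in this digest are the answer Mike ended up using. Consolidated into a single sh pass from the node's FreeBSD systemshell - a sketch only, under the same assumptions as her numbered list, and per Mike's follow-up a reboot is still needed afterwards:

   #!/bin/sh
   # Sketch: clear the three root-volume-recovery flags April describes,
   # touching each one only if the corresponding check finds it set.

   # Steps 1-2: clear bootarg.init.boot_recovery if kenv can read it.
   if kenv bootarg.init.boot_recovery >/dev/null 2>&1; then
       sudo sysctl kern.bootargs=--bootarg.init.boot_recovery
   fi

   # Steps 3-4: reset bootarg.rdb_corrupt.mgwd if it reads "true".
   if [ "$(kenv bootarg.rdb_corrupt.mgwd 2>/dev/null)" = "true" ]; then
       sudo kenv bootarg.rdb_corrupt.mgwd="false"
   fi

   # Steps 5-6: remove the monitor_mroot.nvfail marker file if present.
   f=/mroot/etc/cluster_config/monitor_mroot.nvfail
   if [ -e "$f" ]; then
       sudo rm "$f"
   fi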
Duncan Cummings
NetApp Specialist
Interactive Pty Ltd
Telephone 07 3323 0800  Facsimile 07 3323 0899  Mobile 0403 383 050
www.interactive.com.au