On Wed, Mar 30, 2016 at 4:31 PM, Parisi, Justin Justin.Parisi@netapp.com wrote:
Oh, well that's different entirely. :)
true =)
The cluster may be out of quorum, which is causing this issue.
I don't think so. My cluster is 6 nodes, with 2 nodes powered off, and one of the running nodes has epsilon, so IMO it's still in quorum.
Did you capture the aforementioned commands?
"RPC timeout" here means that the API is being sent across the cluster to other nodes via RPC. Since the nodes are down, the commands are failing.
Yes, I did; the healthy nodes reply just fine.
Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
na101node-1a         true    true         false
na101node-1b         true    true         false
na101node-2a         true    true         true
na101node-2b         true    true         false
na101node-3a         true    true         false
na101node-3b         true    true         false
na101node-4a         false   true         false
na101node-4b         false   true         false
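As a rough sanity check of the quorum argument, here is a minimal Python sketch that applies a simple-majority rule to the node listing above. The model is an assumption for illustration only (one vote per eligible node, with epsilon counted as an extra half vote that only matters in a tie); it is not ONTAP's actual implementation, and `has_quorum` is a hypothetical helper name.

```python
# Simplified quorum model (an assumption, not ONTAP's real algorithm):
# every eligible node has one vote, and the epsilon holder contributes an
# extra fractional vote that only decides the outcome when votes are tied.

NODES = [
    # (name, health, eligibility, epsilon), copied from the listing above
    ("na101node-1a", True, True, False),
    ("na101node-1b", True, True, False),
    ("na101node-2a", True, True, True),
    ("na101node-2b", True, True, False),
    ("na101node-3a", True, True, False),
    ("na101node-3b", True, True, False),
    ("na101node-4a", False, True, False),
    ("na101node-4b", False, True, False),
]


def has_quorum(nodes):
    """Return True if healthy eligible nodes form a majority of eligible nodes."""
    eligible = [n for n in nodes if n[2]]
    healthy_votes = float(sum(1 for n in eligible if n[1]))
    # Epsilon only counts if its holder is itself healthy and eligible.
    if any(n[1] and n[3] for n in eligible):
        healthy_votes += 0.5
    return healthy_votes > len(eligible) / 2


print(has_quorum(NODES))
```

Under this simplified model the healthy nodes outnumber the powered-off ones even without epsilon, which matches the reasoning that the cluster should still be in quorum.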
Keep in mind that a scenario where two nodes in a cluster are powered off is not a normal scenario. If you are doing maintenance, you would want to mark those nodes as "eligibility false" to ensure they don't participate in the cluster during maintenance. You also want to ensure epsilon is not on the nodes and to move epsilon if it is.
Well, I just tried to set "eligibility" to false, but it didn't fix the API calls issue:
::*> node modify -node na101node-4* -eligibility false
Warning: When a node's eligibility is set to "false," it cannot serve SAN data, and NAS access might also be affected. This setting should be used only for unusual maintenance operations. To restore the node's data-serving capabilities, set the eligibility to "true" and reboot the node.
Continue? {y|n}: y
2 entries were modified.