Hi all,
We're in the middle of hell, where our 4-node FAS8060 cluster was shut down cleanly for a move, but only one pair made it onto the truck to the new DC. Luckily I have all the volumes snapmirrored between the two pairs of nodes and their aggregates.
But now I need to bring up the pair that made the trip, figure out which mirrors are source and which are destination on this pair, and then break the destination ones so I can promote them to read-write.
This is not something I've practiced, and I wonder: if I have volume foo, mounted at /foo, and its snapmirror is volume foo_sm, when I do the break, will it automatically mount at /foo? I guess I'll find out later tonight, and I can just unmount and remount.
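For reference (and so folks can sanity-check me), here's roughly the sequence I have in mind, assuming ONTAP 9.x clustershell and a made-up SVM name vs1:

    snapmirror show -fields source-path,destination-path,state,status
        (which relationships this pair holds as destination, and their state)
    snapmirror quiesce -destination-path vs1:foo_sm
    snapmirror break -destination-path vs1:foo_sm
        (makes foo_sm read-write; as far as I know it does NOT touch junction
        paths, so nothing gets mounted at /foo automatically)
    volume unmount -vserver vs1 -volume foo
    volume mount -vserver vs1 -volume foo_sm -junction-path /foo
        (assuming the cluster will let me touch the namespace entry for a
        volume whose owning nodes are offline, which is where I may get stuck)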
I think this is all good with just a simple 'snapmirror break ...', but then when we get the chance to rejoin the other two nodes into the cluster down the line, I would assume I just have to (maybe) wipe the old nodes and rejoin them one at a time. Mostly because by that point I can't have the original source volumes come up and cause us to lose all the writes that have happened on the now-writable destination volumes.
And of course there's the matter of getting epsilon back up and working on the two node cluster when I reboot it. Along with all the LIFs, etc. Not going to be a fun time. Not at all...
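My rough checklist for the two-node remnant once it boots (cluster ring show and the epsilon column need advanced privilege, if I remember right):

    set -privilege advanced
    cluster show
        (node health, eligibility, and which node holds epsilon)
    cluster ring show
        (each RDB ring -- mgmt, vldb, vifmgr, bcomd -- should show a master,
        i.e. the cluster is in quorum)
    set -privilege admin
    network interface show -is-home false
        (any LIFs stranded away from their home node/port)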
And of course we're out of support with Netapp. Sigh...
And who knows whether the pair that came down lost some disks in transit and will end up losing one or more aggregates as well. Stressful times for sure.
So I'm just venting here, but any suggestions or tricks would be helpful.
And of course I'm not sure if the cluster switches made it down here yet.
Never put your DC on the second floor if there isn't a second freight elevator. Or elevator in general. Sigh...
John
John,
If it were me, I would take it one step at a time: first get the systems physically in place, cabled up, and the cluster switches up and running, and once you can log in on the console or over SSH, then start looking at the aggregates, LIFs, cluster epsilon/quorum, which volumes are source, etc. Keep me posted if you hit any major roadblocks and I can try to help out.
Thank you, Tim
Hi Tim, Thanks for the offer to help. We will just have to see how long it will take to reunite our cluster pairs, though I’d be happy to just dump the missing pair and handle the data loss.
Which won’t be much if anything at all.
Sent from my iPhone
You can only bring up two nodes if one of them has epsilon. Otherwise no configuration changes are possible, and that includes breaking SnapMirror.
If those two nodes do have epsilon, it is just the normal procedure to fail over the LIFs. After a SnapMirror break, destination volumes are not renamed; they remain exactly as they are. Nothing special needs to be done when the two remaining nodes arrive; they just join the cluster normally.
If these two nodes do not have epsilon, your best bet is to wait for the remaining cluster nodes to arrive. I am not aware of a way to force epsilon in this case. Maybe one exists, but you would certainly need a support case to obtain it and any follow-up steps.
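Assuming the surviving pair does hold epsilon, the LIF part is usually just checking what failed over and, for any data LIF whose home node is on the missing pair, re-homing it onto a surviving node. A sketch with made-up LIF, node and port names:

    network interface show -vserver vs1
    network interface modify -vserver vs1 -lif nfs_lif1 -home-node nodeA -home-port e0e
    network interface revert -vserver vs1 -lif nfs_lif1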
The two nodes came up OK; the problem is the SVMs with a root vol on the missing pair of nodes. Escalating with NetApp.
Not sure we have the time to wait for the elevator repair to bring the missing pair to the new location, ten hours away by truck. It's been a hellish weekend.
I wonder if load-sharing the rootvols would have saved us some pain? Maybe not, since updates wouldn't be possible.
Sent from my iPhone
Well, this is the one case where Load-Sharing Mirrors would have been helpful...
Too late now, though.
Sebastian
Sebastian> Well, this is the one case where Load-Sharing Mirrors would have been helpful... Too late now, though.
Maybe... it's not clear if it would have helped all that much. What we should have done in hindsight, since we had the capacity, would be to split off two nodes of the cluster ahead of the move, then snapmirror between them.
It still would have been a pain, but recovery would have been simpler. We're still working on issues, but most things are working.
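For the archives, roughly what that would have looked like: two separate clusters, peered, with mirrors between them (names made up, ONTAP 9.x syntax from memory, run from the destination side):

    cluster peer create -peer-addrs 10.0.0.50
        (plus the matching cluster peer create on the other cluster)
    vserver peer create -vserver vsA -peer-vserver vsB -peer-cluster clusterB -applications snapmirror
    volume create -vserver vsA -volume foo_sm -aggregate aggr1 -size 100g -type DP
    snapmirror create -source-path vsB:foo -destination-path vsA:foo_sm -type XDP -policy MirrorAllSnapshots
    snapmirror initialize -destination-path vsA:foo_sm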
John
I had something like this a while back.
I ended up just creating a new 1GB volume and then converting it to the root vol (volume make-vsroot).
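Roughly (the SVM and aggregate names here are placeholders):

    volume create -vserver vs1 -volume vs1_root_new -aggregate aggr1_node1 -size 1g
    volume make-vsroot -vserver vs1 -volume vs1_root_new

If I remember right, the data volumes' junctions then have to be re-mounted into the new (empty) root with volume unmount / volume mount.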
--tmac
Tim McCarthy, Principal Consultant
Proud Member of the #NetAppATeam https://twitter.com/NetAppATeam
I Blog at TMACsRack https://tmacsrack.wordpress.com/
"tmac" == tmac tmacmd@gmail.com writes:
tmac> I had something like this a while back.
tmac> I ended up just creating a new 1GB volume and then converting it to the root vol (volume make-vsroot).
Since we have a snapmirror of the old rootvol, I was able to promote it once I had broken the copy. So we're ok now, though still fighting other issues.
John
So the NetApp came up without any major problems, and I can see all the volumes on the two nodes that arrived. But now I need to break the snapmirror relationships (done), unmount the original volumes from their junction paths, and mount the mirrors onto those paths. That's not working well, since the two nodes holding the original volumes are offline.
I suspect I will have to just delete those nodes from the cluster, which is scary, and accept that when those nodes finally make it down here I will have to initialize them and re-add them to the cluster.
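If it comes to that, my understanding is it's the unjoin path, run at advanced privilege from a surviving node (cluster unjoin on older ONTAP, cluster remove-node on newer releases), and the removed nodes have to be wiped (boot menu option 4, initialize all disks) before they can ever rejoin:

    set -privilege advanced
    cluster unjoin -node node3
    cluster unjoin -node node4

Whether it will even let me remove nodes that are unreachable is another question; that may well end up being a support-guided procedure.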
John
Sent from my iPhone