Greetings,
Is it acceptable to use ssh from an admin host to run ndmpcopy and vol copy commands?
It seems to work initially, but both seem to die after varying periods of time with no useful explanations in the log files. I found a KB about someone connecting to the filer and running ctrl-c thinking the console is idle, but that doesn't apply here unless an ssh command from a cron job or something has the same effect. ndmpd status and vol copy status verify that there are no copies running.
All filers are 8.1.2P4, source is a 6080, destination is a 6290, 10G networking on both.
I'm doing restore time tests of entire volumes using various methods and these 2 are part of the list. The vol copy ran for ~10 minutes out of the 466 estimated, then stopped. No errors on the command line. It does run, so all the host.equiv entries and permissions are good. I'm sure it did not do the 11TB in 10 minutes :-)
The ndmpcopy command ran for ~80 minutes and silently quit. It restored 217G, almost all of it in the smaller directories. The larger directories didn't show up and there are no abort messages in the logs. They just quit logging.
For the ndmpcopy I had a script do an ndmpcopy for each directory (9 total). It looked like multiple ones were running at the same time which seems OK. I was hoping to use parallel threads to speed things up.
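A per-directory loop like the one described above might be sketched in Python roughly as follows. This is only an illustration, not the script actually used: the `build_command` and `run_copies` helpers are hypothetical, and the `ndmpcopy <source> <destination>` argument order is an assumption to check against the filer's own usage text. Recording the elapsed time of each copy makes it easier to spot runs that all end after the same interval.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def build_command(filer, src_path, dst_path):
    """Return the argv for a non-interactive ssh call that runs ndmpcopy."""
    # Assumed form: ssh <filer> ndmpcopy <source> <destination>
    return ["ssh", filer, "ndmpcopy", src_path, dst_path]


def run_copies(filer, pairs, runner=subprocess.run, max_workers=3):
    """Run one ndmpcopy per (src, dst) pair in parallel threads.

    Returns {src: (exit_code, elapsed_seconds)}.
    """
    def copy_one(pair):
        src, dst = pair
        started = time.monotonic()
        result = runner(build_command(filer, src, dst))
        return src, (result.returncode, time.monotonic() - started)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(copy_one, pairs))
```

The `runner` argument exists only so the loop can be exercised without a real filer. Whether the filer returns a meaningful exit status over ssh when a copy is cut short is not established here, so the elapsed time is probably the more useful of the two values.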
Any ideas on the silent failures?
Thanks,
Jeff
Hmm, is it a matter of using ssh interactively vs. non-interactively? I.e.:
ssh toasterA ndmpcopy ....
vs.
ssh toasterA
ndmpcopy ......
Also, if you have not already, look at these options:
autologout.console.timeout
autologout.telnet.timeout
ssh.idle.timeout
and see if you are hitting one of them.
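For checking those three options from the admin host, something like the following Python sketch could work. It assumes that `options <name>` on the filer prints the option name followed by its value on one line. That output format, and the helper names, are assumptions, so adjust them to whatever the filer actually prints.

```python
import subprocess

TIMEOUT_OPTIONS = [
    "autologout.console.timeout",
    "autologout.telnet.timeout",
    "ssh.idle.timeout",
]


def option_command(filer, name):
    """Return the argv that asks the filer for one option over ssh."""
    return ["ssh", filer, "options", name]


def parse_option_line(line):
    """Split an assumed '<name> <value>' line into (name, value)."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError(f"unexpected options output: {line!r}")
    return parts[0], parts[1]


def read_timeouts(filer, runner=subprocess.run):
    """Return {option_name: value_string} for the three timeout options."""
    values = {}
    for name in TIMEOUT_OPTIONS:
        result = runner(option_command(filer, name),
                        capture_output=True, text=True)
        key, value = parse_option_line(result.stdout.strip())
        values[key] = value
    return values
```

The values are left as strings on purpose, since the unit of each option (seconds or minutes) should be confirmed before comparing them with how long a job ran.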
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Cleverley Sent: Wednesday, October 30, 2013 8:23 PM To: Toasters@teaparty.net Subject: ndmpcopy and vol copy via ssh?
-- Jeff Cleverley Unix Systems Administrator 4380 Ziegler Road Fort Collins, Colorado 80525 970-288-4611
You may wish to explore this ssh option:
ServerAliveInterval
You can set it in your ssh client config file (~/.ssh/config). Most people set it to 60 so that an alive message is sent every 60 seconds.
We ended up having to do this across our WAN to keep clients connected overnight.
--tmac
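As a rough sketch, the same setting can also be passed per command with `-o`, which is handy in scripts. The `ssh_argv` helper and the host name are hypothetical; `ServerAliveInterval` is the OpenSSH client option named above. It only makes the client send keepalives, and whether that has any effect on the filer's own idle timers is not something to assume.

```python
def ssh_argv(host, remote_command, alive_interval=60):
    """Build an ssh argv that sends a keepalive every alive_interval seconds.

    Equivalent ~/.ssh/config stanza:
        Host toasterA
            ServerAliveInterval 60
    """
    return ["ssh", "-o", f"ServerAliveInterval={alive_interval}",
            host, *remote_command]


# Example: the status check mentioned in the original post.
print(ssh_argv("toasterA", ["vol", "copy", "status"]))
```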
Tim McCarthy
Principal Consultant
NCDA (Clustered ONTAP) ID: XK7R3GEKC1QQ2LVD, expires 08 November 2014
RHCE6 110-107-141 (https://www.redhat.com/wapps/training/certification/verify.html?certNumber=110-107-141&isSearch=False&verify=Verify), current until Aug 02, 2016
NCSIE (Clustered ONTAP) ID: C14QPHE21FR4YWD4, expires 08 November 2014
On Wed, Oct 30, 2013 at 8:41 PM, Jordan Slingerland < Jordan.Slingerland@independenthealth.com> wrote:
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
I thought I should add that I use NDMP copy almost every day and only via ssh from an admin host as you describe. Typically I do it interactively, though mainly just because it is easier to deal with special characters and spaces and not ending up in nested single quote, double quote, escape character hell.
One last thought: I certainly wouldn't jump to this conclusion, but I once had an issue with ssh sessions being timed out irregularly when "idle". It turned out that a firewall had hit its maximum number of ARP entries (with some help from a rogue device doing very wide ping sweeps) and was apparently killing off the connections it deemed the most idle.
--JMS
From: tmac [mailto:tmacmd@gmail.com] Sent: Wednesday, October 30, 2013 8:47 PM To: Jordan Slingerland Cc: Jeff Cleverley; Toasters@teaparty.net Subject: Re: ndmpcopy and vol copy via ssh?
The ssh timeout is set to 600 and all the others were set to 60. We don't use anything other than ssh into them anyway. The timeouts of the session behaviors. Both systems are on the same subnet and we haven't had any problems with timeouts or latency with them.
For the vol copy, this is the end of the data from the ssh command:
VOLCOPY: Starting on volume 1.
Volume 'restore_test' is now restricted.
18:03:32 MDT : vol copy restore 0 : begun, 11583164 MB to be copied.
[root@cutthroat etc]# date
Wed Oct 30 18:13:36 MDT 2013
You can see it ran for ~10 minutes, exited with no errors, and nothing in the logs. When I run it again, it behaves the same, but I noticed the size to be copied gets a little lower each time. A few thousand more of these and it will be done :-)
And for Peter: we don't support any Windows systems in our lab, so we are pretty PowerShell averse.
I'll open a call with NetApp and figure out how to trace things better. I am going to try an interactive shell with both just to see how they work out before I do that though.
Thanks,
Jeff
On Wed, Oct 30, 2013 at 7:06 PM, Jordan Slingerland < Jordan.Slingerland@independenthealth.com> wrote:
I seem to remember from NCDA boot camp that autologout.telnet.timeout actually affects both ssh and telnet. There is also an autologout.console.timeout. Also, I think the ssh timeout is in seconds, so 600 would be 10 minutes.
--JMS
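That arithmetic lines up with the vol copy output earlier in the thread. A quick Python check, using only the two timestamps from that output (the variable names are just for illustration):

```python
from datetime import datetime

# Timestamps taken from the vol copy output and the `date` call after it.
started = datetime(2013, 10, 30, 18, 3, 32)
returned = datetime(2013, 10, 30, 18, 13, 36)

elapsed_s = (returned - started).total_seconds()
idle_timeout_s = 600  # ssh.idle.timeout value, read as seconds

print(elapsed_s)            # 604.0
print(idle_timeout_s / 60)  # 10.0
```

The `date` call happened some moments after the command returned, so 604 seconds is an upper bound on the run time, which is consistent with a 600-second cutoff.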
From: Jeff Cleverley [mailto:jeff.cleverley@avagotech.com] Sent: Thursday, October 31, 2013 1:02 PM To: Jordan Slingerland Cc: tmac; Toasters@teaparty.net Subject: Re: ndmpcopy and vol copy via ssh?
Jordan,
Thanks for the seconds/minutes reminder. It looks like that worked. I set the ssh timeout to 2 days and the vol copy has now continued to run beyond 10 minutes. I may have to modify that anyway. This line showed up, indicating about 49 hours:
12:53:11 MDT : vol copy restore 0 : 1 % done. Estimate 2982 minutes remaining.
I revisited the backup log and was able to see each loop of the ndmpcopy command ran for 10 minutes, then quit, then the next one started. That accounted for the 80 minutes with the multiple directories. It also explains the smaller sections completing and the larger ones failing.
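On the "may have to modify that anyway" point, a quick Python check of the numbers above suggests why: assuming the 2-day timeout was entered in seconds, the vol copy estimate is slightly longer than the timeout. The variable names are illustrative only.

```python
two_days_s = 2 * 24 * 60 * 60   # 172800 seconds
estimate_min = 2982             # from the vol copy status line
estimate_s = estimate_min * 60  # 178920 seconds

print(round(estimate_min / 60, 1))      # 49.7 hours
print(estimate_s > two_days_s)          # True
print((estimate_s - two_days_s) // 60)  # 102 minutes over
```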
Thanks,
Jeff
On Thu, Oct 31, 2013 at 12:13 PM, Jordan Slingerland < Jordan.Slingerland@independenthealth.com> wrote:
Wow, that's a long vol copy. Maybe consider snapmirror too if you have the license. If the initialization gets cut off, it will restart at a checkpoint.
From: Jeff Cleverley [mailto:jeff.cleverley@avagotech.com] Sent: Thursday, October 31, 2013 3:13 PM To: Jordan Slingerland Cc: tmac; Toasters@teaparty.net Subject: Re: ndmpcopy and vol copy via ssh?
Jordan, Thanks for the seconds/minutes reminder. It looks like that worked. I set the ssh timeout to 2 days and the vol copy has continued to run beyond 10 minutes now.. I may have to modify that anyway. This line showed up indicating 49 hours:
12:53:11 MDT : vol copy restore 0 : 1 % done. Estimate 2982 minutes remaining. I revisited the backup log and was able to see each loop of the ndmpcopy command ran for 10 minutes, then quit, then the next one started. That accounted for the 80 minutes with the multiple directories. It also explains the smaller sections completing and the larger ones failing. Thanks, Jeff
On Thu, Oct 31, 2013 at 12:13 PM, Jordan Slingerland <Jordan.Slingerland@independenthealth.commailto:Jordan.Slingerland@independenthealth.com> wrote: I seem to remember from NCDA boot camp that autologout.telnet.timeout actually impacts ssh and telnet. There is also a autologout.console.timeout. Also, I think ssh timeout is in seconds so that would be 10m.
--JMS
From: Jeff Cleverley [mailto:jeff.cleverley@avagotech.commailto:jeff.cleverley@avagotech.com] Sent: Thursday, October 31, 2013 1:02 PM To: Jordan Slingerland Cc: tmac; <Toasters@teaparty.netmailto:Toasters@teaparty.net>
Subject: Re: ndmpcopy and vol copy via ssh?
The ssh timeout is set to 600 and all the others were set to 60. We don't use anything other than ssh into them anyway. The timeouts of the session behaviors. Both systems are on the same subnet and we haven't had any problems with timeouts or latency with them. For the vol copy, this is the end of the data from the ssh command:
VOLCOPY: Starting on volume 1. Volume 'restore_test' is now restricted. 18:03:32 MDT : vol copy restore 0 : begun, 11583164 MB to be copied. [root@cutthroat etc]# date Wed Oct 30 18:13:36 MDT 2013 You can see it ran for ~10 minutes, exited with no errors, and nothing in the logs. When I run it again, it behaves the same, but I noticed the size to be copied gets a little lower each time. A few thousand more of these and it will be done :-) And for Peter, we don't have any Windows we support in our lab, so we are pretty PowerShell adverse. I'll open a call with NetApp and figure out how to trace things better. I am going to try an interactive shell with both just to see how they work out before I do that though. Thanks,
Jeff
On Wed, Oct 30, 2013 at 7:06 PM, Jordan Slingerland <Jordan.Slingerland@independenthealth.commailto:Jordan.Slingerland@independenthealth.com> wrote: I thought I should add that I use NDMP copy almost every day and only via ssh from an admin host as you describe. Typically I do it interactively, though mainly just because it is easier to deal with special characters and spaces and not ending up in nested single quote, double quote, escape character hell.
One last though, I certainly wouldn't jump to this conclusion, but, I once had an issue with ssh sessions being timed out irregularly when "idle" and it ended up being a firewall had hit the max amount of arp entries (with some help from a rogue device doing very wide ping sweeps) and was apparently killing of the connections that it deemed the most idle .
--JMS
From: tmac [mailto:tmacmd@gmail.commailto:tmacmd@gmail.com] Sent: Wednesday, October 30, 2013 8:47 PM To: Jordan Slingerland Cc: Jeff Cleverley; <Toasters@teaparty.netmailto:Toasters@teaparty.net> Subject: Re: ndmpcopy and vol copy via ssh?
You may wish to explore this ssh option:
ServerAliveInterval
You can set it in you .ssh config file. Most set the value of 60 to have an Alive message sent every 60 seconds.
We ended up having to do this across our wan to keep clients connected overnight.
--tmac
Tim McCarthy Principal Consultant
[http://dl.dropbox.com/u/6874230/na_cert_dma_2c.jpg] [http://dl.dropbox.com/u/6874230/rhce.jpeg] [http://dl.dropbox.com/u/6874230/na_cert_ie-san_2c.jpg]
Clustered ONTAP Clustered ONTAP NCDA ID: XK7R3GEKC1QQ2LVD RHCE6 110-107-141https://www.redhat.com/wapps/training/certification/verify.html?certNumber=110-107-141&isSearch=False&verify=Verify NCSIE ID: C14QPHE21FR4YWD4 Expires: 08 November 2014 Current until Aug 02, 2016 Expires: 08 November 2014
Jordan,
It will be a long vol copy. We do have snapmirror licenses, but I've run into a couple of quirks and issues with it. The model is a complete aggregate/volume recovery on a primary filer. We generally don't use qtrees and all the volumes we care about are nfs only.
When we do a new snapvault baseline, we can get close to 900GB/hr. When we've tried to pull it back, we're not getting anywhere near that to a new aggregate.
The problem I hit with snapmirror is that it won't work when the source is a replication destination. I have to do a snapvault stop to get it working. I think I tried deleting the SV snapshot, but it still didn't like it. It also moves over a qtree, so I then have to push all the contents up one level to put them back where they originally were. Even when I did this method, it took SM 46 hours to put the data back, then another 6 hours to move everything up one directory.
I never used it, but I'm going to try making a flex clone and see how things work from there.
I'd like to do the ndmpcopy just because I can run multiple threads and I don't have the qtree issue but I think this will prove to be fairly slow also.
I'm open to what should be the fastest way to restore a 10T file system with 25m inodes to a new aggregate on a different filer. Sneaker-net with 3 ds2246 shelves is not an option :-)
Thanks,
Jeff
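To put the numbers in the message above side by side, here is a rough back-of-the-envelope calculation in Python. It assumes "10T" means 10 x 1024 GB, that the quoted rates are sustained averages, and that the 46-hour snapmirror run moved the same amount of data; all three are assumptions, not figures taken from the filers.

```python
VOLUME_GB = 10 * 1024          # assumed: "10T" taken as 10 x 1024 GB
BASELINE_GB_PER_HR = 900       # snapvault baseline rate quoted above
SNAPMIRROR_HOURS = 46          # snapmirror restore time quoted above


def hours_at_rate(size_gb, gb_per_hr):
    """Time to move size_gb at a sustained rate of gb_per_hr."""
    return size_gb / gb_per_hr


def effective_rate(size_gb, hours):
    """Average GB/hr implied by moving size_gb in the given hours."""
    return size_gb / hours


ideal_hours = hours_at_rate(VOLUME_GB, BASELINE_GB_PER_HR)
observed_rate = effective_rate(VOLUME_GB, SNAPMIRROR_HOURS)
print(f"At baseline speed: {ideal_hours:.1f} hours")
print(f"Implied snapmirror restore rate: {observed_rate:.0f} GB/hr")
```

Under those assumptions the restore would take roughly 11 hours at the baseline rate, against the 46 hours observed, so the pull-back is running at about a quarter of the baseline speed.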
On Thu, Oct 31, 2013 at 2:31 PM, Jordan Slingerland <Jordan.Slingerland@independenthealth.com> wrote:
Wow, that’s a long vol copy. Maybe consider snapmirror too if you have the license. If the initialization gets cut off, it will restart at a checkpoint.
From: Jeff Cleverley [mailto:jeff.cleverley@avagotech.com] Sent: Thursday, October 31, 2013 3:13 PM To: Jordan Slingerland Cc: tmac; Toasters@teaparty.net Subject: Re: ndmpcopy and vol copy via ssh?
Jordan,
Thanks for the seconds/minutes reminder. It looks like that worked. I set the ssh timeout to 2 days and the vol copy has now continued to run beyond 10 minutes. I may have to modify that anyway. This line showed up indicating 49 hours:
12:53:11 MDT : vol copy restore 0 : 1 % done. Estimate 2982 minutes remaining.
I revisited the backup log and was able to see that each loop of the ndmpcopy command ran for 10 minutes, then quit, then the next one started. That accounted for the 80 minutes with the multiple directories. It also explains the smaller sections completing and the larger ones failing.
Thanks,
Jeff
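The arithmetic behind the seconds/minutes mix-up, as a small Python sketch. It assumes, as suggested in the thread, that the ssh timeout value is in seconds and that it applies for the whole run; the function names are illustrative.

```python
def seconds_to_minutes(seconds):
    """Convert a timeout expressed in seconds to minutes."""
    return seconds / 60


def timeout_covers_job(timeout_seconds, estimated_minutes):
    """True if a job of estimated_minutes would finish before the timeout."""
    return timeout_seconds >= estimated_minutes * 60


original_timeout = 600                 # value reported for the ssh timeout
two_days = 2 * 24 * 60 * 60            # new setting, in seconds
estimate_minutes = 2982                # from the vol copy progress line

print(seconds_to_minutes(original_timeout))            # minutes before the old cutoff
print(estimate_minutes / 60)                           # estimated hours remaining
print(timeout_covers_job(two_days, estimate_minutes))  # does 2 days suffice?
```

With these numbers, 600 seconds is the 10-minute cutoff seen in the vol copy and ndmpcopy runs, and a 2-day timeout (2880 minutes) falls short of the 2982-minute estimate, which is why the value may need raising again.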
On Thu, Oct 31, 2013 at 12:13 PM, Jordan Slingerland < Jordan.Slingerland@independenthealth.com> wrote:****
I seem to remember from NCDA boot camp that autologout.telnet.timeout actually impacts ssh and telnet. There is also a autologout.console.timeout. Also, I think ssh timeout is in seconds so that would be 10m.****
--JMS****
From: Jeff Cleverley [mailto:jeff.cleverley@avagotech.com] Sent: Thursday, October 31, 2013 1:02 PM To: Jordan Slingerland Cc: tmac; Toasters@teaparty.net Subject: Re: ndmpcopy and vol copy via ssh?
The ssh timeout is set to 600 and all the others were set to 60. We don't use anything other than ssh into them anyway. The timeouts of the session behaviors. Both systems are on the same subnet and we haven't had any problems with timeouts or latency with them.
For the vol copy, this is the end of the data from the ssh command:
VOLCOPY: Starting on volume 1.
Volume 'restore_test' is now restricted.
18:03:32 MDT : vol copy restore 0 : begun, 11583164 MB to be copied.
[root@cutthroat etc]# date
Wed Oct 30 18:13:36 MDT 2013
You can see it ran for ~10 minutes, exited with no errors, and left nothing in the logs. When I run it again, it behaves the same, but I noticed the size to be copied gets a little lower each time. A few thousand more of these and it will be done :-)
And for Peter, we don't have any Windows we support in our lab, so we are pretty PowerShell averse.
I'll open a call with NetApp and figure out how to trace things better. I am going to try an interactive shell with both just to see how they work out before I do that though.
Thanks,
Jeff
Hi Jeff,
Very odd behavior that you describe. I have kicked off many ndmp copies and vol copies from an admin host via ssh. They ran just fine both before and after ONTAP 8.x.
Here's a crazy alternative: PowerShell (unless you have a Windows/Microsoft allergy), using Invoke-NaNdmpCopy and Start-NaNdmpCopy.
But I don't see a vol copy cmdlet. You could New-NaVolClone then Start-NaVolMove, but New-NaVolClone requires a FlexClone license.
Just a wild thought.
Peter