On Wed, Jun 08, 2011 at 08:24:54AM +0200, Sander Klein wrote:
Maybe it's related to the ssh client you're using?
Pretty sure it's not a network or SSH issue.
I just got a nice reproduction, and the data I got really surprised me. (BTW, the filer I'm hitting below is running 8.0.1P2).
I have a script that checks "qtree stats" periodically. And to save on SSH setup/teardown costs, it runs a sysstat in the same command. But the script then works through the data and hands back a summary. I don't have a copy of the raw stuff coming back. But I saw some oddities that made it appear that it was not getting all the data.
So I ran the following command and was going to put it in a crontab to run multiple times so I could see if it went off: "qtree stats; sysstat -c 1 5"
Well, while testing the script, on the 4th time I ran it I got "short" output. On this filer, that command should return 179 lines. On the last try it only returned 10, but not simply the first 10. Here's a (slightly obfuscated) version of what I got back:
-----BEGIN
No qtrees are in use in Volume vol0
Volume         Tree        NFS ops  CIFS ops
--------       --------    -------  --------
perf           [user]perf     1773         0
Volume         Tree        NFS ops  CIFS ops
--------       --------    -------  --------
test_vm_volume vm_vol            0         0
 CPU    NFS   CIFS   HTTP        Net kB/s       Disk kB/s      Tape kB/s  Cache
                                in    out     read  write    read  write    age
 77%  28350      0      0   290677  82179   194167 415548       0      0     6s
-----END
In reality, the "perf" volume has over 100 qtrees. The one shown above is the first line of that list when the command completes normally. So it's not simply truncating the total output; it has managed to drop part of the output of one command.
If it were a network or SSH problem, I wouldn't expect a gap in the middle of a command, nor would I expect it exactly on a nice line boundary.
Looks like I can open a ticket now.
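A minimal sketch of how a polling script could refuse to summarize a short capture instead of silently working from partial data. It assumes a known-good line count for the command (179 above) stays stable between runs, which will not hold if qtrees are added or removed. The names `run_filer_command`, `is_short` and `run_checked` and the retry count are hypothetical. Because the lost lines can come from the middle of the output, a count check only tells you that something is missing, not where.

```python
import subprocess
from typing import Callable


def run_filer_command(host: str, command: str, timeout: float = 60.0) -> str:
    """Run one command on a filer over SSH and return its stdout."""
    # Assumes non-interactive (key-based) SSH login to the filer is set up.
    result = subprocess.run(
        ["ssh", host, command],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout


def is_short(output: str, expected_lines: int) -> bool:
    """True if a capture has fewer lines than a known-good capture had."""
    return len(output.splitlines()) < expected_lines


def run_checked(
    host: str,
    command: str,
    expected_lines: int,
    attempts: int = 3,
    runner: Callable[[str, str], str] = run_filer_command,
) -> str:
    """Retry a command until its output is at least expected_lines long."""
    last = ""
    for _ in range(attempts):
        last = runner(host, command)
        if not is_short(last, expected_lines):
            return last
    raise RuntimeError(
        f"{host}: got {len(last.splitlines())} lines, expected {expected_lines}"
    )
```

For example, `run_checked("filer1", "qtree stats; sysstat -c 1 5", expected_lines=179)`, where `filer1` is a placeholder host name.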
Of course, if you know the right people... you can get the ONTAP API software. I am able to query 8 different filers and collect info on about 200 volumes (volume info, aggr info, and qtree info, across thousands of qtrees) in about 3-4 seconds, with no SSH involved at all.
Perl library, JAR file, .NET, C++... lots of different ways to plug into ONTAP.
I believe it comes in over the HTTP admin route.
--tmac Tim McCarthy Principal Consultant
RedHat Certified Engineer 804006984323821 (RHEL4) 805007643429572 (RHEL5)
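A rough sketch of what that HTTP route looks like without the SDK bindings, to show why it sidesteps the SSH shell entirely. Everything specific here is an assumption to verify against the SDK documentation for your release: the 7-mode ZAPI endpoint path, the `xmlns` value, the envelope version "1.3", and the `qtree-list` call name. The SDK's Perl, Java, .NET and C++ libraries wrap this plumbing for you.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# Assumed 7-mode ZAPI endpoint and namespace; confirm against the SDK docs
# for your ONTAP release before relying on them.
ZAPI_PATH = "/servlets/netapp.servlets.admin.XMLrequest_filer"
ZAPI_NS = "http://www.netapp.com/filer/admin"


def build_zapi_request(api: str, version: str = "1.3") -> bytes:
    """Wrap one argument-less API call (e.g. 'qtree-list') in a <netapp> envelope."""
    root = ET.Element("netapp", {"version": version, "xmlns": ZAPI_NS})
    ET.SubElement(root, api)
    return ET.tostring(root, encoding="utf-8")


def parse_results(body: bytes) -> ET.Element:
    """Return the <results> element, raising if the filer reported a failure."""
    for el in ET.fromstring(body).iter():
        # Compare local names so this works with or without a namespace.
        if el.tag.rsplit("}", 1)[-1] == "results":
            if el.get("status") != "passed":
                raise RuntimeError(el.get("reason", "ZAPI call failed"))
            return el
    raise RuntimeError("no <results> element in response")


def call_zapi(host: str, api: str, user: str, password: str,
              scheme: str = "https", timeout: float = 30.0) -> ET.Element:
    """POST one API call over the HTTP(S) admin interface using basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{scheme}://{host}{ZAPI_PATH}",
        data=build_zapi_request(api),
        headers={"Content-Type": "text/xml", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_results(response.read())
```

The reply comes back as one XML document, so there is no interactive shell whose output could be cut off partway through a command.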
On Thu, Jun 9, 2011 at 8:10 PM, A Darren Dunham ddunham@taos.com wrote:
On Thu, Jun 09, 2011 at 08:56:27PM -0400, tmac wrote:
Of course, if you know the right people... you can get the ONTAP API software. I am able to query 8 different filers and collect info on about 200 volumes (volume info, aggr info, and qtree info, across thousands of qtrees) in about 3-4 seconds, with no SSH involved at all.
Perl library, JAR file, .NET, C++... lots of different ways to plug into ONTAP.
No question. But I still have many items running that are hideously complex and more than 10 years old. I can't convert them in any reasonable time frame.
And all our "by hand" administration is still via SSH, which is where we first noticed this. I was just able to get confirmation from this script that is using SSH as well.
It's very possible that by going around the shell/interpreter, the API transport would not be subject to this problem. I'll have to see if I can get that tested. Even if it's fine, I really need the SSH stuff to work because of the legacy scripts I have.
Hi guys, I ran into this one last year while doing some early testing on pre-release 8.0.1. It never occurred to me that this was a bug that might be seen in the field; my bad. It has since been reported as bug 485715 http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=485715. Although the bug report lists no workaround, I found that appending a command that takes a few seconds, like ping <something that isn't there> or "sysstat -c 1 3", gave the command you wanted time to finish its output before the SSH connection closed.
From one of my test scripts:
sshcmd $filer -qfl root 'priv set -q diag ; stats show vstorage ; ping 172.16.26.254'
The downside is that ping can take 15 seconds to time out, and there's no option in 7-mode to specify a timeout. Sysstat gives you control down to the second, but you have to scrub its output. Sorry, no sleep, neither in ONTAP nor for me these days! ;-)
Please open a case with NetApp support and have them add you/your case as call_rec to burt 485715.
Thanks
Peter
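The padding trick described above can be sketched as two small helpers: one appends a one-sample `sysstat -c 1 N` to keep the session open roughly N seconds longer, and one scrubs the sysstat block back off the result. The function names are hypothetical, and the scrubber assumes the wanted command never prints a line that begins with "CPU". As Darren notes further down, this only protects the tail of the session; it does nothing for lines lost from the middle of a command.

```python
def padded_command(command: str, pad_seconds: int = 3) -> str:
    """Append a one-sample sysstat so the session stays open ~pad_seconds longer."""
    return f"{command} ; sysstat -c 1 {pad_seconds}"


def scrub_sysstat(output: str) -> str:
    """Cut the output at the first sysstat header line (one starting with 'CPU')."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("CPU"):
            return "\n".join(lines[:i])
    return output
```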
-----Original Message----- From: A Darren Dunham [mailto:ddunham@taos.com] Sent: Friday, June 10, 2011 9:06 AM To: toasters@mathworks.com Subject: Re: Truncated SSH ouptut on 8.x?
On Fri, Jun 10, 2011 at 10:57:00AM -0700, Learmonth, Peter wrote:
Hi guys, I ran into this one last year while doing some early testing on pre-release 8.0.1. It never occurred to me that this was a bug that might be seen in the field; my bad. It has since been reported as bug 485715 http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=485715. Although the bug report lists no workaround, I found that appending a command that takes a few seconds, like ping <something that isn't there> or "sysstat -c 1 3", gave the command you wanted time to finish its output before the SSH connection closed.
That's not a workaround for what I'm seeing. I actually get the output of the second command, but the output from the first command is still truncated. It doesn't seem to be SSH related to me.
Here's another one I just ran:
# ssh <filer> options
acp.domain 16820416
acp.enabled on
ndmpd.enable on
#
Note that I didn't get the first three lines; I got the first two lines and one random line from the middle.
Please open a case with NetApp support and have them add you/your case as call_rec to burt 485715.
Done. Thanks for the pointer.
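To pin down exactly which lines went missing (and whether it was the middle, the end, or several chunks), a short capture can be aligned against a known-good capture of the same command. This sketch uses the standard library's `difflib.SequenceMatcher`; the helper names are hypothetical, and it assumes the reference capture was taken close enough in time that the real output hasn't changed.

```python
import difflib


def missing_spans(reference: list[str], received: list[str]) -> list[tuple[int, int]]:
    """Index ranges [start, end) of reference lines that did not come back."""
    matcher = difflib.SequenceMatcher(a=reference, b=received, autojunk=False)
    return [
        (i1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        # 'replace' means lines came back altered; count them as lost too.
        if tag in ("delete", "replace")
    ]


def describe_loss(reference: list[str], received: list[str]) -> str:
    """Summarize where output went missing: middle, end, or several chunks."""
    spans = missing_spans(reference, received)
    if not spans:
        return "complete"
    if len(spans) > 1:
        return f"{len(spans)} separate chunks missing"
    start, end = spans[0]
    if end == len(reference):
        return "end missing"
    if start == 0:
        return "beginning missing"
    return "chunk missing from the middle"
```

Feeding it `output.splitlines()` from a good run and from a short run gives a one-line description that could go straight into the case notes.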
I had the same problem with a set of new filers all running 8.0 code. Searching NOW, I found this KB: https://kb.netapp.com/support/index?page=content&id=2013198 Indeed, on the ONTAP 8.0.x systems the default value of ssh.idle.timeout is 0, whereas on all 7.x filers it was 600. After changing it on the 8.0.x filers, things seem to be working OK. Hope this helps.
-net
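To audit many filers for the setting from that KB, the output of `options ssh.idle.timeout` (run over SSH, one host at a time) can be parsed and checked for 0; the value is changed with `options ssh.idle.timeout 600`. This is a sketch only: the helper names are hypothetical, and the parser assumes one "name value" pair per line, keeping just the first token of each value. Darren reports below that changing the value did not stop the truncation on his filers.

```python
def parse_options(output: str) -> dict[str, str]:
    """Parse 7-mode 'options' output, one 'name value' pair per line."""
    opts: dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue
        # Keep only the first token of the value; anything after it is
        # treated as an annotation rather than part of the value.
        opts[parts[0]] = parts[1].split()[0] if len(parts) == 2 else ""
    return opts


def idle_timeout_is_zero(options_output: str) -> bool:
    """True if ssh.idle.timeout is reported as 0 (the 8.0.x default per the KB)."""
    return parse_options(options_output).get("ssh.idle.timeout") == "0"
```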
On Fri, Jun 10, 2011 at 11:47 AM, A Darren Dunham ddunham@taos.com wrote:
On Tue, Jun 14, 2011 at 08:55:40AM -0700, Sto Rage© wrote:
I had the same problem with a set of new filers all running 8.0 code. Searching NOW, I found this KB: https://kb.netapp.com/support/index?page=content&id=2013198 Indeed, on the ONTAP 8.0.x systems the default value of ssh.idle.timeout is 0, whereas on all 7.x filers it was 600. After changing it on the 8.0.x filers, things seem to be working OK. Hope this helps.
I will agree that a lot of these filers I'm seeing it on had the idle set to "0", but changing it to "600" didn't seem to change anything. I can still reproduce it with that change.
Yeah, I spoke too soon ;( Initially they seemed to work fine, but not anymore. This is a serious bug; many of our scripts have begun to fail. I have one that checks volume status by grepping the returned output, and now, with that output truncated, the script reports that the volumes don't exist anymore. Scary :)
This is specific to the 8.0.x release. I first thought it was platform-specific (we just installed a bunch of 3270s), but this week we upgraded some older 3070s to 8.0.x and they began behaving this way too.
-net
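One way to keep a grep-style check from declaring a volume gone on the strength of a single truncated capture is to match the name as a whole column and require several independent captures to agree before reporting it missing. This is a sketch under stated assumptions: `fetch` is a hypothetical callable that returns fresh `vol status` output (for example by wrapping `ssh <filer> vol status`), and volume names are assumed to be the first column of that output. Since the truncation is intermittent, re-capturing should cut down false alarms, but it cannot rule them out.

```python
from typing import Callable


def volume_listed(volume: str, vol_status_output: str) -> bool:
    """Match the volume name as a whole first column, not as a grep substring."""
    for line in vol_status_output.splitlines():
        fields = line.split()
        if fields and fields[0] == volume:
            return True
    return False


def volume_state(volume: str, fetch: Callable[[], str], captures: int = 3) -> str:
    """'present' on any sighting; only flag it after every capture lacks it."""
    for _ in range(captures):
        if volume_listed(volume, fetch()):
            return "present"
    # Still not proof: every capture could have been truncated the same way.
    return "missing (unconfirmed)"
```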
On Tue, Jun 14, 2011 at 1:16 PM, A Darren Dunham ddunham@taos.com wrote:
Like I said earlier, for anybody encountering this, please open a case and have them add your case #, company name, etc. to the call_rec in the BURT.
Please also spell out (and have them note in the BURT) any specifics, such as whether the missing output is in the middle, at the end, or in random/multiple chunks.
Peter
-----Original Message----- From: Sto Rage© [mailto:netbacker@gmail.com] Sent: Tuesday, June 14, 2011 3:01 PM To: A Darren Dunham Cc: toasters@mathworks.com Subject: Re: Truncated SSH ouptut on 8.x?
On Tue, Jun 14, 2011 at 04:19:13PM -0700, Learmonth, Peter wrote:
Like I said earlier, for anybody encountering this, please open a case and have them add your case #, company name, etc. to the call_rec in the BURT.
Please also spell out (and have them note in the BURT) any specifics, such as whether the missing output is in the middle, at the end, or in random/multiple chunks.
Peter
Just saw that 8.0.2P1 is showing first fix for 485715. I haven't tested it yet, but I will be pulling it down and trying it pretty soon.