On 04/27/98 02:48:22 you wrote:
>Oddly, out of our 8 netapps, the only one that hasn't had a RAM or
>Disk failure is the one with Kingston RAM, Dallas Semiconductor NVRAM
>and a number of Seagate drives, all sourced as 'third party'.
>
>Of our 4 new f230's we got about 5-6 months ago, we've had one dead
>NVRAM board, about 2 with bad RAM and I believe 2 ill disks. These
>are just off the top of my head and possibly not counting the DOA RAM
>issues and another dead NVRAM board I seem to remember hearing about.
>Most of these were barely even cracked open by us since receiving them
>from NetApp but the problems persisted even after reseating, etc.
>
>Humorously, the ones with bad RAM sure worked swell when we stuffed
>them with cheap Kingston RAM while the NetApp replacement
>'Heavily Tested and Approved' RAM was being shipped.
>
>We paid extra for this?
Yep. You paid for parts that are better quality when they leave NetApp
than the ones you get third party.
Sounds very much like a shipping issue to me. I'd talk to Netapp
support and have them coordinate with shipping to see if that's
where the problems are. Given when you say these things occurred,
it's quite possible the problems have already been corrected.
Perhaps having NetApp ship the RAM separately, so that you install it
just like you do third-party RAM, might be a better solution for your
given environment.
Bruce
On 04/16/98 15:48:18 you wrote:
>
>On Thu, 16 Apr 1998, Weeks, Thomas wrote:
>
>> RAM on the filer is being run to the very MAX of its specs
>> (maybe more)... I hear that they have a fairly high failure rate on RAM
>> that would otherwise work FINE in slower, less demanding systems (PC's).
>
>'scuse me??? Really? I find this somewhat hard to believe...
>
>60ns RAM is widely available, are you saying they're pushing the envelope
>on that?
>
>I very much doubt the memory architecture on a NetApp is anything more
>sophisticated than Sun's memory architecture (and I almost guarantee Sun's
>memory architecture blows away low-end NetApps)...and Sun seem able to use
>fairly standard parts without too much failure.
>
>Maybe I'm just missing the big picture...
Part of it is history. Back in the "old days", which is to say only 2-3 years
ago in computer time, there were lots of vendors you could get 60ns RAM from
that weren't quite up to snuff under heavy loads. Those vendors often
produced RAM that stood up to their own tests, but not to NetApp's. As
the industry has progressed, I'm sure today's 60ns RAM chips are of higher
quality, so failures may be less common than they once were.
But yes, Netapp seriously pushes the performance of memory under high load,
even if that is now more easily within the tolerances of most memory chips.
Another thing to consider is that if you don't run your filers under very
high load, it's probably very easy for you to "get away" with lower-quality
memory. But don't be surprised if your filer starts getting memory errors
once you start running your CPU at 100%, or your cache age at 1, or you have
a disk failure and have to do reconstruction while in service, etc.
Bruce
I'm trying to use mrtg to show the CPU utilization on a Network Appliance.
Here's the relevant portion of the netapp MIB (output of an SNMPwalk, with
my notes on the right side):
1.3.6.1.4.1.789.1.2.1.1.0: TIMETICKS: 1016121072 (uptime in ticks {1/100sec})
1.3.6.1.4.1.789.1.2.1.2.0: TIMETICKS: 33513506 (cpu ticks used since uptime)
1.3.6.1.4.1.789.1.2.1.3.0: INTEGER: 3 (% cpu util since uptime)
1.3.6.1.4.1.789.1.2.1.4.0: TIMETICKS: 982607568 (cpu idle ticks since uptime)
1.3.6.1.4.1.789.1.2.1.5.0: INTEGER: 97 (% idle since last boot)
1.3.6.1.4.1.789.1.2.2.1.0: INTEGER: 311445413 (nfs ops since boot)
1.3.6.1.4.1.789.1.2.2.2.0: INTEGER: 4503517 (total kbytes recd on all nets)
1.3.6.1.4.1.789.1.2.2.3.0: INTEGER: 1224614 (total kbytes trans since boot)
I can't use the "% cpu util since uptime" figure because that's just
an average cpu utilization since the last reboot, which isn't particularly
informative when your server is usually up for months at a time.
So I figured I'd use the "cpu ticks used since uptime" and
"cpu idle ticks since uptime" MIB entries.
The problem is that SNMP entry is of type "TIMETICKS", which is
interpreted by the BER.pm module as an "uptime statistic". When I run
that one, here's what the mrtg logfile looks like afterwards:
893639038 9 days, 8:48:06 9 days, 8:48:06
893639038 0 0 0 0
893638500 0 0 0 0
893638200 0 0 0 0
Ouch! The second time I run mrtg it fails with a "corrupt logfile"
error (for good reason, that's corrupt all right!).
Here's the output of mrtg with the DEBUG level set at 8:
SNMPGET OID: 1.3.6.1.2.1.1.3.0
SNMPGET OID: 1.3.6.1.2.1.1.5.0
1.3.6.1.4.1.789.1.2.1.4.0&1.3.6.1.4.1.789.1.2.1.4.0:public@fs11 --> in: 9 days, 8:48:06 out: 9 days, 8:48:06 name: fs11.nas.nasa.gov
getting SNMP variables for target: 1:public@fs12
snmpget: ifInOctets.1 ifOutOctets.1 fs12 public
SNMPGET OID: 1.3.6.1.2.1.2.2.1.10.1
SNMPGET OID: 1.3.6.1.2.1.2.2.1.16.1
Now, here's the part of the BER.pm source code which decodes the
SNMP reply:
sub pretty_print ($) {
    my ($packet) = @_;
    my ($type, $rest);
    my $result = ord (substr ($packet, 0, 1));
    return pretty_intlike ($packet)
        if $result == int_tag;
    return pretty_unsignedlike ($packet)
        if $result == snmp_counter32_tag
        || $result == snmp_gauge32_tag;
    return pretty_string ($packet) if $result == octet_string_tag;
    return pretty_oid ($packet) if $result == object_id_tag;
###THIS LINE
    return pretty_uptime ($packet) if $result == uptime_tag;
###THIS LINE
    return pretty_ip_address ($packet) if $result == snmp_ip_address_tag;
    return "(null)" if $result == null_tag;
    return "#<unprintable BER type $result>";
}
The "pretty_uptime" subroutine is about what you'd expect, converting
the ticks to "up for 4 days 3 hours..." format.
So, if the result is an "uptime_tag" (which probably means the
same thing as TIMETICKS), the BER.pm module formats it as an uptime
string before it even gives it to mrtg.
That's exactly what we want for the "this router has been up for ...."
messages on our webpages, but it makes NetApp CPU information unavailable
(except for the uptime since the last boot).
I really want to avoid maintaining homegrown customizations to mrtg.
I'd prefer a solution to be incorporated into the main mrtg source
code. Here are some suggestions on a fix:
1) Have mrtg spawn a separate script which uses snmpget, and then
reports this information as an integer (Peter Buschman recently
sent such a script to the toasters alias...I was wondering why he
didn't just have mrtg get the data directly...now I know).
2) Have the NetApp MIB report CPU ticks as an INTEGER or COUNTER instead
of TIMETICKS
3) Have BER.pm report uptimes as an integer, and have the mrtg perl
script do the "pretty_uptime" conversion if appropriate (ie. for
"router uptime" information at the top of the webpage, but
not for data put in graphs).
4) ??? Any suggestions anyone?
I'm probably going to go with #1 unless someone has a better idea, but
I thought this would be of sufficient interest that I'd spam you guys
with it.
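A sketch of how option #1 might look. This is hypothetical, not an existing tool: the snmpget output format shown and the timeticks_to_int helper are illustrative assumptions, and it relies on mrtg's external-command interface (the script prints the two values, an uptime line, and the target name).

```python
import re

# Hypothetical helper for suggestion #1: turn the "Timeticks" line that
# a command-line snmpget prints into the bare integer mrtg wants.
def timeticks_to_int(snmpget_line):
    # snmpget typically prints the raw count in parentheses, e.g.
    #   "enterprises.789.1.2.1.2.0 = Timeticks: (33513506) 3 days, ..."
    m = re.search(r"\((\d+)\)", snmpget_line)
    if m is None:
        raise ValueError("no tick count in: " + snmpget_line)
    return int(m.group(1))

# A wrapper script would run snmpget against cpuBusyTime and cpuIdleTime,
# convert each reply with timeticks_to_int(), and print the two integers
# (followed by an uptime line and the target name) for mrtg to read.
print(timeticks_to_int(
    "enterprises.789.1.2.1.2.0 = Timeticks: (33513506) 3 days, 21:05:35"))
# prints 33513506
```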
Darrell Root
rootd(a)nas.nasa.gov
[View Less]
> The numbers for cpu busy percent (1.3.6.1.4.1.789.1.2.1.3.0) aren't
> working so good. It keeps returning "48". Does anyone know why this
> would be? A 5 minute sysstat (sysstat 300) is showing more variation.
That value isn't an instantaneous snapshot of the busy percentage,
it's a busy percentage over the total running time of the filer.
To compute a busy percentage over a specific amount of time, you need
to sample the cpuBusyTime (.1.3.6.1.4.1.789.1.2.1.2.0)
and cpuIdleTime (.1.3.6.1.4.1.789.1.2.1.4.0) values, and use the
changes in those values (over whatever time period you want)
to compute the busy percentage, i.e.:
( change_in_cpuBusyTime ) / ( change_in_cpuBusyTime + change_in_cpuIdleTime )
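A worked example of that formula (the tick values here are hypothetical; both counters are in 1/100-second units):

```python
# Two hypothetical samples of (cpuBusyTime, cpuIdleTime), taken 300
# seconds (30000 ticks) apart.
busy1, idle1 = 33513506, 982607568
busy2, idle2 = busy1 + 9000, idle1 + 21000

d_busy = busy2 - busy1              # 9000 ticks spent busy
d_idle = idle2 - idle1              # 21000 ticks spent idle
busy_pct = 100 * d_busy / (d_busy + d_idle)
print(busy_pct)  # prints 30.0 -- 30% busy over the 5-minute interval
```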
> The numbers for net usage (1.3.6.1.2.1.2.2.1.10.2 and
> 1.3.6.1.2.1.2.2.1.16.2) keep returing zeros. Should I be using a
> different set of numbers for an F220?
I'm not sure why they're returning zeros. (They're returning non-zero
values on the machines with more recent versions of ONTAP that I've
tried here.)
As an alternative to those numbers, you should be able to get the total KBytes
received and sent (since the last boot) from miscNetRcvdKB
(.1.3.6.1.4.1.789.1.2.2.2.0) and miscNetSentKB (.1.3.6.1.4.1.789.1.2.2.3.0).
...Tim Thompson...Network Appliance, Inc...tjt(a)netapp.com...
Nope :( It's a Sony.
-chris
-----Original Message-----
From: Timothy A. McCarthy [mailto:tmac@netapp.com]
Sent: Friday, April 24, 1998 9:33 AM
To: Chris Fairbanks
Cc: 'toasters(a)mathworks.com'
Subject: Re: TBU
Well, if it happens to be an HP 1533A or HP 1553A then you are
in luck...it is supported now.
Anything else, I do not know....
Chris Fairbanks wrote:
> Does anyone know if netapp plans on supporting 4mm DDS-3 DATs? I have
> one lying around I would like to put on my filer.
>
> Thanks,
> Chris
--
Timothy A. McCarthy --> System Engineer, Eastern Region.
Network Appliance http://www.netapp.com
301-230-5840 Office \ / Page Me at:
301-230-5852 Fax \/ 800-654-9619
I feel a little dumb to be asking this but how does one actually get the NetApp
MIB into mrtg to be able to query and plot information about one's filers?
Currently I'm under the impression that I need to convert the fine NetApp MIB
in ASN.1 to the serialised integer format, to build into the mrtg script.
Now after much searching and thinking I've found a number of BER compilers
that'll do the job, outputting C/C++ tables, but nothing that just outputs
simple ascii strings.
Am I on the right track? Is there a cunning and clever way to do the job without
these steps? Has anyone done this before AND is willing to help me
out/contribute the work back to the mrtg site?
Any and all help sincerely appreciated!
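For what it's worth, mrtg doesn't need the MIB compiled into it at all: Target lines accept raw numeric OIDs directly, in the same OID&OID:community@host form that appears in the debug output earlier in this digest. A hypothetical config fragment (the target name, MaxBytes value, and WorkDir are made up):

```
WorkDir: /usr/local/mrtg/logs

# Plot KBytes received/sent on the filer using raw numeric OIDs --
# no MIB conversion step needed.
Target[fs11-net]: 1.3.6.1.4.1.789.1.2.2.2.0&1.3.6.1.4.1.789.1.2.2.3.0:public@fs11
MaxBytes[fs11-net]: 12500
Title[fs11-net]: fs11 network traffic (KBytes since boot)
PageTop[fs11-net]: <H1>fs11 network traffic</H1>
```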