SNMP CPU usage in "TIMETICKS" is interpreted as uptime and corrupts logfile - toasters

27 Apr 1998


      I'm trying to use mrtg to show the CPU utilization on a Network Appliance.
Here's the relevant portion of the NetApp MIB (output of an snmpwalk, with
my notes on the right side):
1.3.6.1.4.1.789.1.2.1.1.0: TIMETICKS: 1016121072  (uptime in ticks {1/100sec})
1.3.6.1.4.1.789.1.2.1.2.0: TIMETICKS: 33513506    (cpu ticks used since uptime)
1.3.6.1.4.1.789.1.2.1.3.0: INTEGER: 3             (% cpu util since uptime)
1.3.6.1.4.1.789.1.2.1.4.0: TIMETICKS: 982607568   (cpu idle ticks since uptime)
1.3.6.1.4.1.789.1.2.1.5.0: INTEGER: 97            (% idle since last boot)
1.3.6.1.4.1.789.1.2.2.1.0: INTEGER: 311445413     (nfs ops since boot)
1.3.6.1.4.1.789.1.2.2.2.0: INTEGER: 4503517 (total kbytes recd on all nets)
1.3.6.1.4.1.789.1.2.2.3.0: INTEGER: 1224614 (total kbytes trans since boot)
I can't use the "% cpu util since uptime" figure because that's just
an average cpu utilization since the last reboot, which isn't particularly
informative when your server is usually up for months at a time.
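
As a sketch of the arithmetic only: a figure for a recent window comes from differencing the two raw tick counters in the walk above (busy ticks and idle ticks) between two polls. The function name and the second sample below are made up for illustration; mrtg normally does this kind of differencing itself for counter-style values.

```python
def cpu_busy_percent(prev, curr):
    """Percent CPU busy between two polls.

    prev and curr are (busy_ticks, idle_ticks) pairs, i.e. the values of
    1.3.6.1.4.1.789.1.2.1.2.0 and 1.3.6.1.4.1.789.1.2.1.4.0 at each poll.
    Returns None when the counters went backwards (reboot or wrap) or
    when no ticks elapsed.
    """
    busy_delta = curr[0] - prev[0]
    idle_delta = curr[1] - prev[1]
    total = busy_delta + idle_delta
    if busy_delta < 0 or idle_delta < 0 or total <= 0:
        return None
    return 100.0 * busy_delta / total


# First sample: the values from the walk above.
first = (33513506, 982607568)
# Second sample: hypothetical, five minutes (30000 ticks) later.
second = (33519506, 982631568)

print(cpu_busy_percent(first, second))  # 20.0
# Differencing against zero reproduces the "since boot" average (about 3%).
print(round(cpu_busy_percent((0, 0), first)))  # 3
```

The second print is a sanity check on the counters: busy plus idle ticks adds up to roughly the uptime in ticks, and the ratio matches the INTEGER 3 reported by the MIB.
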
So I figured I'd use the "cpu ticks used since uptime" and
"cpu idle ticks since uptime" MIB entries.
The problem is that both of those SNMP entries are of type "TIMETICKS",
which the BER.pm module interprets as an "uptime statistic".  When I point
mrtg at one of them, here's what the mrtg logfile looks like afterwards:
893639038 9 days, 8:48:06 9 days, 8:48:06
893639038 0 0 0 0
893638500 0 0 0 0
893638200 0 0 0 0
Ouch!  The second time I run mrtg it fails with a "corrupt logfile"
error (for good reason, that's corrupt all right!).
Here's the output of mrtg with the DEBUG level set at 8:
SNMPGET OID: 1.3.6.1.2.1.1.3.0
SNMPGET OID: 1.3.6.1.2.1.1.5.0
1.3.6.1.4.1.789.1.2.1.4.0&1.3.6.1.4.1.789.1.2.1.4.0:public@fs11 --> in: 9 days, 8:48:06  out: 9 days, 8:48:06  name: fs11.nas.nasa.gov
getting SNMP variables for target: 1:public@fs12
snmpget: ifInOctets.1 ifOutOctets.1 fs12 public
SNMPGET OID: 1.3.6.1.2.1.2.2.1.10.1
SNMPGET OID: 1.3.6.1.2.1.2.2.1.16.1
Now, here's the part of the BER.pm source code which decodes the
SNMP reply:
sub pretty_print ($) {
    my ($packet) = @_;
    my ($type,$rest);
    my $result = ord (substr ($packet, 0, 1));
    return pretty_intlike ($packet)
        if $result == int_tag;
    return pretty_unsignedlike ($packet)
        if $result == snmp_counter32_tag
            || $result == snmp_gauge32_tag;
    return pretty_string ($packet) if $result == octet_string_tag;
    return pretty_oid ($packet) if $result == object_id_tag;
###THIS LINE
    return pretty_uptime ($packet) if $result == uptime_tag;
###THIS LINE
    return pretty_ip_address ($packet) if $result == snmp_ip_address_tag;
    return "(null)" if $result == null_tag;
    return "#<unprintable BER type $result>";
}
The "pretty_uptime" subroutine is about what you'd expect, converting
the ticks to "up for 4 days 3 hours..." format.
So, if the result is an "uptime_tag" (which probably means the
same thing as TIMETICKS), the BER.pm module formats it as an uptime
string before it even gives it to mrtg.
That's exactly what we want for the "this router has been up for ...."
messages on our webpages, but it makes the NetApp CPU information
unavailable (except for the uptime since the last boot).
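
To make the failure concrete, here is a rough re-creation of that conversion. It is modeled on the string that ended up in the logfile above, not copied from BER.pm, and the tick value is simply one that produces the same string.

```python
def pretty_uptime(ticks):
    """Format TIMETICKS (1/100 sec) as 'N days, H:MM:SS'."""
    seconds = ticks // 100
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    clock = "%d:%02d:%02d" % (hours, minutes, secs)
    if days == 0:
        return clock
    unit = "day" if days == 1 else "days"
    return "%d %s, %s" % (days, unit, clock)


# A tick count that formats to the string seen in the logfile.
value = pretty_uptime(80928600)
print(value)  # 9 days, 8:48:06

# The first logfile line should be "timestamp in out": three numeric fields.
# With the formatted string in both slots it splits into seven instead.
first_line = "893639038 %s %s" % (value, value)
print(len(first_line.split()))  # 7
```

Once the counter has been turned into text like this, there is no number left for mrtg to difference, which is why the next run rejects the logfile.
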
I really want to avoid maintaining homegrown customizations to mrtg.
I'd prefer a solution to be incorporated into the main mrtg source
code.  Here are some suggestions on a fix:
1) Have mrtg spawn a separate script which uses snmpget, and then
   reports this information as an integer (Peter Buschman recently
   sent such a script to the toasters alias...I was wondering why he
   didn't just have mrtg get the data directly...now I know).
2) Have the NetApp MIB report CPU ticks as an INTEGER or COUNTER instead
   of TIMETICKS
3) Have BER.pm report uptimes as an integer, and have the mrtg perl
   script do the "pretty_uptime" conversion if appropriate (i.e. for
   "router uptime" information at the top of the webpage, but
   not for data put in graphs).
4) ??? Any suggestions anyone?
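
Suggestion 3 could look roughly like the sketch below: the decoder hands back the raw tick count, and only the code that builds the webpage turns it into text. This is Python for illustration rather than a patch to BER.pm; the function names are invented, and the decoder only handles short-form BER lengths. The tag value 0x43 is the BER tag for the SNMP TimeTicks type.

```python
from datetime import timedelta

TIMETICKS_TAG = 0x43  # BER tag for SNMP TimeTicks (APPLICATION 3)


def decode_timeticks(packet):
    """Return the raw tick count from a BER-encoded TimeTicks value."""
    tag, length = packet[0], packet[1]
    if tag != TIMETICKS_TAG:
        raise ValueError("not a TimeTicks value")
    if length & 0x80:
        raise ValueError("long-form BER lengths are not handled in this sketch")
    return int.from_bytes(packet[2:2 + length], "big")


def uptime_for_page(ticks):
    """Presentation step, applied only where a human-readable uptime is wanted."""
    return str(timedelta(seconds=ticks // 100))


# Encode the "cpu ticks used" value from the walk, then decode it again.
packet = bytes([TIMETICKS_TAG, 4]) + (33513506).to_bytes(4, "big")
ticks = decode_timeticks(packet)
print(ticks)                      # 33513506, usable as graph data
print(uptime_for_page(80928600))  # 9 days, 8:48:06, for the page header
```
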
I'm probably going to go with #1 unless someone has a better idea, but
I thought this would be of sufficient interest that I'd spam you guys
with it.
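
For anyone else going the #1 route, here is a minimal sketch of the reporting half of such a script. It assumes the walk text looks like the output quoted above, and that mrtg expects four lines from an external script (first value, second value, uptime string, target name), which is my understanding of its convention for external targets. How the text is fetched is left out, since that depends on the local snmpget; parse_walk and mrtg_report are hypothetical names.

```python
import re

# Matches lines such as "1.3.6.1.4.1.789.1.2.1.2.0: TIMETICKS: 33513506  (note)"
WALK_LINE = re.compile(r"^(?P<oid>[0-9.]+):\s+(?P<type>[A-Z0-9]+):\s+(?P<value>\d+)")


def parse_walk(text):
    """Map each OID in the walk output to its raw integer value."""
    values = {}
    for line in text.splitlines():
        match = WALK_LINE.match(line.strip())
        if match:
            values[match.group("oid")] = int(match.group("value"))
    return values


def mrtg_report(values, in_oid, out_oid, uptime, name):
    """Build the four lines handed back to mrtg."""
    return "\n".join([str(values[in_oid]), str(values[out_oid]), uptime, name])


sample = """\
1.3.6.1.4.1.789.1.2.1.2.0: TIMETICKS: 33513506    (cpu ticks used since uptime)
1.3.6.1.4.1.789.1.2.1.4.0: TIMETICKS: 982607568   (cpu idle ticks since uptime)
"""

report = mrtg_report(
    parse_walk(sample),
    "1.3.6.1.4.1.789.1.2.1.2.0",  # busy ticks as the first value
    "1.3.6.1.4.1.789.1.2.1.4.0",  # idle ticks as the second value
    "9 days, 8:48:06",
    "fs11.nas.nasa.gov",
)
print(report)
```

The point is that the tick counts reach mrtg as plain integers, so the TIMETICKS formatting in BER.pm never gets involved.
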
Darrell Root
rootd@nas.nasa.gov