100% CPU utilization on "bored" IBM N6240E21 (FAS3240C)

16 Oct 2012


Hello all,
A week or so ago, our N6240E21 (FAS3240C) started reporting 100% CPU
utilization.  Graphs from our monitoring system show the typical random
peaks and valleys averaging around 20-30% utilization, then suddenly a
plateau at 100% lasting for an entire week (and still ongoing).
The weird thing is -- the filer really *isn't* that busy:
red-str-napc1-p2> sysstat -x -c 10 1
 CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
100%   1489    530      0    2019   13174  10363   14636  11588       0      0     4s    84%   20%  :    21%       0      0      0       0      0       0      0
100%   2001    636      0    2637   29590  20895   25112  14140       0      0     4s    77%   15%  Hn   16%       0      0      0       0      0       0      0
100%   1140    608      0    1748   11829   6349   14396  58396       0      0     4s    95%  100%  :f   22%       0      0      0       0      0       0      0
 99%   3703    404      0    4107   38005 108512  164912  23204       0      0     4s    65%   46%  :    60%       0      0      0       0      0       0      0
100%   1429    195      0    1627   15986  10483   23296  53132       0      0     3s    93%   37%  Hn   25%       3      0      0       0      0       0      0
100%   1440     35      0    1475   32821  11302   16796  35488       0      0     3s    91%   68%  :    20%       0      0      0       0      0       0      0
100%   1461      0      0    1461   10467   8030    7912     32       0      0     3s    75%    0%  -    23%       0      0      0       0      0       0      0
100%   1845    280      0    2125   28710  17652   19624  12636       0      0     3s    83%   17%  Hn   24%       0      0      0       0      0       0      0
100%   2070    191      0    2261    6911  20148   23964  80048       0      0     3s    94%   66%  :    19%       0      0      0       0      0       0      0
100%   1477    153      0    1633   29005   8196    8536     24       0      0     3s    76%    0%  -    12%       3      0      0       0      0       0      0
red-str-napc1-p2*> sysstat -m -c 10 1
 ANY  AVG  CPU0 CPU1 CPU2 CPU3
 47%  65%   85%  70%  74%  30%
 74%  76%   87%  78%  83%  56%
 47%  65%   85%  70%  75%  31%
 59%  68%   84%  71%  76%  41%
 49%  66%   85%  72%  76%  32%
 50%  68%   83%  77%  79%  33%
 56%  70%   87%  76%  79%  37%
 47%  66%   85%  72%  76%  29%
 29%  62%   86%  70%  75%  16%
 36%  64%   86%  72%  76%  23%
red-str-napc1-p2*> priv set advanced
red-str-napc1-p2*> ps -c 5
Process statistics over 1218393.619 seconds...
   ID State Domain %CPU StackUsed %StackUsed Name
    5 RR    i       76%      1016        24% idle_thread0
    6 RR    i       76%       904        22% idle_thread1
    7 RR    i       75%       904        22% idle_thread2
    8 RR    i       64%      1024        25% idle_thread3
   89 BG    1        5%      4440         6% NwkThd_01
  108 RR    2        7%      1944         5% 10GbE/e1b
  294 BR    r       12%      4104        25% raidio_thread
 1539 BR    w        6%      5736        17% wafl_exempt_0
 1540 BR    w        6%      5736        17% wafl_exempt_1
 1541 BR    w        6%      5736        17% wafl_exempt_2
 1544 BR    k        9%     13720        41% wafl_lopri
(A few other processes are above 0%, but these are the most notable.)
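As a rough cross-check of the numbers above, here is a minimal Python sketch that averages the per-CPU columns of the `sysstat -m` sample and the %CPU of the four idle threads from `ps -c 5`. The values are copied by hand from the output above; the variable and function names are made up for illustration. Note that the `ps` figures cover roughly 14 days (1218393 seconds), so they are a long-term average and not directly comparable with the 10-second sample.

```python
# Rows copied from the `sysstat -m` sample above, in percent:
# (ANY, AVG, CPU0, CPU1, CPU2, CPU3)
SYSSTAT_M = [
    (47, 65, 85, 70, 74, 30),
    (74, 76, 87, 78, 83, 56),
    (47, 65, 85, 70, 75, 31),
    (59, 68, 84, 71, 76, 41),
    (49, 66, 85, 72, 76, 32),
    (50, 68, 83, 77, 79, 33),
    (56, 70, 87, 76, 79, 37),
    (47, 66, 85, 72, 76, 29),
    (29, 62, 86, 70, 75, 16),
    (36, 64, 86, 72, 76, 23),
]

# %CPU of idle_thread0..idle_thread3 from the `ps -c 5` output above.
IDLE_THREADS = [76, 76, 75, 64]


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def per_sample_core_mean(rows):
    """For each sample, average the CPU0..CPU3 columns (indexes 2 onward)."""
    return [mean(row[2:]) for row in rows]


core_means = per_sample_core_mean(SYSSTAT_M)
overall_busy = mean(core_means)   # average busy % across all samples and cores
idle_share = mean(IDLE_THREADS)   # average % of time spent in the idle threads

print(f"mean per-core busy over the sample: {overall_busy:.2f}%")
print(f"mean idle-thread share since stats began: {idle_share:.2f}%")
```

On this sample the per-core average comes out at roughly 67% busy, and each computed per-sample mean lands within one point of the AVG column, while the idle threads account for about 73% over the longer window. Neither figure is anywhere near the 100% that `sysstat -x` reports.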
We are running on ONTAP 8.0.2P3 (7-mode) and the filer is primarily
doing NFS for VMware datastores with a bit of CIFS sharing mixed in.
We have opened a support case with IBM, but so far they are telling us
that this may be "normal".  They're still helping us investigate, so we
may yet get something from that route, but I wanted to throw this out
here because this certainly doesn't seem normal.
I'm guessing a controller reboot would solve the problem, but I would
like to see if there is an alternative or an explanation first.
This thread[1] seems similar, but there wasn't really a resolution.
Thanks,
Ray
[1] https://communities.netapp.com/thread/14321
