On the off chance someone has seen this before - I'm having trouble with a filer. I can't ssh to it reliably (mostly not at all).

I'm pretty sure that's correlated with high CPU load - my system console has shown it 'spiked' at >95% for the last 24h, which is much higher than 'normal'.

What I'm not sure of is quite what's causing it - the filer is busy, but not abnormally so.

The only thing I can think of that _might_ have changed is API calls (qtree-list, get-file-info) - I've recently started doing quota SNMP trap enrichment (but that's 'every few minutes' at most).

But otherwise, I'm not sure what might be causing sshd to stall. Is there a way to 'kick' it?

This is a 7-Mode filer, on 8.2.1.

I've got a case open, but would appreciate any further insight on how to track down a 'high CPU causing ssh to not respond' type of issue.

I'm pretty sure a failover/failback will do the trick, but that'll have to wait until the weekend - I'd rather avoid it if I can manage it.

My current ps list looks like:

Process statistics over 67.328 seconds...
   ID State Domain %CPU StackUsed %StackUsed Name
  195 RR    N       47%      6928        10% NwkThd_00
  196 RR    N       47%      7880        12% NwkThd_01
  197 RR    0       47%      6928        10% NwkThd_02
  223 BR    s        7%      7648        46% pmcsas_intrd_1
  259 BR    e        5%      2440        19% fal_io_thread2
  502 BR    R        7%      7448        45% raidio_thread
  503 BR    R        7%      7448        45% raidio_thread
  635 BG    k        6%     15184        11% snmpd
 1614 BR    0        5%      3464        10% ntm_main
 1711 RR    w       35%     14256        21% wafl_exempt00
 1712 BR    w       35%     14136        21% wafl_exempt01
 1713 BR    w       35%     14136        21% wafl_exempt02
 2599 BR    k        5%      2752         8% gr_scheduler


That seems pretty busy for a 4-CPU system...
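
To put a rough number on "pretty busy", here is a minimal Python sketch that totals the %CPU column from the listing above and groups it by thread family. It assumes each percentage is relative to a single CPU, so four CPUs would top out at 400%; that reading is an assumption, not something confirmed for this ps output. The names `PS_OUTPUT`, `ROW` and `cpu_by_group` are made up for illustration, and the grouping rule (strip trailing digits and underscores from the thread name) is just a convenience.

```python
import re
from collections import defaultdict

# The thread rows from the ps listing above, pasted verbatim.
PS_OUTPUT = """\
  195 RR    N       47%      6928        10% NwkThd_00
  196 RR    N       47%      7880        12% NwkThd_01
  197 RR    0       47%      6928        10% NwkThd_02
  223 BR    s        7%      7648        46% pmcsas_intrd_1
  259 BR    e        5%      2440        19% fal_io_thread2
  502 BR    R        7%      7448        45% raidio_thread
  503 BR    R        7%      7448        45% raidio_thread
  635 BG    k        6%     15184        11% snmpd
 1614 BR    0        5%      3464        10% ntm_main
 1711 RR    w       35%     14256        21% wafl_exempt00
 1712 BR    w       35%     14136        21% wafl_exempt01
 1713 BR    w       35%     14136        21% wafl_exempt02
 2599 BR    k        5%      2752         8% gr_scheduler
"""

# Columns: ID, State, Domain, %CPU, StackUsed, %StackUsed, Name
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)%\s+(\d+)\s+(\d+)%\s+(\S+)\s*$"
)


def cpu_by_group(text):
    """Sum %CPU per thread family (thread name minus trailing digits/underscores)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip headers, blank lines, anything unparseable
        group = re.sub(r"[_\d]+$", "", match.group(7))
        totals[group] += int(match.group(4))
    return dict(totals)


if __name__ == "__main__":
    totals = cpu_by_group(PS_OUTPUT)
    for group, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{group:15s} {pct:4d}%")
    # Assumption: 4 CPUs == 400% available.
    print(f"{'total':15s} {sum(totals.values()):4d}% of 400%")
```

For this listing the threads shown add up to 288%, with NwkThd at 141% and wafl_exempt at 105%, so under the per-CPU assumption the network and WAFL threads alone account for most of the load. Since the listing only contains the threads shown here, the real total may be higher.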



Thanks and regards,
Ed.