On the off chance - I'm having trouble with a filer. I can't ssh to it
reliably (at all, mostly).
I'm pretty sure that's correlated with some high CPU load - my system
console has it 'spiked' at >95% for the last 24h, and that's much higher
than 'normal'.
What I'm not sure of is quite what's causing it - the filer is busy, but
not abnormally so.
The only thing I can think of that _might_ have changed is API calls
(qtree-list, get-file-info) - I've recently started doing quota SNMP trap
enrichment (but that's 'every few minutes' at most).
But otherwise - I'm not sure what might be causing sshd to stall, or
whether there's a way to 'kick' it.
This is a 7-Mode filer, on 8.2.1.
I've got a case open, but would appreciate any further insight on how to
track down this kind of issue, where high CPU load leaves ssh unresponsive.
I'm pretty sure a failover/failback will do the trick, but that'll have to
wait until the weekend - and I'd rather avoid it altogether if I can.
My current ps list looks like:
Process statistics over 67.328 seconds...
  ID  State  Domain  %CPU  StackUsed  %StackUsed  Name
 195  RR     N        47%       6928         10%  NwkThd_00
 196  RR     N        47%       7880         12%  NwkThd_01
 197  RR     0        47%       6928         10%  NwkThd_02
 223  BR     s         7%       7648         46%  pmcsas_intrd_1
 259  BR     e         5%       2440         19%  fal_io_thread2
 502  BR     R         7%       7448         45%  raidio_thread
 503  BR     R         7%       7448         45%  raidio_thread
 635  BG     k         6%      15184         11%  snmpd
1614  BR     0         5%       3464         10%  ntm_main
1711  RR     w        35%      14256         21%  wafl_exempt00
1712  BR     w        35%      14136         21%  wafl_exempt01
1713  BR     w        35%      14136         21%  wafl_exempt02
2599  BR     k         5%       2752          8%  gr_scheduler
That seems pretty busy for a 4-CPU system...
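
A rough way to sanity-check that impression is to tally the %CPU column from
the listing. The Python sketch below is only illustrative: it assumes the
%CPU figures are per thread and that four CPUs give a nominal 400% budget,
neither of which I've confirmed for this output. The names PS_LINES, parse
and tally are made up for the example.

```python
# Tally %CPU from the ps listing, grouping numbered threads together.
# Assumptions (not confirmed): %CPU is per thread, 4 CPUs == 400% total.
PS_LINES = """\
195 RR N 47% 6928 10% NwkThd_00
196 RR N 47% 7880 12% NwkThd_01
197 RR 0 47% 6928 10% NwkThd_02
223 BR s 7% 7648 46% pmcsas_intrd_1
259 BR e 5% 2440 19% fal_io_thread2
502 BR R 7% 7448 45% raidio_thread
503 BR R 7% 7448 45% raidio_thread
635 BG k 6% 15184 11% snmpd
1614 BR 0 5% 3464 10% ntm_main
1711 RR w 35% 14256 21% wafl_exempt00
1712 BR w 35% 14136 21% wafl_exempt01
1713 BR w 35% 14136 21% wafl_exempt02
2599 BR k 5% 2752 8% gr_scheduler
""".splitlines()


def parse(line):
    """Return (thread name, %CPU as int) for one row of the listing."""
    _pid, _state, _domain, cpu, _stack, _stack_pct, name = line.split()
    return name, int(cpu.rstrip("%"))


def tally(lines):
    """Sum %CPU per thread group, e.g. NwkThd_00..02 -> NwkThd."""
    groups = {}
    for line in lines:
        name, cpu = parse(line)
        # Strip a trailing thread number and any underscore before it.
        key = name.rstrip("0123456789").rstrip("_")
        groups[key] = groups.get(key, 0) + cpu
    return groups


if __name__ == "__main__":
    groups = tally(PS_LINES)
    for key, cpu in sorted(groups.items(), key=lambda kv: -kv[1]):
        print(f"{key:15s} {cpu:4d}%")
    print(f"{'total':15s} {sum(groups.values()):4d}% of a nominal 400%")
```

By that tally the listed threads add up to 288%, with the three NwkThd
threads (141%) and the three wafl_exempt threads (105%) accounting for most
of it. The listing only shows the threads above, so the real total may be
higher.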
Thanks and regards,
Ed.