Our new F740 has a fairly constant load, as shown by `sysstat 10`:
 CPU   NFS  CIFS  HTTP    Net kB/s   Disk kB/s   Tape kB/s Cache
                          in   out  read write  read write   age
 46%  1114  1131     0   512  1331   701   371     0     0     5
 53%  1559  1316     0   595  1615   446   387     0     0     6
 50%  1343  1275     0   596  1211   489   464     0     0     5
 48%  1454  1283     0   544  1272   276   395     0     0     6
 52%  1844  1275     0   651  1837   280   320     0     0     6
 58%  1399  1551     0  1027  2787  1037   450     0     0     6
 56%  1796  1351     0   763  1541   352   378     0     0     6
 45%  1024  1349     0   620  1422   330   542     0     0     6
 58%  1657  1353     0  1193  2991   620   754     0     0     6
 51%  1171  1467     0   598  1424   272   453     0     0     6
with occasional (every 5-10 minutes) sustained peaks of:
 77%  1242  1201     0   489  1731  5285  5049     0     0     6
 81%  1497  1279     0   600  2048  4327  5026     0     0     5
 78%  1495  1221     0   563  1589  4287  5488     0     0     4
 76%  1229  1213     0   472  1795  4546  5262     0     0     3
 81%  1467  1150     0  1345  2634  3683  5746     0     0     2
 80%   969  1068     0   397  2051  3876  6298     0     0     2
 91%  1049   905     0   384  1102  1230 10622     0     0     2
 92%  1328   942     0   448  1801  1536 10336     0     0     2
Are there any rules of thumb about when I should start worrying? I've only got about 2/3 of the data and 2/3 of the clients on it that it will eventually need to serve. The low cache age numbers worry me already, but it's already at the stated maximum of 1/2 GB of memory.
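One crude way to frame the "when should I worry" question is a linear extrapolation: if the current load comes from about 2/3 of the eventual clients, scale the observed CPU by 3/2. The sketch below does that with the CPU column of the steady and peak samples quoted here. The linear-scaling assumption is mine and is a simplification, not a capacity rule; the names `GROWTH` and `projected` are hypothetical.

```python
# Naive "when do I worry" projection for the F740 samples above.
# ASSUMPTION: load scales linearly with clients/data, so going from about
# 2/3 of the eventual workload to all of it multiplies CPU% by 3/2.
# This is a simplification, not a capacity rule.

GROWTH = 3 / 2

# CPU% column from the steady-state sysstat sample
steady_cpu = [46, 53, 50, 48, 52, 58, 56, 45, 58, 51]
# CPU% column from the periodic peak sample
peak_cpu = [77, 81, 78, 76, 81, 80, 91, 92]


def projected(cpu_samples, growth=GROWTH):
    """Return (mean observed CPU%, mean CPU% scaled by growth)."""
    mean = sum(cpu_samples) / len(cpu_samples)
    return mean, mean * growth


for label, samples in (("steady", steady_cpu), ("peak", peak_cpu)):
    now, later = projected(samples)
    verdict = "saturated" if later > 100 else "some headroom"
    print(f"{label:>6}: {now:.1f}% now -> {later:.1f}% projected ({verdict})")
```

By this rough measure the steady-state load projects to about 78% CPU, while the periodic peaks project past 100%.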
The highest CPU and NFS ops/sec I've seen are:
 86%  3148   563     0  4228  5598  4131  3845     0     0     1
 93%  4425   452     0  3573  4503  3283  3977     0     0     1
 92%  3320   437     0  6299  4711  3778  5573     0     0     1
 95%  3827   552     0  5282  5195  4094  5663     0     0     1
at which point I imagine the robot from "Lost in Space" coming out and flailing its arms, if only there were an option "danger.will_robinson on".
(The basic config is two 7-drive shelves of 18 GB FC-AL disks in a single RAID group with one spare, one volume, and a quad 100 Mb/s board trunked into a 400 Mb/s full-duplex pipe to the switch. I think I overspecced the network connection, though I have seen Net kB/s in and out each burst past 10,000, which is almost the limit of a single 100 Mb/s link.)
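To put the network numbers in perspective, here is the raw arithmetic for one 100 Mb/s port and the 400 Mb/s trunk. It assumes sysstat's kB means 1000 bytes (the 1024-byte figure is printed too) and ignores protocol overhead, so real usable throughput is somewhat lower; `link_kbps` is a hypothetical helper name.

```python
# Raw link-capacity arithmetic for the quad 100 Mb/s trunk.
# ASSUMPTIONS: one direction of a full-duplex link, no Ethernet/IP/UDP/TCP
# overhead, so usable throughput is somewhat lower than these figures.


def link_kbps(mbit_per_s, bytes_per_kb=1000):
    """Convert a link speed in Mb/s to kB/s in one direction."""
    return mbit_per_s * 1_000_000 / 8 / bytes_per_kb


single = link_kbps(100)        # one 100 Mb/s port
trunk = link_kbps(400)         # four ports trunked together
observed_burst = 10_000        # Net kB/s seen in each direction during bursts

print(f"one port: {single:,.0f} kB/s ({link_kbps(100, 1024):,.0f} if kB = 1024 bytes)")
print(f"trunk:    {trunk:,.0f} kB/s per direction")
print(f"burst = {observed_burst / single:.0%} of one port, "
      f"{observed_burst / trunk:.0%} of the trunk")
```

So the 10,000 kB/s bursts are about 80% of a single port's raw capacity but only about 20% of the trunk.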
Until next time...
Todd C. Merrill
The Mathworks, Inc.
508-647-7792
24 Prime Park Way, Natick, MA 01760-1500
508-647-7012 FAX
tmerrill@mathworks.com
http://www.mathworks.com
---
On Thu, 18 Feb 1999, Todd C. Merrill wrote:
with occasional (every 5-10 minutes) sustained peaks of:
 77%  1242  1201     0   489  1731  5285  5049     0     0     6
 81%  1497  1279     0   600  2048  4327  5026     0     0     5
 78%  1495  1221     0   563  1589  4287  5488     0     0     4
 76%  1229  1213     0   472  1795  4546  5262     0     0     3
 81%  1467  1150     0  1345  2634  3683  5746     0     0     2
 80%   969  1068     0   397  2051  3876  6298     0     0     2
 91%  1049   905     0   384  1102  1230 10622     0     0     2
 92%  1328   942     0   448  1801  1536 10336     0     0     2
The inbound network traffic (less than 1 MB/sec) doesn't jibe with your disk writes (5-10 MB/sec). Are these not consecutive lines of output, or are you recovering from a failed drive, or what?
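The mismatch can be quantified by dividing Disk kB/s write by Net kB/s in for each of the peak rows quoted above. This is a sketch; the expectation that plain client writes would give a ratio only somewhat above 1x (data plus parity and metadata) is a simplifying assumption, and `write_ratios` is a hypothetical helper name.

```python
# Disk-write vs. network-in ratio for the peak sysstat rows.
# Each pair is (Net kB/s in, Disk kB/s write) copied from the peak sample.
peak_rows = [
    (489, 5049), (600, 5026), (563, 5488), (472, 5262),
    (1345, 5746), (397, 6298), (384, 10622), (448, 10336),
]


def write_ratios(rows):
    """Return disk-write kB/s divided by net-in kB/s for each row."""
    return [disk_write / net_in for net_in, disk_write in rows]


for (net_in, disk_write), ratio in zip(peak_rows, write_ratios(peak_rows)):
    print(f"net in {net_in:5d}  disk write {disk_write:6d}  ratio {ratio:5.1f}x")

# ASSUMPTION: if the writes were just client data being stored, the ratio
# would sit only somewhat above 1x (data plus parity and metadata).
```

Every peak row writes at least four times what arrives over the network, and the last two rows more than twenty times.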
The inbound network traffic (less than 1 MB/sec) doesn't jibe with your disk writes (5-10 MB/sec).
This suggests some sort of metadata updates, e.g., 'touch *', though in that example inode locality should be pretty good, so you wouldn't see many blocks written per operation.
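A rough blocks-per-operation figure makes the locality point concrete. The sketch below assumes the NFS and CIFS columns are operations per second, that Disk write counts every block written (parity and metadata included), and that WAFL writes 4 kB blocks; it compares the first steady-state row with the seventh peak row from Todd's samples. `blocks_per_op` is a hypothetical helper name.

```python
# Rough 4 kB blocks written per NFS+CIFS operation.
# ASSUMPTIONS: NFS and CIFS columns are ops/second, Disk write is kB/s and
# includes parity and metadata, and every block is a 4 kB WAFL block.
WAFL_BLOCK_KB = 4


def blocks_per_op(nfs_ops, cifs_ops, disk_write_kbps):
    """Average number of 4 kB blocks written per NFS or CIFS operation."""
    return (disk_write_kbps / WAFL_BLOCK_KB) / (nfs_ops + cifs_ops)


# (NFS, CIFS, Disk write) from the first steady row and the 7th peak row
steady = blocks_per_op(1114, 1131, 371)
peak = blocks_per_op(1049, 905, 10622)
print(f"steady: {steady:.2f} blocks/op   peak: {peak:.2f} blocks/op")
```

Under those assumptions the peak row writes over 30 times as many blocks per operation as the steady row.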
Hmmm... I just noticed that you've got CIFS as well as NFS. Some Microsoft applications (e.g., Word or Excel) treat their files like databases. If there were no CIFS client write caching (because oplocks were disabled, or revoked because multiple clients were accessing the file), then saving one of these files might produce this sort of pattern.
It would help to isolate whether NFS or CIFS is causing this pattern. If it's NFS, run nfsstat at the beginning and end of one of these bursts, then compare the two outputs to see which operations are being done. If CIFS is the culprit, I'll have to defer further diagnosis to someone else, as it's been several years since I looked closely at our CIFS instrumentation.
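The nfsstat comparison amounts to subtracting per-operation counters. The sketch below assumes the counts have already been copied from the two nfsstat outputs into dictionaries by hand; the operation names are standard NFS procedures, but the numbers are invented placeholders, not measurements, and `op_deltas` is a hypothetical helper name.

```python
# Diff two nfsstat snapshots taken before and after one of the bursts.
# ASSUMPTION: per-operation counts were copied out of nfsstat by hand;
# the numbers below are made-up placeholders, not real measurements.


def op_deltas(before, after):
    """Return (op, increase) pairs sorted by largest increase first."""
    deltas = {op: after.get(op, 0) - before.get(op, 0) for op in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example counts
before = {"getattr": 100_000, "lookup": 50_000, "read": 40_000,
          "write": 30_000, "setattr": 2_000}
after = {"getattr": 112_000, "lookup": 53_000, "read": 44_000,
         "write": 31_000, "setattr": 9_000}

for op, delta in op_deltas(before, after):
    print(f"{op:>8}: +{delta}")
```

An operation whose count jumps during the burst points to what the clients are doing; a large setattr count with few writes, as in this made-up example, would fit the metadata-update theory.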
-- Karl