We have an F760C that seems to have recently run into a wall with regard to performance. As you can see below, the total NFS ops (it's purely NFS over UDP) are tiny for what this class of filer can handle, the net in/out isn't especially high, and the disks are doing about half of what they can. However, the CPU is pegged and the Consistency Point type is B (back-to-back CPs), which I seem to remember is very bad.
filer> version
NetApp Release 6.1R1P1: Wed Jun 20 18:24:25 PDT 2001
filer> sysstat -u 1
 CPU Total    Net kB/s   Disk kB/s   Tape kB/s Cache Cache    CP  CP  Disk
     ops/s    in   out  read write  read write   age   hit  time  ty  util
100%    12  1777    10     0     0     0     0     7  100%  100%   :    0%
100%    23   431     7  2094  4955     0     0     7  100%  100%   :   48%
100%     2   499     7   359   865     0     0     7  100%  100%   :   15%
100%     6   551     5     0     0     0     0     7  100%  100%   :    0%
100%    16   665     7    40    16     0     0     7  100%  100%   :    2%
100%   139  1746    13    16     0     0     0     7  100%  100%   :    2%
100%    95   461    26  1840  1892     0     0     7  100%  100%   B   41%
100%   321  3505    32   636  1152     0     0     7   99%  100%   :   21%
100%   274  5380    63   810  1772     0     0     7  100%  100%   :   31%
100%   200  1313    39  2732  5893     0     0     7   98%  100%   :   44%
100%   361   590    47  3750  5205     0     0     7  100%  100%   B   58%
100%   315  7011   173  6491 10557     0     0     7  100%  100%   B   51%
100%   483  6314   144  4125 10239     0     0     7   99%  100%   :   57%
100%   429  3544   179  4453  7540     0     0     7  100%  100%   B   40%
100%   422  6462   138  3351  4313     0     0     7  100%  100%   B   40%
100%   504  7227   143  3866  7226     0     0     7  100%  100%   B   49%
100%   509  6933   164  3285  8902     0     0     7  100%  100%   B   47%
100%   397  6853   161  3511  8631     0     0     7  100%  100%   B   41%
100%   342  4527   202  3123  8722     0     0     7   99%  100%   B   38%
 CPU Total    Net kB/s   Disk kB/s   Tape kB/s Cache Cache    CP  CP  Disk
     ops/s    in   out  read write  read write   age   hit  time  ty  util
100%   450  5346   198  2145  9383     0     0     7   98%  100%   :   41%
100%   537  5397   196  2589  4091     0     0     7   99%  100%   B   43%
100%   355  7152   376  5053  5944     0     0     7   98%  100%   B   47%
100%   390  6693   381  3788  8575     0     0     7   98%  100%   B   63%
100%   462  5748   212  2755  8926     0     0     7   98%  100%   B   41%
100%   390  5289   206  2936  8896     0     0     7   99%  100%   B   40%
100%   408  6120   217  4313 10208     0     0     8   98%  100%   :   40%
100%   420  5651   165  3499  5997     0     0     8   99%  100%   B   38%
100%   497  7152  1911  4138  3987     0     0     8  100%  100%   B   33%
100%   177  6577   647  4195  8403     0     0     8   99%  100%   B   61%
100%   491  5549   192  4023  9042     0     0     8   99%  100%   B   52%
100%   483  4760   246  2192  9256     0     0     8   98%  100%   :   54%
100%   454  6273   282  2125  9104     0     0     8  100%  100%   B   43%
100%   419  6278   624  1977  6597     0     0     8  100%  100%   B   36%
100%   437  7082   148  2293  4436     0     0     8  100%  100%   B   44%
 98%   533  6814   166  1964  9181     0     0     8   96%  100%   B   45%
100%   451  7486   399  2396  7076     0     0     8   99%  100%   B   47%
100%   361  4137   126    36    64     0     0     8  100%  100%   :    4%
100%   164  5411    63    32     0     0     0     8   99%  100%   :    2%
100%   135  1293    25    40    16     0     0     8  100%  100%   :    3%
 CPU Total    Net kB/s   Disk kB/s   Tape kB/s Cache Cache    CP  CP  Disk
     ops/s    in   out  read write  read write   age   hit  time  ty  util
100%   113   693    29   242  5297     0     0     8   99%  100%   :   17%
 88%   100   557    28  1740  5399     0     0     8   93%  100%   :   48%
 92%   363  2982    66  5628  9746     0     0     8  100%  100%   B   42%
 69%   245  6641    94  4992  4999     0     0     8   99%  100%   B   50%
 80%   256  8562    98  6633 11136     0     0     8   98%  100%   B   52%
 96%   316  5253    97  3594  9801     0     0     8  100%  100%   :   54%
 80%   484  4006    89  3123  5641     0     0     8  100%  100%   F   46%
 79%   303  8004   142  7064  8163     0     0     8   99%  100%   F   59%
 86%   452  7324   124  7658 10672     0     0     8   92%  100%   F   75%
 96%   487  3194   250  5769 10395     0     0     8   99%  100%   :   58%
 87%   564  7267   145  5295  7299     0     0     8   95%  100%   F   53%
 53%   114  8385   108  6838  4198     0     0     8   98%  100%   B   36%
 97%   520  3170    38  7972 10447     0     0     8   99%  100%   :   62%
 78%   755  5875  2667  2913  3823     0     0     8  100%  100%   B   50%
 83%   304  8232   129  7845  8444     0     0     8   99%  100%   B   60%
 88%   607  7104   101  7272 10302     0     0     8   99%  100%   B   65%
 91%   478  4897   148  5553 11166     0     0     8   99%  100%   :   61%
 89%   424  5359   124  5303  6008     0     0     8   99%  100%   B   53%
 82%   360  8128   110  6587  6664     0     0     8   99%  100%   B   64%
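For anyone who wants to tally the CP types rather than eyeball them, here is a minimal Python sketch that parses sysstat rows like the ones above and counts how often each CP type shows up. The field names and the helpers `parse_sysstat` and `summarize` are made up for this example (they are not part of any NetApp tool), and the parser assumes every data row has exactly the 13 whitespace-separated columns shown in the header.

```python
from collections import Counter

# Four rows copied from the sysstat output above (header lines omitted).
SAMPLE = """\
100%    95   461    26  1840  1892     0     0     7  100%  100%   B   41%
100%   321  3505    32   636  1152     0     0     7   99%  100%   :   21%
 80%   484  4006    89  3123  5641     0     0     8  100%  100%   F   46%
 53%   114  8385   108  6838  4198     0     0     8   98%  100%   B   36%
"""

# Hypothetical field names, in the column order of the sysstat header.
FIELDS = ("cpu", "ops", "net_in", "net_out", "disk_read", "disk_write",
          "tape_read", "tape_write", "cache_age", "cache_hit",
          "cp_time", "cp_type", "disk_util")


def parse_sysstat(text):
    """Return a list of dicts, one per data row; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 13 fields and start with a percentage (the CPU column).
        if len(parts) != len(FIELDS) or not parts[0].endswith("%"):
            continue
        row = dict(zip(FIELDS, parts))
        try:
            for key in FIELDS:
                if key != "cp_type":
                    row[key] = int(row[key].rstrip("%"))
        except ValueError:
            continue  # skip rows with values this sketch does not understand
        rows.append(row)
    return rows


def summarize(rows):
    """Average CPU and ops/s, plus a count of each CP type seen."""
    n = len(rows)
    return {
        "samples": n,
        "avg_cpu": sum(r["cpu"] for r in rows) / n,
        "avg_ops": sum(r["ops"] for r in rows) / n,
        "cp_types": dict(Counter(r["cp_type"] for r in rows)),
    }


if __name__ == "__main__":
    print(summarize(parse_sysstat(SAMPLE)))
```

Pasting the full capture into `SAMPLE` in place of the four rows gives the same summary over every sample.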
There are four DLT7000 tape drives connected to the filer, and one backup session is running to the filer at this time. For some reason, the filer does not report tape activity when a Solaris client writes data to the attached tape drive (we're using QuickRestore 2.7.9, if that makes any difference).
I'm tempted to say that it's just a problem with not enough NVRAM on the filer (F760 has 32 MB of NVRAM, and I think it gets split when you're talking about a cluster). Does anyone else have any ideas or suggestions?
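To put rough numbers on the NVRAM theory, here is a back-of-the-envelope sketch in Python. It rests on assumptions I have not confirmed: that a clustered head gets half of the 32 MB for its own writes, that this share is further split into two logs so one can fill while the other is flushed by a CP, and that the incoming NFS data rate is a fair stand-in for how fast a log fills. The 7000 kB/s figure is roughly what the busy sysstat samples show for net in; the function name is made up for this example.

```python
def seconds_to_fill_log(nvram_mb, net_in_kb_per_s, clustered=True):
    """Estimate how long incoming writes take to fill one NVRAM log.

    Assumptions (not confirmed): a clustered head uses half of its NVRAM
    for local writes, and that share is split into two equal logs.
    """
    usable_mb = nvram_mb / 2 if clustered else nvram_mb
    log_mb = usable_mb / 2                    # one of the two logs
    return (log_mb * 1024) / net_in_kb_per_s  # MB -> kB, then divide by kB/s


if __name__ == "__main__":
    for clustered in (True, False):
        t = seconds_to_fill_log(32, 7000, clustered=clustered)
        print(f"clustered={clustered}: one log fills in about {t:.1f} s")
```

Under those assumptions, a clustered head would fill a log in a little over a second at that write rate. If a CP cannot finish within that window, the next one would start immediately, which is what I would expect to show up as type B.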
Geoff Hardin
UNIX System Administrator