The CP to CP command is:
wafl_susp -w
Look at cp_from_log_full and cp_from_cp. I think they mean:
How many times a checkpoint (CP) happened before the 10-second timer expired because the log in NVRAM was full. A high count can mean the filer is writing a lot, but don't worry TOO much; it can even reach 100,000.
The 2nd one I cannot remember... :(
Eyal.
"Todd C. Merrill" wrote:
On Mon, 15 May 2000, Jeff Stampes wrote:
So we use an F760 for compiling our software... it's a demanding environment, and up until now, NetApp systems have been adequate for us.
This is exactly how I use them. If you want more detail than what's provided here, feel free to email me directly.
As I've been refining our build process, though, I've continued optimizing it to the point where the 760 may become our bottleneck. It seems to be running at about 50-60% CPU most of the time, with peaks like:
[...]
What stats will clue me in to a problem related to overload?
Very low cache ages are bad. We used to have F740s, and would have cache ages of 0 or 1 most of the time. We bumped up the memory in an F740 to 1 GB (unsupported) and got about another 10-20% out of it before we pegged the CPU again at cache ages of 2 or 3. We upgraded to F760s after that and now are running about:
filer> sysstat 2
 CPU   NFS  CIFS  HTTP    Net kB/s     Disk kB/s   Tape kB/s Cache
                          in   out   read  write  read write   age
 80%  1829  1848     0  1368  2047   5862  10969     0     0     2
 64%  2636  1606     0  2233  4878   4922      0     0     0     2
 67%  2550  2383     0  2390  2787   2609      0     0     0     2
 74%  3420  2167     0  4027  4103   3253      0     0     0     2
 85%  3132  1305     0  2761  3409   8241   7526     0     0     2
 72%  3546  1443     0  1963  2898   5958   4670     0     0     2
 63%  3147  1839     0  2082  2956   3327      0     0     0     2
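To watch for the low-cache-age and high-CPU cues over a longer run, one option is to save the sysstat output and summarize it. A minimal Python sketch, assuming each data row has the 11 fields in the order of the header above; the function names and the "low" cache-age threshold of 1 are illustrative choices, not anything from the filer itself:

```python
from statistics import mean

# Assumed field order, taken from the sysstat header quoted above.
FIELDS = ["cpu", "nfs", "cifs", "http", "net_in", "net_out",
          "disk_read", "disk_write", "tape_read", "tape_write", "cache_age"]

def parse_row(line):
    """Turn one sysstat data row into a dict of ints (CPU without the %)."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    parts[0] = parts[0].rstrip("%")
    return dict(zip(FIELDS, map(int, parts)))

def summarize(lines, low_cache_age=1):
    """Average CPU, lowest cache age, and how many samples had a 'low' age."""
    rows = [parse_row(line) for line in lines if line.strip()]
    return {
        "avg_cpu": mean(r["cpu"] for r in rows),
        "min_cache_age": min(r["cache_age"] for r in rows),
        # Ages of 0 or 1 were the overloaded F740 case described above.
        "low_cache_age_samples": sum(r["cache_age"] <= low_cache_age for r in rows),
    }

# The F760 capture quoted above, data rows only.
F760_SAMPLE = """\
80% 1829 1848 0 1368 2047 5862 10969 0 0 2
64% 2636 1606 0 2233 4878 4922 0 0 0 2
67% 2550 2383 0 2390 2787 2609 0 0 0 2
74% 3420 2167 0 4027 4103 3253 0 0 0 2
85% 3132 1305 0 2761 3409 8241 7526 0 0 2
72% 3546 1443 0 1963 2898 5958 4670 0 0 2
63% 3147 1839 0 2082 2956 3327 0 0 0 2
""".splitlines()

if __name__ == "__main__":
    print(summarize(F760_SAMPLE))
```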
There was also a thread a while back about measuring the frequency of consistency points that happen more often than once every 10 seconds. Search the archives for it (I don't remember the command or the parameter offhand). We were not limited by that; most of our CPs were happening on the 10-second boundaries.
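A rough way to turn that into a number is to compare the CPs counted over an interval with the number the 10-second timer alone would produce. A Python sketch, assuming you read the counter deltas yourself (for example from wafl_susp -w, with cp_from_log_full as named earlier in this thread) and that the filer had write activity throughout, so a timer CP fires every 10 seconds; the total_cps input and the function name are hypothetical:

```python
# Assumptions: a timer CP fires every 10 s while there is write activity,
# and total_cps / cp_from_log_full are counter deltas over elapsed_s seconds.
CP_TIMER_INTERVAL_S = 10

def cp_report(elapsed_s, total_cps, cp_from_log_full):
    """Compare observed CPs with what the 10-second timer alone would give."""
    timer_only = elapsed_s / CP_TIMER_INTERVAL_S
    return {
        "timer_only_cps": timer_only,
        # CPs beyond the timer's share were triggered early (e.g. log full).
        "excess_cps": max(0.0, total_cps - timer_only),
        "log_full_fraction": cp_from_log_full / total_cps if total_cps else 0.0,
    }

# Hypothetical readings over a 10-minute build: 90 CPs, 20 of them log-full.
print(cp_report(600, 90, 20))
```

If excess_cps stays near zero, most CPs are landing on the 10-second boundaries, as they did here.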
There are some great white papers on the NOW site about NFS performance and RAID group performance:
http://www.netapp.com/tech_library/3008.html
http://www.netapp.com/tech_library/3027.html
Take a look at those to make sure you are not running into any of the limits mentioned there. The 3027 paper is way out of date (it uses F330s as examples), but the shape of the curves should not have changed much.
What are the limits of CPU/Net/Disk use I should stay under?
I've found that a 70-75% CPU load doesn't really impact our environment. Beyond that, the filer seems overloaded: latencies increase, our builds take longer, etc. We can push our F760s to 6,000-7,000 ops/s for minutes at a time pretty easily. From the small snippet given (which seemed to be your peak usage, not your average), you seem to be limited to only 4,000-5,000 ops/s. This may just be differences in our environments. (We run about 2:1 NFS/CIFS operations, use NFS strictly over UDP, and mostly NFS v2 now.)
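As a back-of-the-envelope check, ops/s per sample is just the NFS, CIFS, and HTTP columns added together. A small Python sketch that flags samples above the comfort zone described here; the 75% cutoff is this post's rule of thumb, not a NetApp specification, and the function names are made up for illustration. The rows are three samples from the F760 capture quoted earlier:

```python
CPU_COMFORT_LIMIT = 75  # rule of thumb from this post, not a NetApp spec

def cpu_and_ops(row):
    """Return (cpu_percent, NFS + CIFS + HTTP ops/s) for one sysstat data row."""
    fields = row.split()
    cpu = int(fields[0].rstrip("%"))
    nfs, cifs, http = (int(x) for x in fields[1:4])
    return cpu, nfs + cifs + http

def over_limit(rows, limit=CPU_COMFORT_LIMIT):
    """(cpu, ops) for every sample whose CPU is above the limit."""
    return [s for s in map(cpu_and_ops, rows) if s[0] > limit]

rows = [
    "80% 1829 1848 0 1368 2047 5862 10969 0 0 2",
    "64% 2636 1606 0 2233 4878 4922 0 0 0 2",
    "85% 3132 1305 0 2761 3409 8241 7526 0 0 2",
]
print(over_limit(rows))  # [(80, 3677), (85, 4437)]
```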
Provided I find a way to drive an F760 into the ground, any suggestions for a next step?
We parallelized with multiple filers. We used to have one F740, then two; we upgraded those to two F760s, soon to be three, and have an F8xx upgrade on order for one of them. Oooo, can't wait for that puppy! (Sorry... under NDA... can't say anything about the specs.)
One note, however... you say you see these high peaks of ~100% CPU load and lots of writes. I've seen usage that heavy on ours when snapshots occur. See if that correlates with your snapshot schedule. This is what ours looks like while a snapshot is being created. Ops go way down and disk reads/writes go way up:
filer> sysstat 2
 CPU   NFS  CIFS  HTTP    Net kB/s     Disk kB/s   Tape kB/s Cache
                          in   out   read  write  read write   age
 78%  3069  1100     0  1342  1375   6865   8036     0     0     3
 82%  3775  1322     0  1507  1433   7076   4516     0     0     3
 80%  3932  1009     0  1317  1281   6255   8328     0     0     3
 91%  3240  1252     0  2374  3519  10230   7642     0     0     2
 89%  2856  1306     0  2468  1469   8904   9084     0     0     2
 74%  2293   424     0   665  2658   9403   9854     0     0     2
 75%  3223   457     0   928  1650   9456   7053     0     0     2
 83%  2949   438     0  1428  4242   9464  10605     0     0     2
 77%  2355   595     0   765  3114  11063   9504     0     0     2
 73%  1927   574     0   684   854  10023   9662     0     0     2
 75%  2692   896     0  1134  1508   8676  10477     0     0     2
Oh, we have quad, trunked 100 Mb/s Ethernet with a virtual interface; apparently the gigabit cards are faster, and the later gigabit cards are faster than the older ones.
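For a sense of how much headroom a trunk like that has, here is a quick Python calculation comparing sysstat's Net kB/s columns with the nominal trunk capacity. It assumes sysstat's kB is 1,024 bytes, that each link runs full duplex (so in and out each get the full rate), and that load spreads evenly across all four links; with per-client load balancing, a single client may still be limited to one 100 Mb/s link. The function names are illustrative:

```python
# Assumptions: kB = 1024 bytes, Mb/s = 10**6 bits/s, full duplex,
# and perfectly even balancing across the trunked links.
LINK_MBIT = 100  # one Fast Ethernet link
LINKS = 4        # "quad, trunked"

def trunk_capacity_kBps(links=LINKS, link_mbit=LINK_MBIT):
    """Nominal capacity per direction, in kB/s."""
    return links * link_mbit * 1_000_000 / 8 / 1024

def trunk_utilization(net_in_kBps, net_out_kBps, links=LINKS, link_mbit=LINK_MBIT):
    """Fraction of nominal capacity used in each direction."""
    cap = trunk_capacity_kBps(links, link_mbit)
    return net_in_kBps / cap, net_out_kBps / cap

# The 74% CPU sample from the first F760 capture: 4027 kB/s in, 4103 kB/s out.
u_in, u_out = trunk_utilization(4027, 4103)
print(f"capacity {trunk_capacity_kBps():.0f} kB/s per direction; "
      f"in {u_in:.1%}, out {u_out:.1%}")
```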
I've also signed up for the new-and-improved NetApp 202 class;
hopefully, I can learn more about the filer and really know what I'm seeing and measuring in order to tune it better for our environment.
Hope this helps. Until next time...
The Mathworks, Inc.
3 Apple Hill Drive, Natick, MA 01760-2098
508-647-7000 x7792
508-647-7001 FAX
tmerrill@mathworks.com
http://www.mathworks.com