I am trying to (what else?) improve the performance of the filers that I have. Here are the configs and the environment in which they run. My specific questions follow:
I have two F740's with 14x 18GB FC-AL disks and a trunked 4x 100 Mb/s ethernet card. These filers serve software development sandboxes with a few million files (median file size about 4 kB), not more than a few thousand in the largest directories. These filers are used for both CIFS and NFS client builds; NFS traffic is limited to UDP only (`options nfs.tcp.enable off`) to keep the network overhead down on that protocol. Mix is about 60-75% NFS; the rest, CIFS. Running ONTAP 5.3.4R3P2. CPU load is more often than not above 50%, often 75-90% or more; usually 3,000-5,000 ops/s; cache age is rarely above 3 -- most often 1 or 0. Consistency points are usually at the maximum of once every 10 seconds (as measured by disk writes with `sysstat`).
I was poking around on the SPEC pages, and in the footnotes to the NetApp entries are these parameters:
options nosnap 1 # to disable periodic snapshot creation, for reproducibility
options nosnapdir 1 # to avoid inserting .snapshot entry when reading directories
options raid.scrub.enable off # to disable periodic RAID scrubs, for reproducibility
options minra 1 # to minimize file read-ahead
options udp_lg_dgram.xmit_cksum.offload 1 # to offload checksum computations onto the Gigabit NIC
Openboot settings on the F740: setenv java? false # to disable the Java Virtual Machine
http://www.spec.org/osg/sfs97/results/res98q3/sfs97-980805-00008.html http://www.spec.org/osg/sfs97/results/res99q2/sfs97-19990416-00045.html
I have a few questions:
1a. Regarding snapshots, many of my development sandboxes are "disposable" so I don't care about snapshots. The first two options might help me. In addition, if I set `snap reserve volX 0`, will that actually turn off snapshots, and reduce overhead on the filer? The options only seem to turn off *automatic* snapshots and the *display* of the ".snapshot" directory.
1b. Would disabling these two options *prevent* me from `cd`ing into the .snapshot directory? Even though it is visible by default at the root of the mount point, you can `cd` into the directory ".snapshot" at any point:
37 machine:/devel/scratch>ls -a
.  ..  .snapshot  toddc
38 machine:/devel/scratch>cd toddc
39 machine:/devel/scratch/toddc>ls -a
.  Testing.doc  devtools  ..  dump.image.gz
40 machine:/devel/scratch/toddc>cd .snapshot
41 machine:/devel/scratch/toddc/.snapshot>pwd
/devel/scratch/toddc/.snapshot
42 machine:/devel/scratch/toddc/.snapshot>ls
backup  hourly.1  nightly.0  hourly.0  hourly.2
1c. In addition, I'm thinking of breaking up my 14-disk filer from a RAID 13d+1p configuration into, say, two RAID groups and two volumes as 3d+1p and 9d+1p. (Multiple RAID groups are allowed per volume; I assume multiple volumes per RAID group is still disallowed?) The former would have the stuff I need snapshotted and the latter the stuff I don't. I have a churn rate of about 30-40%. Thoughts?
2. What does "options minra 1" buy me? What is the number when minra=off, the default? (If it's just a boolean, nevermind!) Given I have a development environment with lots of small files, it seems as if turning this option on will benefit me a lot. However, some more information on this would help.
3. Does the udp_lg_dgram.xmit_cksum.offload option also apply to quad-ethernet cards? Will it help if the ports are trunked with a virtual interface?
4. Why is my filer running Java? If I disable that, would we see any administration impact? (These filers are administered solely via the CLI.) I do see messages like

[Java Thread]: TimeDaemon: timed: adjusting time

in the messages file all the time. Would I lose the ability to keep the filer in time sync? What else might I lose?
To prevent a knee-jerk reaction, for the purposes of this discussion, upgrading my filer to an F760 is not an option. ;) I want to leverage what I have now, as best I can.
I thought about opening a tech support call, and still may, but I thought some real life input from the trenches would be useful as a start.
Until next time...
Todd C. Merrill The Mathworks, Inc. 508-647-7000 x7792 3 Apple Hill Drive, Natick, MA 01760-2098 508-647-7001 FAX tmerrill@mathworks.com http://www.mathworks.com ---
----- Original Message ----- From: "Todd C. Merrill" tmerrill@mathworks.com To: toasters@mathworks.com Sent: Tuesday, April 04, 2000 7:12 AM Subject: improving filer performance
I am trying to (what else?) improve the performance of the filers that I have. Here are the configs and the environment in which they run. My specific questions follow:
I have two F740's with 14x 18GB FC-AL disks and a trunked 4x 100 Mb/s ethernet card. These filers serve software development sandboxes with a few million files (median file size about 4 kB), not more than a few thousand in the largest directories. These filers are used for both CIFS and NFS client builds; NFS traffic is limited to UDP only (`options nfs.tcp.enable off`) to keep the network overhead down on that protocol. Mix is about 60-75% NFS; the rest, CIFS. Running ONTAP 5.3.4R3P2. CPU load is more often than not above 50%, often 75-90% or more; usually 3,000-5,000 ops/s; cache age is rarely above 3 -- most often 1 or 0. Consistency points are usually at the maximum of once every 10 seconds (as measured by disk writes with `sysstat`).
I was poking around on the SPEC pages, and in the footnotes to the NetApp entries are these parameters:
options nosnap 1 # to disable periodic snapshot creation, for reproducibility
options nosnapdir 1 # to avoid inserting .snapshot entry when reading directories
options raid.scrub.enable off # to disable periodic RAID scrubs, for reproducibility
options minra 1 # to minimize file read-ahead
options udp_lg_dgram.xmit_cksum.offload 1 # to offload checksum computations onto the Gigabit NIC
Openboot settings on the F740: setenv java? false # to disable the Java Virtual Machine
http://www.spec.org/osg/sfs97/results/res98q3/sfs97-980805-00008.html http://www.spec.org/osg/sfs97/results/res99q2/sfs97-19990416-00045.html
I have a few questions:
1a. Regarding snapshots, many of my development sandboxes are "disposable" so I don't care about snapshots. The first two options might help me.
Yes, if you don't care about snapshots, you should turn them off with those options.
In addition, if I set `snap reserve volX 0`, will that actually turn off snapshots, and reduce overhead on the filer? The options only seem to turn off *automatic* snapshots and the *display* of the ".snapshot" directory.
No, that just sets the snapshot reserve to 0. The snapshot reserve is a low-water mark, not a high-water mark; snapshots can and will exceed it, and until they do, that space is reserved and not usable by the active filesystem. Note that snapshots will still be created for things like backups (which I think still work with nosnapdir on, but that's something to check).
1b. Would disabling these two options *prevent* me from `cd`ing into the .snapshot directory? Even though it is visible by default at the root of the mount point, you can `cd` into the directory ".snapshot" at any point:
I think it would, but I confess I'm not sure from the description given in the technical papers.
1c. In addition, I'm thinking of breaking up my 14-disk filer from a RAID 13d+1p configuration into, say, two RAID groups and two volumes as 3d+1p and 9d+1p. (Multiple RAID groups are allowed per volume; I assume multiple volumes per RAID group is still disallowed?) The former would have the stuff I need snapshotted and the latter the stuff I don't. I have a churn rate of about 30-40%. Thoughts?
You'd have to set the options differently for each volume. I'm not sure that this would help you very much; if you have stuff that you want snapshotted, it's probably the stuff that gets accessed a lot.
The "no snapshot" stuff's impact on performance is EXTREMELY small. I doubt you will even notice it, especially if you are not going "whole hog" with turning it off.
- What does "options minra 1" buy me? What is the number when minra=off, the default? (If it's just a boolean, nevermind!)
It's just a boolean; when on, the filer does minimal read-ahead.
Given I have a development environment with lots of small files, it seems as if turning this option on will benefit me a lot. However, some more information on this would help.
It might. It might not. Best thing to do is turn it on and see. You can always turn it back off later if it hurts.
- Does the udp_lg_dgram.xmit_cksum.offload option also apply to quad-ethernet cards? Will it help if the ports are trunked with a virtual interface?
No, I don't think so. In fact, to improve your performance you might consider a gigabit card rather than using the trunked ethernet.
- Why is my filer running Java?
The Java engine is used for many auxiliary functions.
If I disable that, would we see any administration impact? (These filers are administered solely via the CLI.) I do see the messages file logs
Yes, it would. What all would die when Java is disabled is unclear; I haven't seen a complete list, and it could change with each release.
To prevent a knee-jerk reaction, for the purposes of this discussion, upgrading my filer to an F760 is not an option. ;) I want to leverage what I have now, as best I can.
Your best way to leverage what you have now is to ask for a discount on the F760 for the upgrade. The second best way is to just buy a new filer and move some of the data over to it, using both.
Bruce
Todd Merrill tmerrill@mathworks.com asks a large number of questions about possible ways of improving filer performance. I will stick to the issues to do with snapshots here.
A general point first:
I was poking around on the SPEC pages, and in the footnotes to the NetApp entries are these parameters:
Several of these are specifically "for reproducibility". If a snapshot or a RAID scrub goes off in the middle of your benchmark, it will rather noticeably affect the results. This isn't a good reason for not using them in real life. You wouldn't let a real user near your system while doing a benchmark, either. :-)
1a. Regarding snapshots, many of my development sandboxes are "disposable" so I don't care about snapshots. The first two options might help me. In addition, if I set `snap reserve volX 0`, will that actually turn off snapshots, and reduce overhead on the filer? The options only seem to turn off *automatic* snapshots and the *display* of the ".snapshot" directory.
The right way to turn off automatic snapshots completely for a volume is to use "snap sched [volname] 0", and, of course, delete any existing snapshots. "vol options [volname] nosnap on" is, to my mind, a temporary override to the normal schedule, for when you are doing something strange to the volume in question (like running a benchmark!).
The "snap reserve" has no effect at all on the taking of snapshots. It's to stop the active filing system using space "reserved" (in a rather curious sense) for the snapshots. Nothing stops the snapshots eating into the remaining space you might reasonably think of as designated for the active filing system. If you aren't going to have any snapshots, then certainly you should lower the reserve value, probably to zero, to let the a.f.s. use more.
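To make the "low-water mark" behaviour concrete, here is a toy model of the space accounting (invented numbers, not output from any real filer):

```python
def afs_available(total_kb, reserve_pct, snap_used_kb):
    """Space still usable by the active file system (a.f.s.), in kB."""
    reserve_kb = total_kb * reserve_pct // 100
    # Snapshots consume the reserve first; anything beyond it comes out
    # of the space you probably thought belonged to the a.f.s.
    snap_overflow = max(0, snap_used_kb - reserve_kb)
    return total_kb - reserve_kb - snap_overflow

total = 100_000  # kB; hypothetical volume size
print(afs_available(total, 20, 10_000))  # snapshots within the reserve: 80000
print(afs_available(total, 20, 30_000))  # snapshots EXCEED the reserve: 70000
print(afs_available(total, 0, 0))        # reserve zero, no snapshots: 100000
```

So with no snapshots at all, dropping the reserve to zero gives all the space back to the a.f.s.; but note that a non-zero reserve never *limits* snapshot growth.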
Taking a snapshot can cause a momentary glitch in performance, and degrade the cache contents somewhat. If you are in a marginal situation, you might want to avoid scheduling automatic snapshots to coincide with extreme peak usage times. But in practice it's the *space* occupied by snapshots that drives nearly all decisions taken about using them, not the expense of taking them.
As you say, the options mentioned prevent only the taking of automatic snapshots. The "dump" command (run against the a.f.s.) will still take a snapshot, for example. You wouldn't want it not to.
1b. Would disabling these two options *prevent* me from `cd`ing into the .snapshot directory? Even though it is visible by default at the root of the mount point, you can `cd` into the directory ".snapshot" at any point:
"vol options [volname] nosnapdir on" is enough to make all .snapshot directories (both the previously "visible" and "invisible" ones) inaccessible (and invisible) to NFS (and presumably CIFS, etc.). On the other hand, if you have no actual snapshots, then every such .snapshot directory would appear empty anyway.
1c. In addition, I'm thinking of breaking up my 14-disk filer from a RAID 13d+1p configuration into, say, two RAID groups and two volumes as 3d+1p and 9d+1p. (Multiple RAID groups are allowed per volume; I assume multiple volumes per RAID group is still disallowed?) The former would have the stuff I need snapshotted and the latter the stuff I don't. I have a churn rate of about 30-40%. Thoughts?
If you've got material with *markedly* different snapshot requirements, then multiple volumes is the way to go. It costs in the extra parity discs, of course. (I notice you have no hot spares mentioned in your alternative configurations above, so I guess you are already somewhat desperate for space.)
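The parity cost is easy to put numbers on, using the 18 GB nominal disk size from the original post (a sketch only; real usable sizes are smaller after formatting):

```python
DISK_GB = 18  # nominal size, from the configuration in the original post

def usable_gb(data_disks):
    # Parity (and any hot spare) disks contribute no usable capacity.
    return data_disks * DISK_GB

current  = usable_gb(13)                # one volume: 13d+1p
proposed = usable_gb(3) + usable_gb(9)  # two volumes: 3d+1p and 9d+1p
print(current, proposed, current - proposed)  # 234 216 18
```

So the split gives up one data disk's worth of space to the second parity disk; a hot spare, as in the corrected layout later in the thread, costs a further disk on top of that.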
Chris Thompson
University of Cambridge Computing Service, New Museums Site, Cambridge CB2 3QG, United Kingdom
Email: cet1@ucs.cam.ac.uk  Phone: +44 1223 334715
On Tue, 4 Apr 2000, Chris Thompson wrote:
1c. In addition, I'm thinking of breaking up my 14-disk filer from a RAID 13d+1p configuration into, say, two RAID groups and two volumes as 3d+1p and 9d+1p. (Multiple RAID groups are allowed per volume; I assume multiple volumes per RAID group is still disallowed?) The former would have the stuff I need snapshotted and the latter the stuff I don't. I have a churn rate of about 30-40%. Thoughts?
If you've got material with *markedly* different snapshot requirements, then multiple volumes is the way to go. It costs in the extra parity
Yes, that is the reason for my proposal to break up the single volume I have now. The churn rate of 30-40% on the entire filer is quite unnecessary; I only need to keep snapshots on a few of the development sandboxes; the other 10-20 don't need them. But, I must snapshot them all because they are all on the same volume currently.
However, it appears from a private email from a NetApp person that a 4 disk volume (3d+1p) is sub-optimally tuned, and could result in vastly decreased performance.
discs, of course. (I notice you have no hot spares mentioned in your alternative configurations above, so I guess you are already somewhat desperate for space.)
Whoops. Make that:
1c. In addition, I'm thinking of breaking up my 14-disk filer from a RAID 12d+1p+1hs configuration into, say, two RAID groups and two volumes as 3d+1p and 8d+1p + 1hs. ...
Until next time...
Todd C. Merrill The Mathworks, Inc. 508-647-7000 x7792 3 Apple Hill Drive, Natick, MA 01760-2098 508-647-7001 FAX tmerrill@mathworks.com http://www.mathworks.com ---
Todd C. Merrill tmerrill@mathworks.com writes:
However, it appears from a private email from a NetApp person that a 4 disk volume (3d+1p) is sub-optimally tuned, and could result in vastly decreased performance.
Is this right? I thought that *less* than 3 data discs was the point at which the (non-)readahead performance penalty started kicking in.
I've explicitly got a 3d+1p volume on which performance is certainly an issue, and I made it that size for that reason (it could have been 1d+1p as far as capacity was concerned). Did I do the wrong thing?
Chris Thompson
University of Cambridge Computing Service, New Museums Site, Cambridge CB2 3QG, United Kingdom
Email: cet1@ucs.cam.ac.uk  Phone: +44 1223 334715
I think the more important part is not where there is a "sweet spot" in the performance curve, if any, but rather that ultimately the disk subsystem does limit your throughput. I'm not aware of any recent data on this, but my experience shows that you are limited to 100-200 ops per disk (it depends on your actual ops mix). You can see some old data on this in the white paper:
http://www.netapp.com/tech_library/3008.html
So if you are having performance issues with a 4 drive volume, it doesn't matter if the "sweet spot" is 3 drives or 5 drives; you should still get better performance by adding another drive!
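Plugging the thread's configurations into that rule of thumb (a rough sketch only):

```python
def disk_limited_ops(data_disks, ops_per_disk):
    # Parity disks are excluded; they don't serve read ops.
    return data_disks * ops_per_disk

# Chris's small volume (3d+1p) vs. the original 13d+1p layout:
print(disk_limited_ops(3, 100), disk_limited_ops(3, 200))    # 300 600
print(disk_limited_ops(13, 100), disk_limited_ops(13, 200))  # 1300 2600
```

By these figures the whole-filer rates of 3,000-5,000 ops/s already exceed what 13 spindles alone could serve, presumably because cache hits never reach disk.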
Bruce
options nosnap 1 # to disable periodic snapshot creation, for reproducibility
options nosnapdir 1 # to avoid inserting .snapshot entry when reading directories
...
1a. Regarding snapshots, many of my development sandboxes are "disposable" so I don't care about snapshots. The first two options might help me. In addition, if I set `snap reserve volX 0`, will that actually turn off snapshots, and reduce overhead on the filer? The options only seem to turn off *automatic* snapshots and the *display* of the ".snapshot" directory.
"options nosnap 1" prevents the filer from doing automatic snapshots, which as the comment says is for reproducibility -- taking a snapshot does chew up some resources, so if one happened during a benchmark run you'd have a blip that might not show up in another run. In any sort of benchmarking, you want to either eliminate any non-deterministic stimuli or use a sample which is large enough to average them out.
"options nosnapdir 1" simply keeps READDIR from returning .snapshot as one of the directory entries. The SFS benchmark initialization writes a bunch of files, then checks to make sure that what it wrote there -- and nothing else -- is present. This simply keeps that initialization check from failing.
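In other words, the option is nothing more than a filter on the directory entries returned; a trivial sketch of the idea (hypothetical code, not the filer's actual implementation):

```python
def readdir(entries, nosnapdir=False):
    """Directory entries as an NFS READDIR reply would list them."""
    if nosnapdir:
        # "options nosnapdir 1": the .snapshot entry is simply omitted.
        return [e for e in entries if e != ".snapshot"]
    return list(entries)

root = [".", "..", ".snapshot", "toddc"]
print(readdir(root))                   # default: .snapshot is listed
print(readdir(root, nosnapdir=True))   # option on: entry suppressed
```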
options minra 1 # to minimize file read-ahead
...
- What does "options minra 1" buy me? What is the number when minra=off, the default? (If it's just a boolean, nevermind!) Given I have a development environment with lots of small files, it seems as if turning this option on will benefit me a lot. However, some more information on this would help.
For Data ONTAP 5.0 and later, the default behavior when a read request causes a block to be read from disk is to read that block and the next 71 blocks, a total of 72 blocks (288 KB). "options minra 1" causes the filer to just read the block which is required.
If you do lots of sequential reading of big files, this is usually a good thing because it can amortize the cost of the seek and I/O to the disk over multiple blocks.
If you do lots of short, random reads (think database access) then the read-ahead may be really, really bad because you waste a lot of effort and memory on blocks you don't use.
If you have lots of small files, it doesn't matter, because the filer can't read blocks which don't exist.
SFS often reads short stretches of large files, and has a huge working set (bigger than memory), which is why this option is enabled for the benchmark.
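Karl's numbers can be put into a toy model (a sketch that assumes the read starts at the beginning of the file):

```python
BLOCK_KB = 4           # WAFL block size
READAHEAD_BLOCKS = 72  # default: the missed block plus the next 71 (288 kB)

def blocks_fetched(file_kb, minra=False):
    file_blocks = -(-file_kb // BLOCK_KB)  # ceiling division
    if minra:
        return 1  # minra: only the block actually needed
    # Read-ahead cannot fetch blocks past end-of-file.
    return min(file_blocks, READAHEAD_BLOCKS)

print(blocks_fetched(4))                 # 4-kB file: 1 block either way
print(blocks_fetched(1024))              # 1-MB file, default: 72 blocks
print(blocks_fetched(1024, minra=True))  # 1-MB file, minra: 1 block
```

For 4-kB median files the default and minra fetch the same single block, which matches the point above that for small files it shouldn't much matter.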
options udp_lg_dgram.xmit_cksum.offload 1 # to offload checksum computations onto the Gigabit NIC
...
- Does the udp_lg_dgram.xmit_cksum.offload option also apply to quad-ethernet cards? Will it help if the ports are trunked with a virtual interface?
It's for Gigabit NICs only.
-- Karl Swartz Network Appliance Engineering Work: kls@netapp.com http://www.netapp.com/ Home: kls@chicago.com http://www.chicago.com/~kls/