On Wed 23 Feb, 2000, "Bruce Sterling Woodcock" sirbruce@ix.netcom.com wrote:
> The article doesn't say, but bigger, faster disk drives often draw more power. Thus, NetApp needs to have shelves that can support them first, before they can test them for reliability. There are a lot of issues involving mechanical vibration and so on that have to be assessed before such a drive could be qualified. It may be that this particular drive is not appropriate for certain shelves.
Well, yeah, that much is obvious. So how soon will that be? 8)
> More spindles means better performance if you are read-heavy. If your filer seems slow and it's reading constantly but the CPU is still under 100%, more spindles could help.
So much for conventional wisdom. However, my question was trying to elicit something a little more, well, complete as a model of performance.
I mean FCAL is a loop technology, and filers can come with 1, 2 or 3 loops. How much does performance differ when using 1 chain as opposed to 3 balanced chains, using which drives, using what RAID set sizes and volume mappings? How does performance get affected by the size of the disks you use?
See, if none of these things makes any difference to performance I'd love to know why. If they *do* then I'd love to know how to build faster filers.
I'd be happy with an analytical model, or a set of tables taken from either real lab testing with real filers, or simulator results. As it stands though, all we have are rules of thumb that sound plausible.
The same goes for backup performance. There's been so much discussion on the list about both topics, but I can't honestly hand-on-heart tell someone how best to set up their filers. I'd write something if I knew the answers myself.
The best I can find is things like: http://now.netapp.com/NOW/knowledge/contents/FAQ/FAQ_560.shtml
I've been meaning to brush up on queueing theory, so maybe I *will* try to work something up myself on this. I'd just hate to reinvent the wheel if someone's done it already. I also don't know anything about the buffers in the filer, on the FCAL interfaces, the network interfaces, the precise behaviour of WAFL, etc. So my model might be under-informed.
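To make the kind of model I have in mind concrete, here is a minimal sketch in Python that treats each data spindle as a server in an M/M/c queue. The arrival and service rates below are made-up placeholders, and the model deliberately ignores NVRAM, WAFL, read caching and the loops, so treat it as a starting point for intuition rather than a filer model.

from math import factorial

def erlang_c(c, a):
    """Probability an arriving op has to queue (Erlang C); a = lam / mu is the offered load."""
    top = a ** c / factorial(c)
    bottom = (1 - a / c) * sum(a ** k / factorial(k) for k in range(c)) + top
    return top / bottom

def mean_response_time(lam, mu, c):
    """Mean queueing delay plus service time, in seconds, for c spindles."""
    a = lam / mu
    if a >= c:
        return float("inf")  # the disk subsystem is saturated
    wait_q = erlang_c(c, a) / (c * mu - lam)
    return wait_q + 1 / mu

# Hypothetical workload: 2000 random ops/s offered, ~120 ops/s per spindle.
for spindles in (20, 28, 40):
    t = mean_response_time(lam=2000.0, mu=120.0, c=spindles)
    print(f"{spindles} spindles -> mean response {t * 1000:.1f} ms")

Even this toy shows the shape I would expect: adding spindles helps a lot when the subsystem is running hot, and almost not at all once utilisation is low.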
Here's a thought to mull over: if you have twenty 18 GB drives on two loops and 16 MB of NVRAM, will you see better or worse performance than with forty 9 GB drives on one loop and 16 MB of NVRAM, presuming your clients are all writing as fast as they are allowed? Please disregard any "such configs aren't shipped" objections, as this is an in-principle hypothetical. Please do consider the effect of RAID sets, and of incremental disk additions over the lifetime of the filer in question. I picked writing as the task because I wanted to point out that there's also an impact on the DOT algorithms, based on the spindles, the RAID set sizes, and maybe the chains too.
I don't know the answer, but I'd like to. Without such models we all have to purchase filers somewhat in the dark, and either pray that we got it right, or expend more time than we can afford evaluating our purchases for suitability.
One question - is anyone else interested in this or am I just shooting my mouth off needlessly?
-- End of excerpt from "Bruce Sterling Woodcock"
----- Original Message -----
From: mark <mds@gbnet.net>
To: toasters@mathworks.com
Sent: Wednesday, February 23, 2000 5:04 PM
Subject: Re: Disk drive technology
> On Wed 23 Feb, 2000, "Bruce Sterling Woodcock" sirbruce@ix.netcom.com wrote:
>> The article doesn't say, but bigger, faster disk drives often draw more power. Thus, NetApp needs to have shelves that can support them first, before they can test them for reliability. There are a lot of issues involving mechanical vibration and so on that have to be assessed before such a drive could be qualified. It may be that this particular drive is not appropriate for certain shelves.
> Well, yeah, that much is obvious. So how soon will that be? 8)
Probably not until late this year or next.
>> More spindles means better performance if you are read-heavy. If your filer seems slow and it's reading constantly but the CPU is still under 100%, more spindles could help.
> So much for conventional wisdom. However, my question was trying to elicit something a little more, well, complete as a model of performance.
The complete model depends entirely on your filer model, the OS version, exactly what disks you have, how much memory, what your op mix is, etc.
> I mean FCAL is a loop technology, and filers can come with 1, 2 or 3 loops. How much does performance differ when using 1 chain as opposed to 3 balanced chains, using which drives, using what RAID set sizes and volume mappings? How does performance get affected by the size of the disks you use?
> See, if none of these things makes any difference to performance I'd love to know why. If they *do* then I'd love to know how to build faster filers.
They make very little difference, on the order of 10% for most people. As to WHY, it's because NetApp finely balances their filers for maximum performance at typical op mixes. They are not like a certain competitor, which throws a bunch of hardware at a problem and expects you to juggle it to figure out what works. NetApp wants to deliver a simple solution.
There are some performance papers on Netapp's web site that address some of your questions, and people are free to post their personal experiences here, but what works for one environment may not work for another. I would be reluctant to tell you to go out and buy more drives you don't need if it won't help you.
> I'd be happy with an analytical model, or a set of tables taken from either real lab testing with real filers, or simulator results. As it stands though, all we have are rules of thumb that sound plausible.
Given that those tables constantly change with the variables (drives, memory, OS, etc.), rules of thumb are probably the best you can do, unless you have a few million to spend on testing every possible configuration and reporting the results. :)
> The same goes for backup performance. There's been so much discussion on the list about both topics, but I can't honestly hand-on-heart tell someone how best to set up their filers. I'd write something if I knew the answers myself.
There is no best. There is only what is best for your environment.
> I've been meaning to brush up on queueing theory, so maybe I *will* try to work something up myself on this. I'd just hate to reinvent the wheel if someone's done it already. I also don't know anything about the buffers in the filer, on the FCAL interfaces, the network interfaces, the precise behaviour of WAFL, etc. So my model might be under-informed.
> Here's a thought to mull over: if you have twenty 18 GB drives on two loops and 16 MB of NVRAM, will you see better or worse performance than with forty 9 GB drives on one loop and 16 MB of NVRAM, presuming your clients are all writing as fast as they are allowed?
I would guess slightly better, since the 18 GB drives are probably faster and you have two loops to share the load. However, since the bottleneck will be the NVRAM, I doubt there will be a major noticeable difference.
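To put rough numbers on that guess, here is a back-of-envelope sketch in Python. Every figure in it is an assumption rather than a measured value: the per-drive sustained write rate, the usable bandwidth per FC-AL loop, and the idea that one half of NVRAM fills while the other half is flushed during a consistency point of roughly the stated length.

def write_ceiling_mb_s(n_drives, mb_s_per_drive, n_loops, mb_s_per_loop,
                       nvram_mb, cp_seconds):
    """Return the smallest of three candidate write ceilings, in MB/s."""
    spindle_limit = n_drives * mb_s_per_drive
    loop_limit = n_loops * mb_s_per_loop
    # Assumed NVRAM behaviour: half fills while the other half is flushed.
    nvram_limit = (nvram_mb / 2) / cp_seconds
    return min(spindle_limit, loop_limit, nvram_limit)

configs = {
    "20 x 18 GB drives, 2 loops": dict(n_drives=20, mb_s_per_drive=10, n_loops=2),
    "40 x 9 GB drives, 1 loop":   dict(n_drives=40, mb_s_per_drive=8, n_loops=1),
}
for name, cfg in configs.items():
    ceiling = write_ceiling_mb_s(mb_s_per_loop=80, nvram_mb=16, cp_seconds=1.0, **cfg)
    print(f"{name}: write ceiling ~{ceiling:.0f} MB/s")

With anything like these numbers, both configurations hit the same NVRAM-derived ceiling long before the spindles or the loops matter, which is why I don't expect a major difference.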
> Please disregard any "such configs aren't shipped" objections, as this is an in-principle hypothetical. Please do consider the effect of RAID sets, and of incremental disk additions over the lifetime of the filer in question.
I can't consider them if you don't say what they are.
> I picked writing as the task because I wanted to point out that there's also an impact on the DOT algorithms, based on the spindles, the RAID set sizes, and maybe the chains too.
Agreed.
> I don't know the answer, but I'd like to. Without such models we all have to purchase filers somewhat in the dark, and either pray that we got it right, or expend more time than we can afford evaluating our purchases for suitability.
You can't know if you got it right unless you have an exact simulation of your particular ops mix and traffic patterns. Since you don't have that, then basically you have to use rules of thumb and guess and adjust when needed. Your environment changes over time, too, so what works one day may not work another.
> One question - is anyone else interested in this or am I just shooting my mouth off needlessly?
Maybe it's just me, but while I love having numbers and data, I've always found such tuning tasks to be far more intuitive and situational. I mean, if I have a chart that says such-and-such filer with such-and-such configuration can handle 200 users, and the filer in my environment overloads at 100 users, I'm not gonna keep adding users. Conversely, if it runs at 50% CPU with a high cache age at 200 users, I'm not going to worry about adding more.
If a filer is overloaded with reads, and I haven't maxed the RAM yet, I'm probably going to max the RAM first. Then I'll worry about adding more disks, and if that doesn't help, I'll know I need to reduce traffic and get another filer. There are some minor variations to this, of course, but I'm not going to waste a lot of time beforehand trying to predict exactly how many disks and how much memory I need when the reality can quickly change. Estimate, yes, but I won't follow a strict chart.
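For what it's worth, here is that rule of thumb written out as a tiny decision function in Python. The field names and thresholds are hypothetical, not anything the filer reports directly; the only point is the order of the remedies.

def next_upgrade(read_heavy, cpu_pct, cache_age_s, ram_maxed, shelves_full):
    """Rough ordering of remedies for a read-bound filer; thresholds are arbitrary."""
    if not read_heavy:
        return "look at the write path (NVRAM, RAID group layout) instead"
    if cpu_pct < 100 and cache_age_s < 60 and not ram_maxed:
        return "max out the RAM first"
    if cpu_pct < 100 and not shelves_full:
        return "add spindles"
    return "reduce traffic or add another filer"

print(next_upgrade(read_heavy=True, cpu_pct=70, cache_age_s=20,
                   ram_maxed=False, shelves_full=False))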
Bruce
I just had some rambling thoughts come to me that I thought I'd share.
My immediate reaction when faced with questions like those in this thread --- how many drives, how many busses, how much cache, how do these numbers change with drive type, how do all of the above change with I/O mix --- is to look for models, but only to guide rules of thumb. I expect to learn from theoretical models something like "drive type and number, and bus type and number, affect sustainable aggregate transaction rates; cache needs to be sized to the aggregate transaction rate, and if the load is allowed to approach the capacity of the drive subsystem, the cache size needed to sustain optimum performance can grow exponentially". (I just made that up, but it sounds good :-).
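To show what I mean with a toy, not a claim about filers: in a plain M/M/1 queue the average backlog grows as rho / (1 - rho), so the buffering needed to absorb it climbs steeply (hyperbolically, strictly speaking) as the load approaches the capacity of the drive subsystem. The per-request size below is arbitrary.

def backlog_kb(utilisation, kb_per_request=8):
    """Mean M/M/1 backlog in KB at the given utilisation (between 0 and 1)."""
    return utilisation / (1 - utilisation) * kb_per_request

for pct in (50, 80, 90, 95, 99):
    print(f"{pct}% utilised -> ~{backlog_kb(pct / 100):.0f} KB queued on average")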
Given a headful of such models and deduced rules of thumb, I then set up test points and want to benchmark to see how things really look. Models are great for getting a feel for whether increasing something should help, and whether there should be a point beyond which it stops helping, but for quantitative answers I really don't care so much about them; there are just too many variables.
Benchmarking is a place where NetApp could _really_ help us out. There may be some software limitation that makes this partly impractical, but it seems to me with just a few max-config servers and some load-generating software and systems, you could easily provide a performance-vs-cost curve for a given customer, for machines perfectly tuned to deliver the best price/performance for their particular needs. The two key bits would be flexible servers, where for benchmarking you could selectively use only some of the drives or busses (easy) and only part of the cache (hard?), and flexible load-generating software, ideally driven from stats that are easily collected by a capture program run on the client's current production net for, say, a few weeks.
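In outline, the load-generating half could be as simple as the sketch below. The op mix, the rate, and the do_op() stub are all placeholders I made up; a real harness would issue actual NFS calls against the filer under test, pace them from the captured statistics, and record latencies.

import random
import time

# Hypothetical op mix, as captured from a production network.
OP_MIX = {"lookup": 0.35, "read": 0.30, "write": 0.20, "getattr": 0.15}

def do_op(op):
    """Stub: a real harness would perform `op` against the filer and time it."""
    pass

def generate_load(target_ops_per_s, duration_s):
    """Issue ops drawn from OP_MIX at roughly the target rate; return the count issued."""
    ops = list(OP_MIX)
    weights = [OP_MIX[o] for o in ops]
    interval = 1.0 / target_ops_per_s
    deadline = time.time() + duration_s
    issued = 0
    while time.time() < deadline:
        do_op(random.choices(ops, weights)[0])
        issued += 1
        time.sleep(interval)  # crude open-loop pacing
    return issued

print(generate_load(target_ops_per_s=200, duration_s=2))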
-Bennett
I mentioned before that NetApp *does* have some white papers on this, although they don't have charts for every single model they sell. I recommend looking at:
http://www.netapp.com/tech_library/3008.html
http://www.netapp.com/tech_library/3027.html
http://www.netapp.com/tech_library/3066.html
Bruce
On Thu, 24 Feb 2000, Bennett Todd wrote:
> Benchmarking is a place where NetApp could _really_ help us out. There may be some software limitation that makes this partly impractical, but it seems to me with just a few max-config servers and some load-generating software and systems, you could easily provide a performance-vs-cost curve for a given customer, for machines perfectly tuned to deliver the best price/performance
Look at the published SPEC results at:
http://www.spec.org/osg/sfs97/results/
then do some math with price quotes from NetApp or your VAR. Like most vendors, NetApp charges a premium for the top-of-the-line model, the F760. As I've mentioned in another post, I went through this analysis, and since I needed gobs of ops, I bought two F740's instead of a single F760. For 100% more money, I got 100% more ops with a second F740, not 50% more ops with a single F760.
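The arithmetic behind that comparison, with placeholder prices and op counts rather than real quotes or SPEC numbers, is just ops per dollar:

# Placeholder figures only; plug in real SPEC SFS results and real quotes.
options = {
    "two F740s": {"price": 200_000, "ops": 18_000},
    "one F760":  {"price": 200_000, "ops": 13_500},
}
for name, o in options.items():
    print(f"{name}: {o['ops'] / o['price'] * 1000:.0f} ops per $1000")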
> for their particular needs. The two key bits would be flexible servers, where for benchmarking you could selectively use only some
I'm all for that. As a previous post again showed, I need more cache in my environment (amount of NVRAM seems sufficient). The F740 has 1/2 GB, no more, no less. A write-intensive development environment is apparently not the optimally-tuned "typical" configuration that NetApp uses to configure their one-size-fits-all filers.
Perhaps in the future NetApp could be a bit more flexible in the configurations available, and maybe offer "professional services" to monitor and tune a filer for a specific application.
Until next time...
Todd C. Merrill
The Mathworks, Inc.
3 Apple Hill Drive, Natick, MA 01760-2098
508-647-7000 x7792
508-647-7001 FAX
tmerrill@mathworks.com
http://www.mathworks.com
---
On Fri, Mar 03, 2000 at 08:00:09AM -0500, Todd C. Merrill wrote:
> I'm all for that. As a previous post again showed, I need more cache in my environment (amount of NVRAM seems sufficient). The F740 has 1/2 GB, no more, no less. A write-intensive development environment is apparently not the optimally-tuned "typical" configuration that NetApp uses to configure their one-size-fits-all filers.
> Perhaps in the future NetApp could be a bit more flexible in the configurations available, and maybe offer "professional services" to monitor and tune a filer for a specific application.
On the other hand, I can appreciate what they were trying to do with the F700 series: you slap those banks full of RAM and ship it. This way you avoid someone in, say, Idaho (random state, no offense if anyone actually is reading this from Idaho) buying an F760 with 128 MB of RAM and wondering why he is getting such terrible numbers when the advertised figures were so much higher. It eliminates that as an issue when dealing with phone support. So while not perfect for everyone, it does at least make some kind of sense.
-s
On Thu, 24 Feb 2000, mark wrote:
> I've been meaning to brush up on queueing theory, so maybe I *will* try to work something up myself on this. I'd just hate to reinvent the wheel if someone's done it already. I also don't know anything about the buffers in the filer, on the FCAL interfaces, the network interfaces, the precise behaviour of WAFL, etc. So my model might be under-informed.
In a similar vein, I find that my RAID-fu is not as good as I'd like it to be. Anyone know of any resources (on/offline) that go into a fair amount of detail on RAID, RAID sets, configuring for performance, and/or whatever else I should know but may not?
--noah
"information warfare is a growth industry" - David Loundy