On Wed 23 Feb, 2000, "Mohler, Jeff" jeff.mohler@netapp.com wrote:
I don't have any solid data on data/stripe/drive-size performance handy, but it is a wise move not to span RAID groups over multiple FCAL controllers. There is write degradation involved with that, which I have seen. With two F760s rebuilding a manually failed drive while doing nothing else, the machine without a stripe split across FCAL cards finished about 25% ahead of the machine that did span FCAL cards for a RAID stripe.
mkfile tests from Solaris confirmed what I'd heard, and what I saw in the reconstruct as part of my test.
This strikes me as very interesting. Until now I'd have reasoned that having twice the bandwidth to the spindles, and the lower latency that implies (only a proportion of the I/Os in flight on each channel getting in the way of subsequent I/Os), would have benefited RAID rebuilding - which is the nastiest case, where every stripe must be read, XOR'd over, and the resulting blocks written to one disk.
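Just to pin down what I mean by a rebuild, here's a toy sketch (Python, single-parity assumed, all the real scheduling and disk geometry ignored):

    # Toy single-parity rebuild: the lost block in each stripe is the XOR
    # of the surviving data blocks and the parity block for that stripe.
    from functools import reduce

    def rebuild_block(surviving_blocks):
        """surviving_blocks: equal-length byte strings read from every
        other disk in the stripe (data + parity); returns the lost block."""
        return bytes(reduce(lambda a, b: a ^ b, col)
                     for col in zip(*surviving_blocks))

    # e.g. rebuild_block([b'\x0f', b'\xf0', b'\xff']) == b'\x00'

Every stripe on every surviving disk gets read to produce one block's worth of writes to the replacement, which is why more loop bandwidth looked like a win to me.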
Can anyone explain why this would be so?
-- End of excerpt from "Mohler, Jeff"
On Wed 23 Feb, 2000, "Bruce Sterling Woodcock" sirbruce@ix.netcom.com wrote:
Probably not until late this year or next.
Fair enough. I imagine they might appear with new product, as it's been a while.. *speculates wildly*
The complete model depends entirely on your filer model, the OS version, exactly what disks you have, how much memory, what your op mix is, etc.
Well, yes, but that's not impossible to plug into an analytical model.
I'd also hope that these low-level functions were pretty finely tuned in DOT and not as amenable to radical change..
They make very little difference. On the order of 10% for most people. As to WHY, it's because Netapp finely balances their filers for maximum performance at typical op mixes. They are not like a competitor which throws a bunch of hardware at a problem and expects you to juggle it to figure out what works. Netapp wants to deliver a simple solution.
Yeees. However, there's a touch of faith creeping in here. I don't think it's misplaced, certainly - I think NetApps are my favourite fileservers for these reasons. I guess I'm just getting more cynical and evidence-biased as I get older. 8)
There are some performance papers on Netapp's web site that address some of your questions, and people are free to post their personal experiences here, but what works for one environment may not work for another. I would be reluctant to tell you to go out and buy more drives you don't need if it won't help you.
Indeed, me too. But how can we quantitatively be confident ahead of time?
Given that those tables constantly change with the variables (drives, memory, OS, etc.) rules of thumb is probably the best you can do, unless you have a few million to spend on testing every possible configuration and reporting the results. :)
Well, I know only one place and one group of people that could undertake this, and they sell the boxes. I'd be very surprised if the creators, the engineers, haven't got *something* going on in this line.
There is no best. There is only what is best for your environment.
Yes, I'd like to think my 'best' would be considered as predicated on the requirements. I'm not an absolutist, or at least I try not to be.
I would guess slightly better, since the 18 gig drives are probably faster and you have 2 loops to share the load. However, since the bottleneck will be the NVRAM, I doubt there will be a major noticeable difference.
Ah, I was thinking more about this - the nvram emptying after a cp takes time, and that is a key determinant of whether you'll be taking a cp from a cp or not, which in turn is a key determinant of user-visible performance.
This nvram emptying is surely determined in turn by how rapidly the blocks can be pushed to disk, in which case the speed of the disks and the utility of multiple loops may have some impact, surely? Or is WAFL so good at distributing writes, and the bandwidth and latency of a single loop such a good match for the nvram, that this is a null term?
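The way I'm picturing it, crudely (every number below is invented for illustration, not a measurement from any filer):

    # Back-of-envelope cp model: a cp-from-cp happens roughly when the
    # half of nvram now logging new writes fills faster than the flush
    # can empty the half being written out. Invented figures throughout.
    nvram_half_mb = 16.0    # one half of nvram, holding one cp's worth
    incoming_mb_s = 8.0     # client write rate being logged
    flush_mb_s    = 20.0    # rate the cp can push blocks out to the disks

    fill_time  = nvram_half_mb / incoming_mb_s   # time to fill the other half
    flush_time = nvram_half_mb / flush_mb_s      # time to empty this half

    if flush_time >= fill_time:
        print("back-to-back cps - clients see the stall")
    else:
        print("%.1fs of headroom per cp" % (fill_time - flush_time))

If that flush rate really is set by the loop rather than the spindles, then a second loop ought to buy you something; if it isn't, it won't.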
I can't consider them if you don't say what they are.
I meant, consider whether they're a big factor or not. I was thinking about the two scenarios using the same 10-spindle RAID set size.
In the average case the DOT algorithm has to read a few blocks from each of a few disks in a given RAID set, perform the parity XOR over those blocks and the blocks it intends to write, generate the parity, and then write all those new blocks to the disks it has elected to write to during a cp flush, always including the parity disk. Having fewer spindles means more load per spindle; having fewer loops means more load per loop. How much does the CPU have to do, and how much can be DMA'd or otherwise fobbed off? How little time can you whittle the nvram empty down to?
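Per stripe, I imagine something like this (a sketch of my own, not the real DOT/WAFL code path):

    from functools import reduce

    def xor_blocks(blocks):
        """Column-wise XOR of equal-length byte strings."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    def flush_stripe(read_blocks, new_blocks):
        """read_blocks: stripe data not being rewritten (read back in);
        new_blocks: the blocks the cp wants to lay down. Returns the
        parity block that must also go out to the parity disk."""
        parity = xor_blocks(read_blocks + new_blocks)
        # new_blocks and parity then get queued to their disks; the flush
        # isn't done until every one of those writes has completed.
        return parity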
I mean, to be reliable you have to await the completion notice from each disk for each I/O sent, and I don't believe 18GB disks are capable of 2x the I/Os of 9GB disks. Serialising 4 RAID sets on one loop, though, even with more disk I/Os per second, will probably increase the final response time for a full nvram flush by a goodly chunk over the scenario with 2 loops.
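Scribbling some numbers down to show what I mean (every figure here is an assumption pulled out of the air, not a measurement):

    # Two hypothetical layouts of roughly the same capacity.
    disk_iops_9gb   = 90     # assumed ops/sec per 9GB spindle
    disk_iops_18gb  = 110    # assumed - faster, but nowhere near 2x
    loop_iops_limit = 2000   # assumed ceiling for one FCAL loop

    big_disks_one_loop    = min(14 * disk_iops_18gb, 1 * loop_iops_limit)
    small_disks_two_loops = min(28 * disk_iops_9gb,  2 * loop_iops_limit)

    print("14 x 18GB on one loop : ~%d ops/s" % big_disks_one_loop)
    print("28 x 9GB  on two loops: ~%d ops/s" % small_disks_two_loops)

On those made-up numbers the many-small-disks, two-loop layout wins despite the slower spindles, which is the sort of thing I'd like a model to tell me before I buy.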
Given a pathological filer setup I could see that increasing the nvram on a write-burdened filer might actually make the problem worse rather than better. How does an admin, especially one not so very interested in and aware of these things, approach a filer that is struggling when they don't know why or how to fix it?
You can't know if you got it right unless you have an exact simulation of your particular ops mix and traffic patterns. Since you don't have that, then basically you have to use rules of thumb and guess and adjust when needed. Your environment changes over time, too, so what works one day may not work another.
Mmm, all of which makes me lean toward an analytical model, as it should be more powerful in dealing with these imponderables: plug in the numbers and see how it comes out. Real testing and simulation would make useful checks against such a model, natch.
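Even something as crude as this would be a start - every constant below is a placeholder I've invented, to be replaced with honest measured values, not anything NetApp has published:

    # Crude analytical sketch: can a proposed config sustain a given
    # write load without going back-to-back on cps? All inputs invented.
    def sustains(write_mb_s, spindles, per_spindle_mb_s, loops, per_loop_mb_s):
        flush_mb_s = min(spindles * per_spindle_mb_s, loops * per_loop_mb_s)
        return flush_mb_s > write_mb_s

    print(sustains(write_mb_s=12, spindles=10, per_spindle_mb_s=3,
                   loops=1, per_loop_mb_s=80))   # -> True on these numbers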
Maybe it's just me, but while I love having numbers and data, I've always found such tuning tasks to be far more intuitive and situational.
Well, yes; I guess this is because we're human and quite good at dealing with vague and woolly things that nevertheless have patterns which can be extracted with experience.
I mean, if I have a chart that says such-and-such filer with such-and-such configuration can handle 200 users, and the filer in my environment overloads at 100 users, I'm not gonna keep adding users. Conversely, if it runs at 50% and high cache age with 200 users, I'm not going to worry about adding more.
True. Computers aren't as deterministic as they once were, and we have to deal with what's really there; however, I think there is scope for better models than none at all, or 'ask Bruce or Mark, they're good with filers'.
I also wouldn't advocate *such* a simple model, at least not alone.
If a filer is overloaded with reads, and I haven't maxed the RAM yet, I'm probably going to max the RAM first. Then I'll worry about adding more disks, and if that doesn't help, I'll know I need to reduce traffic and get another filer. There are some minor variations to this, of course, but I'm not going to waste a lot of time beforehand trying to predict exactly how many disks and how much memory I need when the reality can quickly change. Estimate, yes, but I won't follow a strict chart.
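If I paraphrase that ladder in crude pseudo-logic (my wording, not yours), it's something like:

    # My paraphrase of the ladder above; the predicates are hypothetical
    # stand-ins for what you'd actually read off sysstat / cache age.
    def next_step(read_overloaded, ram_maxed, disks_maxed):
        if not read_overloaded:
            return "leave it alone"
        if not ram_maxed:
            return "max the RAM first"
        if not disks_maxed:
            return "add more disks"
        return "reduce traffic / get another filer"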
Okay, so perhaps I've overemphasised the model-and-chart thing, or perhaps you're playing devil's advocate to my credulous believer. As a tool in the salesman's, field engineer's and admin's box o' tricks, though, a configurator other than the SPEC result sheets would be a very neat thing to have.
And as for wasting a lot of time beforehand - that's rather my point. You wouldn't have to waste time, or a lot of it.
So long as the results of a model were within a few tens of percent, your putative admin has got close enough to start tuning, rather than trans-shipping for the larger machine they thought they might need but chickened out of. Better still, they find they're peaking just below the capability of the box they ordered, without having coughed up for the big expensive beast they'd otherwise have bought "just in case".
-- End of excerpt from "Bruce Sterling Woodcock"
You can't know if you got it right unless you have an exact simulation of your particular ops mix and traffic patterns. Since you don't have that, then basically you have to use rules of thumb and guess and adjust when needed. Your environment changes over time, too, so what works one day may not work another.
Mmm, all of which makes me lean toward an analytical model, as it should be more powerful in dealing with these imponderables: plug in the numbers and see how it comes out. Real testing and simulation would make useful checks against such a model, natch.
If you can excuse my two cents' worth - while I admire your quest for the perfect analytical model of a RAID system, it reminds me of the argument over "which is the best operating system". The answer is "for what?" To me there seem to be too many variables to put into an equation. For instance, most people deal with files of all differing sizes - so do you account for mean (average) file sizes, or for extremes? There is also the issue of differing protocols and their performance characteristics - NFS (all versions), CIFS, etc. And what about the number of concurrent users? Again, means or extremes? Chances are they could be using different protocols and file sizes too. And there are more factors to throw into the mix...
I would suggest the better question is "what is the wrong way to configure a filer?", so you can have rules to eliminate whatever hinders the efficiency of a filer in all situations, and then fine-tune by what you think are the relevant variables for your environment.
Of course, that said it would be great if you did find the unified theory of filers ;)
-----------
Jay Orr
Systems Administrator
Fujitsu Nexion Inc.
St. Louis, MO