I just had some rambling thoughts come to me that I thought I'd share.
My immediate reaction when faced with questions like those in this thread --- how many drives, how many busses, how much cache, how do these numbers change with drive type, how do all of the above change with I/O mix --- is to look for models, but only to guide rules of thumb. I expect to learn from theoretical models something like "drive type and count, and bus type and count, affect the sustainable aggregate transaction rate; cache needs to be sized to that aggregate rate, and if the load is allowed to approach the capacity of the drive subsystem, the cache size needed to sustain optimum performance can grow exponentially". (I just made that up, but it sounds good :-).
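To make that last bit concrete, here's the sort of back-of-envelope arithmetic I have in mind (entirely my own toy model, not anything NetApp has published; the capacity and load numbers are invented): treat the drive subsystem as a single queue and watch what happens to the number of outstanding requests, which is a stand-in for the cache needed to absorb them, as load approaches capacity.

# Toy M/M/1-style arithmetic, purely illustrative:
# mean requests in system = rho / (1 - rho), where rho is utilization.

def outstanding_requests(offered_ops, capacity_ops):
    """Mean number of requests in flight for an M/M/1 queue at this load."""
    rho = offered_ops / capacity_ops      # utilization of the drive subsystem
    if rho >= 1.0:
        return float("inf")               # past saturation, no cache is enough
    return rho / (1.0 - rho)

capacity = 5000                           # hypothetical aggregate ops/sec from drives + busses
for load in (1000, 2500, 4000, 4500, 4900, 4990):
    print("load %5d ops/s -> ~%8.1f requests outstanding"
          % (load, outstanding_requests(load, capacity)))

The exact growth law doesn't matter; the point is just that the curve bends up hard near saturation, which is why I care more about the shape than the numbers.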
Given a headful of such models and the rules of thumb deduced from them, I then set up test points and benchmark to see how things really look. Models are great for getting a feel for whether increasing something should help, and whether there should be a point beyond which it stops helping, but for quantitative answers I really don't care so much about them; there are just too many variables.
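By "test points" I mean something like the little sweep below (all names and numbers hypothetical; run_benchmark() here just fakes a diminishing-returns curve so the loop has something to print, but it stands in for whatever load generator actually gets used): vary the knobs and look for where adding more stops helping.

import itertools

def run_benchmark(drives, cache_mb):
    """Placeholder for the real thing: drive a configured filer with the
    target workload and return measured ops/sec.  The formula below is a
    fake diminishing-returns curve, invented for illustration."""
    return min(drives * 400, 4000) * (cache_mb / (cache_mb + 512.0))

for drives, cache_mb in itertools.product((7, 14, 28), (256, 512, 1024)):
    ops = run_benchmark(drives, cache_mb)
    print("%3d drives, %5d MB cache -> ~%d ops/sec" % (drives, cache_mb, ops))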
Benchmarking is a place where NetApp could _really_ help us out. There may be some software limitation that makes this partly impractical, but it seems to me that with just a few max-config servers and some load-generating software and systems, you could easily provide a performance-vs-cost curve for a given customer, for machines perfectly tuned to deliver the best price/performance for their particular needs. The two key bits would be flexible servers, where for benchmarking you could selectively use only some of the drives or busses (easy) and only part of the cache (hard?), and flexible load-generating software, ideally driven from stats easily collected by a capture program run on the client's current production net for, say, a few weeks.
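As a sketch of what I mean by stats-driven load generation (again, all names and numbers are made up for illustration, and issue() is just a stub for whatever actually sends a request): the capture program would boil a few weeks of production traffic down to an operation mix and an aggregate rate, and the generator would simply play that mix back.

import random
import time

# What a few weeks of capture might reduce to: fraction of each operation
# type plus the aggregate rate to play back.  Numbers invented for illustration.
captured_mix = {"read": 0.55, "write": 0.25, "getattr": 0.15, "lookup": 0.05}
target_ops_per_sec = 2000

op_names, op_weights = zip(*captured_mix.items())

def issue(op):
    """Placeholder for actually sending one request of the given type."""
    pass

def replay(duration_sec):
    """Crude open-loop playback of the captured mix at the captured rate."""
    deadline = time.time() + duration_sec
    pause = 1.0 / target_ops_per_sec
    while time.time() < deadline:
        issue(random.choices(op_names, weights=op_weights)[0])
        time.sleep(pause)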
-Bennett