----- Original Message -----
From: "Todd C. Merrill" tmerrill@mathworks.com
To: toasters@mathworks.com
Sent: Tuesday, November 07, 2000 8:27 AM
Subject: Re: 250K NFS ops/sec, eh? (warning...long and ranting)
NetApp and EMC have to stop staging their own arms race with these SPEC numbers. The Celerra was designed to hold 14 datamovers (N+1 failover) and it is fair to test it as such. Filers were designed to be clustered in pairs, and it is fair to test them as such. It is not fair, in my opinion, to test "clusters" by stringing N of each of them together. The limit on scaling in such scenarios then becomes, effectively, infinite, which is meaningless.
I disagree. The performance numbers have nothing to do with failover functionality, so there's no reason to limit "one configuration" to one that has failover. That logic would rule out a lot of single-server configurations entirely.
By the same token, the fact that you can stick 14 datamovers in one big box is also irrelevant. What matters is what performance you can actually get out of a given configuration, not what you can fit in one box.
I agree, though, that it becomes silly since one can effectively keep adding more and more servers and scale upward. So how does one *really* compare? What you need to do is look "under the numbers" and compare how many ops each one does with a given amount of hardware. And then figure out $. The SFS97 test doesn't give this, but I hope in the future it *will* figure ops/$ (at list) and ops/disk and so on.
EMC gained over 100,000 ops/s with their Celerra the hard way: NFS v3 over TCP. NetApp gained their numbers the easy way: NFS v2 over UDP. My plea to NetApp is: please publish NFS v3 over TCP numbers, so we customers can make a fair comparison. You used to do this with the F760s. Or, conversely, to EMC: please publish NFS v2 over UDP. Then we can see who has the bigger di...<nevermind>. [2]
I agree that full data needs to be made available on both, and I'm sure we'll get there. However, knowing NTAP's "scaling factor" between v2 and v3 and UDP and TCP in the past, you can make a reasonable guess at NTAP's performance.
Anyone who takes these benchmark numbers at face value is a fool and deserves to have their money taken away from them by either vendor, for not doing their homework.
My challenge to BOTH vendors is: in addition to "maximum throughput" configurations and numbers, please publish SPEC numbers in REAL-LIFE configurations, configurations that customers actually use. [2,3] (Note the plurality.)
The F760 configuration is certainly a real life configuration. The Celerra configuration, however, is not. The monster would cost $20M list; even with heavy discounts you're talking $5-$10M, for a configuration without RAID protection, with tons of "wasted" disk space.
In the meantime, stop the foolishness, boys.
[unreferenced footnote] Sorry, folks, for the rant, but this really pushes my buttons. As some of you may know, The Mathworks now has NetApp filers *and* an EMC Celerra/Symmetrix. We went through all this number bullshit for months with both vendors [1], so I hate to see this foolishness again. When you dive into the numbers, both vendors' CPU units (filer heads or datamovers) are "comparable." Sometimes one is a bit ahead of the other, sometimes vice versa. But, they are approximately the same when you are able to sift through the numbers and compare apples to apples.
Umm, I think you've been misled by EMC then. They do *not* have the same number of CPUs. You see, the Celerra DMs all have one CPU each, but the configuration quoted by EMC uses 6 Symmetrix Model 8430 frames, and each Symmetrix has 6 Channel Directors, and each Channel Director has 2 CPUs. These Channel Directors move and cache disk blocks the same way NTAP's CPU does. They are *not* SCSI cards... EMC has Disk/Storage Directors for that.
Anyway, if number of heads still bothers you, just remember this was the F760. With the F840, the number of heads needed would be reduced significantly.
If there were a clear winner in the strict ops/s game, then everyone would buy from that vendor if all they needed were ops/s. Luckily for us customers, there is healthy competition, which gives us what we need: better performance year after year. And, one more thing: good buying decisions are rarely as one-sided as choosing one vendor for one specification. Look at the whole picture: performance, scalability, reliability, ease of use, service, in-house knowledge/experience, etc.
I agree, but Netapp *is* the clear winner in the strict ops/s game. The reason everyone doesn't buy Netapp is the other reasons you mention.
[1] With the numbers published so far, for F760s, for instance, we can see there is approximately and conservatively a 45% scaling factor between NetApp's NFS v2 over UDP versus their NFS v3 over TCP numbers. Assuming that ratio approximately translates to the F840s (ONTAP and WAFL are the same for 5.3.x, for instance), this magic 16-node filer cluster has 250,000 ops/s NFS v2 over UDP, which I figure is about 112,500 ops/s NFS v3 over TCP. That's about the same as EMC's Celerra with 14 datamovers. 14 datamovers or 16 filer heads...about the same performance within 10-15%.
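For anyone who wants to redo the footnote's arithmetic, here is a rough sketch. It only assumes what the footnote assumes: that the roughly 45% v2/UDP-to-v3/TCP factor seen on the F760 carries over to the 16-node cluster. The function and variable names are made up for illustration, and the result is an estimate, not a published SPEC number.

```python
def estimate_v3_tcp(v2_udp_ops, scaling_percent=45):
    """Rough NFS v3/TCP estimate from an NFS v2/UDP result.

    scaling_percent is the footnote's approximate F760 factor; applying
    it to other filer models is an assumption, not a measured value.
    Integer arithmetic keeps the result exact.
    """
    return v2_udp_ops * scaling_percent // 100


# 16-node filer cluster figure quoted in the thread (NFS v2 over UDP).
cluster_v2_udp_ops = 250_000

estimated_v3_tcp_ops = estimate_v3_tcp(cluster_v2_udp_ops)
print(f"Estimated NFS v3/TCP: {estimated_v3_tcp_ops:,} ops/s")
```

Changing scaling_percent shows how sensitive the "about the same" conclusion is to that one guessed factor.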
Again, EMC uses a lot more processors than just 14; it uses 84. Plus it's not using RAID, uses a ton more disk, etc. And the F840 would take even fewer CPUs.
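To make the "how many processors" point concrete, here is a small sketch that normalizes throughput by processor count. The inputs are placeholders taken from this thread, not official results: 100,000 ops/s is only the "over 100,000" lower bound quoted for the Celerra, 112,500 ops/s is the footnote's v3/TCP estimate for the 16-head filer cluster, and 14 versus 84 are the two processor counts being argued over. The helper name is hypothetical.

```python
def ops_per_unit(total_ops, units):
    """Throughput divided by a hardware count (heads, CPUs, disks...)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_ops / units


# Placeholder figures from the thread; see the caveats above.
celerra_ops = 100_000          # lower bound, NFS v3 over TCP
filer_cluster_ops = 112_500    # footnote's estimate, NFS v3 over TCP

comparisons = {
    "Celerra, counting 14 datamovers": ops_per_unit(celerra_ops, 14),
    "Celerra, counting 84 processors": ops_per_unit(celerra_ops, 84),
    "Filer cluster, 16 heads": ops_per_unit(filer_cluster_ops, 16),
}

for label, value in comparisons.items():
    print(f"{label}: {value:,.0f} ops/s per processor")
```

The same helper works for ops/disk or ops/$ once those counts are known; which processors get counted is exactly where the two readings of the numbers diverge.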
[2] And, to preempt the inevitable questions, yes, the Celerra is not running in mirrored mode like most people would, and yes, it has more than one Symmetrix behind it. And, yes, the NetApp disables snapshots, and, yes, the filer has checksum blocks off, and yes, they minimize read-ahead (default values are all the opposite). The devil is in the footnotes...
Then why did you claim they were about the same, when the details show they aren't? The NTAP stuff is minor and only accounts for a few % overall and puts NTAP on equal footing with the competition; the EMC stuff is a bunch of extra hardware that *you* have to pay for and which shows their configuration is far less efficient.
[3] EMC: How about a mirrored configuration on one Symmetrix with, say, 8 datamovers, one being an active failover? NetApp: How about an out-of-the-box default clustered pair configuration? To both: How about NFS v3 over TCP (hard) and NFS v2 over UDP (easy), to see the *range* of your respective boxes?
Netapp will eventually produce all the different protocol numbers, I'm sure. "Out of the box" configurations I would think are unlikely, since it would mean spending a lot of testing resources on them for very little reward, and could even be confusing. EMC would certainly always quote the numbers that made NTAP look the worst, and not every customer is as savvy as you are to look "under the hood". (And even your looking under the hood seems to have left you with a faulty notion of EMC relative to NTAP.)
Bruce