You're right about one thing... all things are not equal. You raised some good points along the way... see below for my annotations.
-----Original Message-----
From: Brian Tao [mailto:taob@risc.org]
Sent: Thursday, August 10, 2000 2:33 PM
To: Pesce, Nicholas (FUSA)
Cc: keith@netapp.com; toasters@mathworks.com
Subject: RE: Filer storage for databases, seriously? (Was: Re: NetApp questions)
On Wed, 9 Aug 2000, Pesce, Nicholas (FUSA) wrote:
I'm sorry, Keith, but I've heard this argument before. NFS versus direct-attach storage? I'm going to have to vote for a good direct-attach solution. Why?
NFS and CIFS have huge overhead.
Sure, as someone else mentioned, all other things being equal,
direct-attach storage is faster than network-attached storage. The logic (which I've argued before as well) is simple: a NAS box talks to its drives as DAS. Thus, the DAS must necessarily be "faster" (yeah, a vague term). For example, setting caching aside, it is not possible for a filer to pump 50MB/s of data to an NFS client if it can only read 30MB/s off its own drives.
One thing to remember in regards to caching: when you request a block on disk, your local OS (with direct-attach storage) will read that block and then read ahead several more blocks. Those read-ahead blocks get transferred over your Fibre Channel even if your application never uses them. The filer does the same read-ahead, but it transmits only the block that was actually requested; if the application then asks for the next block, both setups already have it in memory. In a database environment, where blocks are placed very randomly within a filesystem, the read-ahead on direct-attach storage kills the efficiency of the bandwidth to your storage, and that's your OS's fault. With a filer, on the other hand, you can turn off read-ahead on the filer itself to minimize the work it does.
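If you want to try that, read-ahead can be minimized from the filer console. The exact command depends on your Data ONTAP release (check the options man page for yours), but on recent releases it is a per-volume option, along these lines:

    filer> vol options vol0 minra on

Here vol0 stands in for whatever volume holds your database; "minra" is minimal read-ahead, i.e. fetch only what clients actually request.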
However, all things are not equal, at least in benchmarks that are
possible in the real world. Yes, NFS and CIFS add overhead compared to SCSI-over-FibreChannel or what have you. However, that is offset by an optimized OS (Data ONTAP), an efficient filesystem (WAFL), read and write caching, an optimized TCP/IP stack, etc. If you could port all of that and run it on DAS, then you might have a fair comparison.
I think I would like to see a test where the disk sizes and numbers were similar; I sincerely doubt the Netapp would do as well.
Depends on the application, of course, but I've been surprised
many times in the past when I thought for sure the Netapp would not be able to keep up. I have a 4x450-MHz E420R with a VxVM RAID-0 device spread over sixteen 50GB 7200 RPM drives on two U2SCSI buses. The server also has a Gigabit Ethernet connection to an F740 with one shelf of 36GB 10000 RPM drives (5 data, 1 parity, 1 spare). The local filesystem is vxfs, mounted with delaylog and the largest allowable log area.
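For reference, the filesystem was built and mounted along these lines (device and mount point names changed; check your VxFS version for the actual logsize ceiling):

    # create the vxfs filesystem with the largest log area mkfs will accept
    mkfs -F vxfs -o logsize=16384 /dev/vx/rdsk/datadg/vol01
    # mount with delayed logging for faster metadata updates
    mount -F vxfs -o delaylog /dev/vx/dsk/datadg/vol01 /data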
I ran a few filesystem replication and backup/restore tests (this
is our central tape server). The local filesystem handily beat the Netapp doing large sequential reads and writes (120MB/sec vs. 22MB/sec)... no surprise there. File deletions were a little closer (~2500 unlinks/sec on vxfs, ~2000 unlinks/sec on the Netapp). In all other tests, the Netapp was as fast as or faster than the local filesystem, sometimes by a large margin. The Netapp especially shines when you have multiple processes reading and writing to all points on the filesystem; vxfs does not appear to handle dozens or hundreds of concurrent access requests as gracefully.
This is an apples to oranges test.
Sure, streaming to/from RAID0 will always kick ass. However, who really runs RAID0 these days??? (I'd guess fewer than 0.1% of applications can use RAID0, because it only suits data that is not critical.) RAID 0+1 might be a slightly better comparison, since the filer doesn't have a RAID0 mode.
I re-ran some of the same tests with a Veritas RAID-5 volume (to
be fair to the Netapp), but I stopped after the first couple. There is no contest at that point. Veritas software RAID-5 is dog-slow (I think I saw bursts of 8MB/sec sequential writes). Turn on a Veritas snapshot, and writes to the snapped filesystem go even further into the toilet. The performance degradation is cumulative with the number of snapshots. There is no such penalty on the Netapp.
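The snapshots here were plain VxFS snapshot mounts, something like the following (device names made up; /data is the live filesystem and snapvol is a volume set aside to hold the copied blocks):

    mount -F vxfs -o snapof=/data /dev/vx/dsk/datadg/snapvol /snap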
Ok, this is apples to apples.
One caveat I should mention, since it bit us in the past: file
locking performance. We have one application that, when running on the same type of hardware as above (E420R with those drives), spews forth 150,000 syscalls per second according to Solaris' "vmstat". 80% of those calls are fcntl() locks/unlocks on various database files on disk. Poor programming practice aside, this application runs very slowly over NFS: NFS locking simply cannot match in-kernel file locking on a local filesystem. That one exceptional application aside, we run Netapps for everything else (including Oracle).
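If you want to check where your own application's syscalls are going, vmstat's "sy" column gives the overall rate, and truss can break it down by call for a running process (hit Ctrl-C to get the summary table; the pid below is made up):

    vmstat 5            # watch the "sy" column for syscalls/sec
    truss -c -p 12345   # per-syscall counts for pid 12345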
Ok, given the environment you just described, you could enable an undocumented feature of the Solaris mount_nfs command: 'llock'. This tells the NFS client not to use NLM (the network lock manager) for file locking but to handle locks locally, which essentially gives you in-kernel locking. The caveat is that you then can't share the filesystem with other clients... but you can't do that with today's direct-attach storage either. I usually recommend the llock option in a Solaris/Oracle/filer environment.
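A mount command or vfstab entry would look something like this (the filer name, export, and mount point are made up, and the other options are just what I'd typically use):

    mount -F nfs -o hard,intr,vers=3,llock filer1:/vol/vol0/oracle /u01

or in /etc/vfstab:

    filer1:/vol/vol0/oracle - /u01 nfs - yes hard,intr,vers=3,llock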
Aaron
On Thu, 10 Aug 2000, Sims, Aaron wrote:
This is an apples to oranges test.
Yep, but it is interesting that even in such an unbalanced comparison, the Netapp still manages to fare quite admirably.
Sure, streaming to/from RAID0 will always kick ass. However, who really runs RAID0 these days???
I lied when I said this biller application was the only thing we ran on local disk. Our news servers also run on local disk. In that situation, we're not concerned enough about reliability to spend the premium on Netapps. We just wanted fast, cheap multi-terabyte storage.
Ok, given the environment you just described, you could enable an undocumented feature of the Solaris mount_nfs command: 'llock'.
Ahhhhhh... *that's* what it is! I had a vague memory that this was possible under Solaris, but never could find a reference to it. I'm re-running the biller database benchmark now, and I am seeing about 16% *better* performance than on the Veritas RAID-0 filesystem. Sir, you may just have sold another filer with that. I'll be sure to get my sales rep to take you out to the most expensive restaurant in Toronto, should you ever come up this way. ;-)
If you do a man on "nfsstat" on your Solaris box, you will see that, although "llock" is not documented in the normal fashion, Sun continues to list it as an available mount option on the nfsstat man page.
Look in the section referring to mount flags and you will find:
llock Local locking being used (no lock manager).
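So after mounting with llock, you can sanity-check that it took by looking at the Flags line that nfsstat prints for each NFS mount:

    nfsstat -m

The flags for the mount in question should include "llock".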
--tmac
--
******All New Numbers!!!******
Timothy A. McCarthy --> System Engineer, Eastern Region
Network Appliance     http://www.netapp.com
Office: 240-268-2034  Fax: 240-268-2001  Page me at: 888-971-4468
Sure, streaming to/from RAID0 will always kick ass. However, who really runs RAID0 these days??? (I'd guess fewer than 0.1% of applications can use RAID0, because it only suits data that is not critical.) RAID 0+1 might be a slightly better comparison, since the filer doesn't have a RAID0 mode.
Of course, with RAID 0+1, you're paying for twice as many disks. For the same money, you could buy a second filer, spread the load, and get better performance (and better MTBF).
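Rough spindle math, using the 5-data/1-parity RAID groups mentioned earlier in this thread:

    RAID 0+1:              1 TB usable needs 2.0 TB raw (every block mirrored)
    Filer RAID4 (5d + 1p): 1 TB usable needs 1.2 TB raw, plus a hot spare

That overhead gap is where the second filer's money comes from.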
Bruce
+-- "Bruce Sterling Woodcock" sirbruce@ix.netcom.com once said: | Of course, with RAID 0+1, you're paying for twice as many disks. | For the same money, you could buy a second filer, and spread the | load, and get better performance for the same dollars. (And a better | MTBF).
Say huh? For the same money? Even having to buy twice the disks, you'd still most likely end up cheaper than a filer. If this isn't the case for you, I want in on that discount.
There's no way you'd get two filers with X diskspace between them for anywhere near the price of some cheaper DAS with X diskspace, unless that DAS you're talking about is an EMC or something like that.
Oz
----- Original Message -----
From: "Ozzie Sabina" ors@cimedia.com
To: "Bruce Sterling Woodcock" sirbruce@ix.netcom.com; "Sims, Aaron" Aaron.Sims@netapp.com; toasters@mathworks.com
Sent: Monday, August 14, 2000 5:06 PM
Subject: Re: Filer storage for databases, seriously? (Was: Re: NetApp questions)
+-- "Bruce Sterling Woodcock" sirbruce@ix.netcom.com once said: | Of course, with RAID 0+1, you're paying for twice as many disks. | For the same money, you could buy a second filer, and spread the | load, and get better performance for the same dollars. (And a better | MTBF).
Say huh? For the same money? Even having to buy twice the disks, you'd still most likely end up cheaper than a filer. If this isn't the case for you, I want in on that discount.
Depends on the amount of disk you are talking about. Note that I didn't say you could buy a second duplicate filer, just that you could get a second one. Do the math and you'll come out on top.
There's no way you'd get two filers with X diskspace between them for anywhere near the price of some cheaper DAS with X diskspace, unless that DAS you're talking about is an EMC or something like that.
That's because the "some cheaper DAS with X diskspace" would not have comparable performance and RAID protection. If you cost out DAS with performance comparable to a filer, but with twice the disk (for RAID 0+1), it will be more expensive.
Bruce
+-- "Bruce Sterling Woodcock" sirbruce@ix.netcom.com once said: | Depends on the amount of disk you are talking. Note that I didn't | say you could buy a second duplicate filer, just that you could get a | second one. Do the math and you'll come out on top.
I agree; certainly if you're talking about 2TB or something like that, the cost of the additional disk gets very high.

| That's because the "some cheaper DAS with X diskspace" would
| not have comparable performance and RAID protection. If you
| cost out DAS with performance comparable to a filer, but with
| twice the disk (for RAID 0+1), it will be more expensive.
I'll argue that you get just fine "RAID protection" from 0+1 on any reasonable array. Performance is another matter, but I didn't think we were trying to compare apples-to-apples there. Performance and features are why you'd spend the extra money on a Netapp in the first place.
Oz
In message <20000815000631.18668.qmail@trail.cimedia.com>, you write:
Say huh? For the same money? Even having to buy twice the disks, you'd still most likely end up cheaper than a filer. If this isn't the case for you, I want in on that discount.
..Depends a lot on who you're buying your disk from and what features you want, I think..
At my place of business, we were recently deciding between EMC and NetApp.
We wanted:
  - Protected data (RAID)
  - Snapshots
  - Off-line backups
  - Remote mirroring
In EMC Land, where the only way to get more features is to buy more mirrors, we had:

  Copy 1 - The original data
  Copy 2 - A mirror of the data
  Copy 3 - A BCV of the data for rapid recovery
  Copy 4 - A BCV of the data for offline backups
  Copy 5 - A BCV for SRDF mirroring to a remote site
  Copy 6 - A BCV for receiving SRDF from the primary site
  Copy 7 - The live data
  Copy 8 - A mirror of the data
That's 8 TBs of disk on the floor for 1 usable TB of storage.
There's no way you'd get two filers with X diskspace between them for anywhere near the price of some cheaper DAS with X diskspace, unless that DAS you're talking about is an EMC or something like that.
Okay, we were also deciding between putting up another NT fileserver with ~200 gigs usable and an F740 with ~200 gigs usable.
The head unit for the F740 was about $10k more than a Compaq DL380; however, the disk with shelves was about $1,000 _cheaper_ than the dumb shelves Compaq was trying to sell us.
So it's a tad more expensive, but the extra features of the NetApp pushed it over the top.
-James