I'm sorry, Keith, but I've heard this argument before. NFS versus direct-attach storage? I'm going to have to vote for a good direct-attach solution. Why?
NFS and CIFS have huge overhead.
Also, I took a look at the link you provided. The information was incomplete at best.
snip:
The following results were obtained with Bluecurve's Dynameasure SQL Professional product. We were running MS SQL Server 6.5 on a Dell 4200 with two 200 MHz Pentium Pro CPUs and 512 MB of RAM. The local disk was on a Dell PowerEdge RAID adapter with 6 9GB UltraSCSI disks. The filer was an F630 with 512 MB of RAM, and 14 9GB UltraSCSI disks. The Dell was running RAID 0. The Filer, needless to say, was running RAID 4.
snip
What were the speeds of the drives? The CPU information here is important, but not compared to the drive information. I'd also like to point out that writing to 3 spindles (you are using RAID 0, so the 6 disks are now effectively 3) is most likely going to be slower than writing to 14 spindles.
Following the logic of this link (and ignoring the obvious disk-information questions), NetApp would like me to believe that a machine running a non-parity RAID platform (mirroring) with low protocol overhead (SCSI) is slower than a Network Appliance running a parity scheme (RAID 4) over NFS or CIFS (I'm assuming it was CIFS because it was MS SQL Server, but I could be incorrect).
I think I would like to see a test where the disk sizes and counts were similar; I sincerely doubt the NetApp would do as well.
I believe that Network Appliances are solid machines that perform their functions effectively, but they are NOT a replacement for direct attach where speed is essential.

-----Original Message-----
From: rdobbins@netmore.net [mailto:rdobbins@netmore.net]
Sent: Tuesday, August 08, 2000 8:47 PM
To: keith@netapp.com
Cc: toasters@mathworks.com
Subject: RE: Filer storage for databases, seriously? (Was: Re: NetApp questions)
No one throughout recorded history, apart from assorted nuts, has ever believed that the Earth was flat; Eratosthenes calculated both its circumference (to within 1% of the true value) and its tilt relative to the plane of the ecliptic no later than 200 B.C. The only real problem was that of terra incognita - i.e., nobody knew exactly where the landmasses were located until a) someone sailed there, and b) accurate chronometers were developed by John Harrison in the 18th century, enabling navigators to calculate longitude with a high degree of accuracy and relay that information to cartographers.
Advances in spherical geometry a la Mercator assisted the latter group, of course.
Historical canards aside, let me restate that I'm very interested in hearing about production experience with NetApp filers and Oracle over NFS. I've a 740 with a Gigabit Ethernet interface, plugged into a Catalyst 5509 doing MPLS, and so would be willing to entertain the notion if someone can give me anything beyond benchmarks.
I know all about snapshots and all that, by the bye. It's -performance- which is the question.
Thanks for the pointer to the link, I'll be sure and check it out.
-----------------------------------------------------------
Roland Dobbins   rdobbins@netmore.net // 818.535.5024 voice
-----Original Message-----
From: Keith Brown [mailto:keith@netapp.com]
Sent: Tuesday, August 08, 2000 5:48 PM
To: rdobbins@netmore.net; Perry.Jiang@bmo.com
Cc: toasters@mathworks.com
Subject: Filer storage for databases, seriously? (Was: Re: NetApp questions)
As to running Oracle with the data and logfiles on a filer via NFS, I should think that even with a NetApp using Gigabit Ethernet, you'd take a -huge- performance hit as compared to a local disk array.
Beware conventional wisdom Roland. People used to think the Earth was flat too. :-)
While I wouldn't be so bold as to *guarantee* performance boosts in utilizing the filer storage approach for every database application under the Sun, the simple fact is that filers contain a myriad of features that are very attractive to the database market, and NetApp now draws a significant and growing portion of its revenues from this space.
Snapshots & SnapRestore greatly simplify and enhance database backup and restore environments. The WAFL RAID design puts failure resiliency into the disk subsystem without forcing you to take the performance hits inherent in general-purpose RAID-5 designs or to go to disk-doubling RAID-1 approaches. SnapMirror gives you database replication to offsite locations for disaster recovery purposes. WAFL's ready expandability lets you make room for growing databases without disrupting their operation. The list goes on...
Oh.. and yes... performance very often gets a shot in the arm too!
I've no empirical data to back this up, mind you;
Don't worry. Nobody ever does, not even our direct-attach competitors, not that they can be too harshly criticized. Meaningful performance comparisons are tricky to architect, usually have a short shelf life, and customers have an understandable tendency not to believe vendor-funded benchmarks anyway (because the vendor performing and/or funding the benchmark almost always wins!).
Nevertheless, we did publish a relatively innocuous one some time ago, which can be viewed here:
http://www.netapp.com/tech_library/3044.html
it's just that there's so much overhead associated with NFS, even on an optimized platform like the NetApp filer, that I can't see it being a win.
There are certainly some "swings-and-roundabouts"-type things to consider when looking at the two approaches, and some people do conclude that there is more overhead in the network-attach approach and dismiss it out of hand. However, as far as performance goes, all the theory in the world is no substitute for the practical experience gained by trying a solution with the application you have and actually seeing how well it works and how it performs.
If there's anyone out there with Oracle experience on filers via NFS, either pro or con, I'd love to hear from you.
I'm hoping there will be some on this list. As I mentioned, beware conventional wisdom. America might have been discovered hundreds of years before Columbus sailed over the horizon, if only all his predecessors hadn't been terrified of falling off the edge of the world!
Keith
On Wed, 9 Aug 2000, Pesce, Nicholas (FUSA) wrote:
I'm sorry, Keith, but I've heard this argument before. NFS versus direct-attach storage? I'm going to have to vote for a good direct-attach solution. Why?
NFS and CIFS have huge overhead.
Sure, as someone else mentioned, all other things being equal, direct-attach storage is faster than network-attached storage. The logic (which I've argued before as well) is simple: a NAS box talks to its drives as DAS. Thus, the DAS must necessarily be "faster" (yeah, a vague term). For example, setting caching aside, it is not possible for a filer to pump 50MB/s of data to an NFS client if it can only read 30MB/s off its own drives.
However, all things are not equal, at least in benchmarks that are possible in the real world. Yes, NFS and CIFS add overhead compared to SCSI-over-FibreChannel or what have you. But that is offset by an optimized OS (Data ONTAP), an efficient filesystem (WAFL), read and write caching, an optimized TCP/IP stack, and so on. If you could port all of that and run it on DAS, then you might have a fair comparison.
I think I would like to see a test where the disk sizes and counts were similar; I sincerely doubt the NetApp would do as well.
Depends on the application, of course, but I've been surprised many times in the past when I thought for sure the Netapp would not be able to keep up. I have a 4x450-MHz E420R with a VxVM RAID-0 device, spread over 16 50GB 7200 rpm drives on two U2SCSI buses. The server also has a Gigabit Ethernet connection to an F740 with one shelf of 36GB 10000 rpm drives (5 data, 1 parity, 1 spare). The local filesystem is vxfs, mounted with delaylog and the largest allowable log area.
I ran a few filesystem replication and backup/restore tests (this is our central tape server). The local filesystem handily beat the Netapp doing large sequential reads and writes (120MB/sec vs. 22MB/sec)... no surprise there. File deletions were a little closer (~2500 unlinks/sec on vxfs, ~2000 unlinks/sec on the Netapp). In all other tests, the Netapp was as fast as or faster than the local filesystem, sometimes by a large margin. The Netapp seems to especially shine when you have multiple processes reading and writing to all points on the filesystem; vxfs does not appear to handle dozens or hundreds of concurrent access requests as gracefully.
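For anyone who wants to run this sort of unlink-rate test themselves, a rough sketch follows; the directory path and file count are illustrative assumptions, not the actual test parameters.

import os
import time

# Illustrative parameters only -- point TARGET_DIR at a local (vxfs) path
# for one run and at an NFS-mounted path for the other.
TARGET_DIR = "/mnt/testfs/unlink_test"
NUM_FILES = 50000

os.makedirs(TARGET_DIR, exist_ok=True)

# Create the files first, outside the timed region.
for i in range(NUM_FILES):
    open(os.path.join(TARGET_DIR, "f%d" % i), "w").close()

# Time only the deletions.
start = time.time()
for i in range(NUM_FILES):
    os.unlink(os.path.join(TARGET_DIR, "f%d" % i))
elapsed = time.time() - start

print("%.0f unlinks/sec" % (NUM_FILES / elapsed))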
I re-ran some of the same tests with a Veritas RAID-5 volume (to be fair to the Netapp), but I stopped after the first couple. There is no contest at that point. Veritas software RAID-5 is dog-slow (I think I saw bursts of 8MB/sec sequential writes). Turn on a Veritas snapshot, and writes to the snapped filesystem go even further into the toilet. The performance degradation is cumulative with the number of snapshots. There is no such penalty on the Netapp.
One caveat I should mention, since it bit us in the past: file locking performance. We have one application that, when running on the same type of hardware as above (E420R with those drives), spews forth 150,000 syscalls per second, according to Solaris' "vmstat". 80% of those calls are fcntl() locks/unlocks to various database files on disk. Poor programming practice aside, this application runs very slowly over NFS; NFS locking simply cannot match in-kernel file locking on a local filesystem. Besides that one exceptional application, we run Netapps for everything else (including Oracle).
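A minimal sketch of the kind of fcntl() lock/unlock loop in question is below; the file path and iteration count are assumptions made up for illustration. Running it once against a local filesystem and once against an NFS mount gives a feel for the lock throughput this kind of application depends on.

import fcntl
import time

# Illustrative parameters only -- compare a local path with one on an NFS mount.
PATH = "/mnt/testfs/lockfile"
ITERATIONS = 100000

with open(PATH, "w") as f:
    start = time.time()
    for _ in range(ITERATIONS):
        fcntl.lockf(f, fcntl.LOCK_EX)   # take an exclusive record lock on the file
        fcntl.lockf(f, fcntl.LOCK_UN)   # and release it again
    elapsed = time.time() - start

print("%.0f lock/unlock pairs per second" % (ITERATIONS / elapsed))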
Brian, your comparison was very interesting to read. We're currently working on choosing a storage consolidation solution, and the main contenders are a filer and a general-purpose fileserver with direct-attached RAID.
Depends on the application, of course, but I've been surprised
many times in the past when I thought for sure the Netapp would not be able to keep up. I have a 4x450-MHz E420R with a VxVM RAID-0 device, spread over 16 50GB 7200 rpm drives on two U2SCSI buses. The server also has a Gigabit Ethernet connection to an F740 with one shelf of 36GB 10000 rpm drives (5 data, 1 parity, 1 spare). The local filesystem is vxfs, mounted with delaylog and the largest allowable log area.
If I'm reading this correctly, you're using all software RAID for the DAS side, correct? I would be very interested in seeing a similar comparison with a hardware RAID controller, as this would offload all the RAID overhead from the host CPUs to a subsystem better optimized to handle it. Unfortunately, this would also introduce a lot more variables of the sort Alan Yoder pointed out, as I'm sure the performance under load of the various available RAID controllers is far from uniform.
I re-ran some of the same tests with a Veritas RAID-5 volume (to
be fair to the Netapp), but I stopped after the first couple. There is no contest at that point. Veritas software RAID-5 is dog-slow (I think I saw bursts of 8MB/sec sequential writes). Turn on a Veritas snapshot, and writes to the snapped filesystem go even further into the toilet.
This is probably a product of the RAID overhead all being handled by a relatively ponderous general-purpose computer, and of the lack of a safe way to do write caching.
jm
On Fri, Aug 11, 2000 at 04:34:48PM -0500, Jim Moechnig wrote:
Depends on the application, of course, but I've been surprised
many times in the past when I thought for sure the Netapp would not be able to keep up. I have a 4x450-MHz E420R with a VxVM RAID-0 device, spread over 16 50GB 7200 rpm drives on two U2SCSI buses. The server also has a Gigabit Ethernet connection to an F740 with one shelf of 36GB 10000 rpm drives (5 data, 1 parity, 1 spare). The local filesystem is vxfs, mounted with delaylog and the largest allowable log area.
If I'm reading this correctly, you're using all software RAID for the DAS side, correct? I would be very interested in seeing a similar comparison with a hardware RAID controller, as this would offload all the RAID overhead from the host CPUs to a subsystem better optimized to handle it. Unfortunately, this would also introduce a lot more variables of the sort Alan Yoder pointed out, as I'm sure the performance under load of the various available RAID controllers is far from uniform.
I don't have hard numbers, but when we moved our Oracle database from a direct-attached A1000 (dedicated differential SCSI controller) to a filer, we saw a large join that used to take 3 hours drop to 20 minutes, and overall database performance increase perceptibly. We're very happy with the Filer's ability to handle large numbers of simultaneous reads/writes. Your mileage may vary if you're doing single-threaded reading or writing.
Using GigE to connect the filer to the server is certainly key. Also, keep in mind that the more disks you put in the Oracle volume, the better your performance. We almost made the mistake of breaking our volumes up into 3-disk sets until our NetApp OEM told us what the performance impact would be.
Aaron Sherman wrote:
I don't have hard numbers, but when we moved our Oracle database from a direct-attached A1000 (dedicated differential SCSI controller) to a filer, we saw a large join that used to take 3 hours drop to 20 minutes
I can believe that. The A1000s are really ropey things for a number of reasons. Firstly, the Sun differential SCSI controller is a QLogic ISP1000, which NetApp stopped using (except for tape drives) about 3 years ago, and it was never the fastest card around; I was never convinced that Sun's driver got the best out of the card, at that. Secondly, the RAID controller is one OEMed from MetaStor (LSI Logic), which they themselves discontinued about 3 years ago, and it uses a Pentium 75 for the parity calculations. We find writes go about 5x faster onto a filer than they do onto A1000s/D1000s; reads are only around 2x faster, though.