I have been hearing this question from a lot of technically astute clients lately. While I don't have all the answers, I can suggest at least one reason why the protocol overhead inherent in NFS and CIFS is not as big a disadvantage in practice as it appears to be in theory. The main point to consider is that the time it takes to get the head on a disk drive positioned over the data to be read is much, much greater than the time it takes to transfer the data.
Let's take, for example, the Seagate Barracuda 18 GB drive. The average seek time is ~8 msec and the average rotational latency is ~4 msec, so it takes on average 12 msec to position the head over the data of interest and start the operation. If the operation is to read 4 KB at a media-to-buffer rate of 20 MB/sec, it completes in ~0.2 msec, two orders of magnitude faster than the time it took to get into position. If your average system has a lot of processes, each doing I/O to different areas of the disk, you will spend on average 60x more time doing seeks than transferring data. Reducing this overhead clearly makes a much bigger difference to performance than increasing the bandwidth of the path from the host to the disk.
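For anyone who wants to sanity-check the arithmetic, here it is as a few lines of Python (nothing measured, just the drive figures quoted above):

    # Figures quoted above for the Barracuda 18 GB drive.
    avg_seek_ms       = 8.0     # average seek time
    avg_rotational_ms = 4.0     # average rotational latency
    transfer_mb_sec   = 20.0    # media-to-buffer rate
    request_kb        = 4.0     # size of one read

    position_ms = avg_seek_ms + avg_rotational_ms
    transfer_ms = (request_kb / 1024.0) / transfer_mb_sec * 1000.0

    print("positioning: %.1f msec" % position_ms)              # ~12 msec
    print("transfer:    %.2f msec" % transfer_ms)              # ~0.2 msec
    print("ratio:       %.0fx" % (position_ms / transfer_ms))  # ~60x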
While the average times give you the feel, the edge cases are interesting as well. The seek time for the same drive varies from ~1.5 msec to move a single track to ~15 msec to move across the whole platter. Thus, there is an order of magnitude of improvement available if the filesystem is optimized to locate related data close together. Our file system, WAFL, does just that. Filesystems that put their metadata in a fixed area of the disk are guaranteed to cause a lot of mechanical seeks under normal load.
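The same back-of-the-envelope treatment for the locality case (an assumption-laden sketch: suppose careful layout turns an average seek into a single-track seek; rotational latency still applies, so the per-operation win is smaller than the raw seek-time improvement):

    track_to_track_ms = 1.5    # best case: move a single track
    full_stroke_ms    = 15.0   # worst case: move across the platter
    avg_seek_ms       = 8.0
    avg_rotational_ms = 4.0

    print("seek-time spread: %.0fx" % (full_stroke_ms / track_to_track_ms))   # ~10x
    random_op_ms    = avg_seek_ms + avg_rotational_ms         # ~12 msec per random I/O
    clustered_op_ms = track_to_track_ms + avg_rotational_ms   # ~5.5 msec with good layout
    print("per-op improvement: %.1fx" % (random_op_ms / clustered_op_ms))     # ~2x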
If you had just one single-threaded application writing and reading large files to dedicated local storage, then the seek latency described above would not be a big issue. Most database workloads are not like that. In that environment the appropriate emphasis is to improve performance by reducing the effects of the mechanical latency involved in head seeks and rotational delays. That's what filers do, and that is one reason why they perform so well in spite of the overhead of NFS or CIFS.
Blaine Agnew
-----Original Message----- From: Pesce, Nicholas (FUSA) [mailto:Nicholaspesce@firstusa.com] Sent: Wednesday, August 09, 2000 11:34 AM To: 'rdobbins@netmore.net'; keith@netapp.com Cc: toasters@mathworks.com Subject: RE: Filer storage for databases, seriously? (Was: Re: NetApp questions)
I'm sorry, Keith, but I've heard this argument before. NFS versus direct attach storage? I'm going to have to vote for a good direct attach solution. Why?
NFS and CIFS have huge overhead.
Also, I took a look at the link you provided. The information was incomplete at best.
snip:
The following results were obtained with Bluecurve's Dynameasure SQL Professional product. We were running MS SQL Server 6.5 on a Dell 4200 with two 200 MHz Pentium Pro CPUs and 512 MB of RAM. The local disk was on a Dell PowerEdge RAID adapter with six 9 GB UltraSCSI disks. The filer was an F630 with 512 MB of RAM and fourteen 9 GB UltraSCSI disks. The Dell was running RAID 0. The filer, needless to say, was running RAID 4.
snip
What were the speeds of the drives? The CPU information here is important, but not compared to the drive information. Also, I'd like to point out that writing to 3 spindles (you are using RAID 0, so the 6 disks are now effectively 3) is most likely going to be slower than writing to 14 spindles.
Following the logic of this link (and ignoring the obvious disk information questions), NetApp would like me to believe that a machine running a non-parity RAID platform (mirroring) with low protocol overhead (SCSI) is slower than a Network Appliance running a parity system (RAID 4) over NFS or CIFS (I'm assuming it was CIFS because it was MS SQL Server, but I could be incorrect).
I think I would like to see a test where the disk sizes and numbers were similar; I sincerely doubt the NetApp would do as well.
I believe that Network Appliances are solid machines that perform their functions effectively, but they are NOT a replacement for direct attach where speed is essential. -----Original Message----- From: rdobbins@netmore.net [mailto:rdobbins@netmore.net] Sent: Tuesday, August 08, 2000 8:47 PM To: keith@netapp.com Cc: toasters@mathworks.com Subject: RE: Filer storage for databases, seriously? (Was: Re: NetApp questions)
No one throughout recorded history, apart from assorted nuts, has ever believed that the Earth was flat; Eratosthenes calculated both its circumference to within 1% of the true value and its tilt relative to the plane of the ecliptic no later than 200 B.C. The only real problem was that of terra incognita - i.e., nobody knew exactly where the landmasses were located until a) someone sailed there, and b) accurate chronometers were developed by John Harrison in the 18th century, enabling navigators to calculate longitude with a high degree of accuracy and then relay that information to cartographers.
Advances in spherical geometry a la Mercator assisted the latter group, of course.
Historical canards aside, let me restate that I'm very interested in hearing about production experience with NetApp filers and Oracle over NFS. I've a 740 with a Gigabit Ethernet interface, plugged into a Catalyst 5509 doing MPLS, and so would be willing to entertain the notion if someone can give me anything beyond benchmarks.
I know all about snapshots and all that, by the bye. It's -performance- which is the question.
Thanks for the pointer to the link, I'll be sure and check it out.
Roland Dobbins rdobbins@netmore.net // 818.535.5024 voice
-----Original Message----- From: Keith Brown [mailto:keith@netapp.com] Sent: Tuesday, August 08, 2000 5:48 PM To: rdobbins@netmore.net; Perry.Jiang@bmo.com Cc: toasters@mathworks.com Subject: Filer storage for databases, seriously? (Was: Re: NetApp questions)
As to running Oracle with the data and logfiles on a filer via NFS, I should think that even with a NetApp using Gigabit Ethernet, you'd take a -huge- performance hit as compared to a local disk array.
Beware conventional wisdom Roland. People used to think the Earth was flat too. :-)
While I wouldn't be so bold as to *guarantee* performance boosts in utilizing the filer storage approach for every database application under the Sun, the simple fact is that filers contain a myriad of features that are very attractive to the database market, and NetApp now draws a significant and growing portion of its revenues from this space.
Snapshots & SnapRestore greatly simplify and enhance database backup and restore environments. The WAFL RAID design puts failure resiliency into the disk subsystem without forcing you to take the performance hits inherent in general-purpose RAID-5 designs or going to disk-doubling RAID-1 approaches. SnapMirror gives you database replication to offsite locations for disaster recovery purposes. WAFL's ready expandability lets you make room for growing databases without disrupting their operation. The list goes on...
Oh.. and yes... performance very often gets a shot in the arm too!
I've no empirical data to back this up, mind you;
Don't worry. Nobody ever does, not even our direct-attach competitors, not that they can be too harshly criticized. Meaningful performance comparisons are tricky to architect, usually have a short shelf life, and customers have an understandable tendency not to believe vendor-funded benchmarks anyway (since the vendor performing and/or funding the benchmark almost always wins!).
Nevertheless, we did publish a relatively innocuous one some time ago, which can be viewed here:
http://www.netapp.com/tech_library/3044.html
it's just that there's so much overhead associated with NFS even on an optimized platform like the NetApp filer, I can't see it as being a win.
There are certainly some "swings-and-roundabouts"-type things to consider when looking at the two approaches, and some people do conclude that there is more overhead in the network attach approach, dismissing it offhand. However, as far as performance goes, all the theory in the world is no substitute for the practical experience of trying a solution with the application you actually have and seeing for yourself how well it works and how it performs.
If there's anyone out there with Oracle experience on filers via NFS, either pro or con, I'd love to hear from you.
I'm hoping there will be some on this list. As I mentioned, beware conventional wisdom. America might have been discovered hundreds of years before Columbus sailed over the horizon, if only his predecessors hadn't been terrified of falling off the edge of the world!
Keith
Yes, latency is a big factor, but wasn't the question about directly connected RAID (or a SAN) vs. a filer? Since both are raided, the latency would be roughly equal.
I think a better comparison would be the overhead of the two RAID systems. For example, how much latency is incurred putting the data into an NFS stream and sending it over a network to its destination, versus a SAN pushing the data to a directly attached system (setting aside the load incurred on the hosts)?
I dunno. There should be some way to quantify all this, shouldn't there?
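One crude way to put numbers on it: time small random reads against a file on an NFS mount and against an equivalent file on local disk. A rough Python sketch (the paths are just placeholders, and the files need to be much larger than RAM or you end up timing caches instead of the storage):

    import os, random, time

    def msec_per_read(path, reads=1000, block=4096):
        fd = os.open(path, os.O_RDONLY)
        size = os.fstat(fd).st_size
        start = time.time()
        for _ in range(reads):
            os.lseek(fd, random.randrange(0, size - block), 0)  # 0 = SEEK_SET
            os.read(fd, block)
        elapsed = time.time() - start
        os.close(fd)
        return elapsed / reads * 1000.0

    # Placeholder paths: one file on the NFS-mounted filer, one on local disk.
    print("NFS:   %.2f msec/read" % msec_per_read("/mnt/filer/testfile"))
    print("local: %.2f msec/read" % msec_per_read("/data/local/testfile"))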
In defense of the NetApp solution, NFS connections are much cheaper than SAN connections, and rarely is a filer dedicated to only one system. I would hazard that which one is better really depends on what you're trying to do....
----------- Jay Orr Systems Administrator Fujitsu Nexion Inc. St. Louis, MO
On Thu, Aug 10, 2000 at 12:01:03PM -0500, Jay Orr wrote:
Yes, latency is a big factor, but wasn't the question about directly connected RAID (or a SAN) vs. a filer? Since both are raided, the latency would be roughly equal.
But they're not.
Why?
Mostly, I suspect, because of the journaling. A good journaling filesystem on a locally attached RAID 5 (or RAID 4, if you get around the performance problems the way NetApp has) disk array, with lots of RAM cache for the journal, would probably at least match, and likely outstrip, the NetApp's performance.
Keep in mind also that the NetApp can play fast and loose with its journal commits because they land in NVRAM, so it does not have to commit to disk before returning success to the client. That HAS to be a big win.
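In grossly simplified Python, the difference amounts to something like this (the journal_fd here is just a file standing in for battery-backed NVRAM, so treat it as an illustration rather than how the box actually works):

    import os

    def ack_after_disk_commit(data_fd, data):
        # Conventional path: don't acknowledge until the data is on the platters,
        # so every client write waits for a disk commit.
        os.write(data_fd, data)
        os.fsync(data_fd)
        return "ack"

    def ack_after_nvram_journal(journal_fd, data):
        # Filer-style path (simplified): log the write to battery-backed memory
        # and acknowledge immediately; the log gets flushed to disk later, in
        # large efficient batches.
        os.write(journal_fd, data)   # stand-in for an NVRAM append
        return "ack"                 # success returned without waiting on a disk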