Yael Hellmann wrote:
You are still limited by the network and netapp will not be able to compete with products which provide direct disk attachment. Products in mind are SANergy and CXFS.
Graham C. Knight wrote:
We have networked configurations of filers and workstations that outperform the onboard SCSI speeds of the workstations. It depends on what your local storage bus is, what your network is, what your application is, what your average file size is, etc., etc., etc.
Many people are surprised that a network attached solution can outperform a direct attached solution. Let me talk about how this can happen. (Note that I'm not claiming that NAS always outperforms! Just that it sometimes will, and that we shouldn't be surprised.)
There are (at least) three reasons for this:
(1) Network bandwidth has largely caught up with direct disk connect bandwidth.
In the "old days", you had a 10bT Ethernet connection running against SCSI. SCSI was 10 MB/sec, and 10bT was only about 1 MB/sec. But it was really worse than that, because the Ethernet was shared by many clients, so the effective bandwidth was even smaller. I estimate the performance penalty for Ethernet as being roughly 100 to 1.
Today, you've got fibre channel, which is 100 MB/sec, competing against Gigabit Ethernet, which is also about 100 MB/sec. We could argue about the TCP/IP overhead, but I think the trend is clear.
And I'll bet that Cisco ships 10 Gb Ethernet before anyone ships 10 Gb fibre channel.
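To make the arithmetic behind point (1) explicit, here is a rough sketch in Python using the figures quoted above. The assumption that ten clients share one 10bT segment is ours, chosen only because it is what a 100-to-1 penalty implies; Gigabit Ethernet is treated as unshared, and TCP/IP overhead is ignored.

```python
def per_client_mb_s(link_mb_s: float, clients_sharing: int) -> float:
    """Effective bandwidth each client sees on a shared link (even split assumed)."""
    return link_mb_s / clients_sharing


def disk_to_network_ratio(disk_mb_s: float, link_mb_s: float,
                          clients_sharing: int) -> float:
    """How many times faster the direct disk path is than one client's
    share of the network."""
    return disk_mb_s / per_client_mb_s(link_mb_s, clients_sharing)


# "Old days": 10 MB/s SCSI vs ~1 MB/s 10bT, assumed shared by 10 clients.
old = disk_to_network_ratio(disk_mb_s=10.0, link_mb_s=1.0, clients_sharing=10)
# Today: ~100 MB/s fibre channel vs ~100 MB/s Gigabit Ethernet, one client per link.
new = disk_to_network_ratio(disk_mb_s=100.0, link_mb_s=100.0, clients_sharing=1)

print(f"old penalty ~{old:.0f}:1, new penalty ~{new:.0f}:1")
```

The penalty collapses from two orders of magnitude to roughly parity, which is the whole of point (1).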
(2) In many applications, performance is actually limited by the number of spindles, NOT by the wires between disk and CPU.
If you think about it, disk seek time is about the only thing in computer science that doesn't double every year or two. Disk capacity keeps doubling, but performance does not. So increasingly, the bottleneck is moving to the disk itself, and the speed of the wires between disk and CPU is becoming less important.
This explains why NAS often outperformed direct attach even back in the days of FDDI, when the disk wires were still much faster than the network wires.
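A minimal Python sketch of the spindle argument, under loudly stated assumptions: small random I/O, roughly 100 operations per second per spindle (about 10 ms for a seek plus rotation), and 4 KB per operation. The workstation with 2 spindles on a 20 MB/s bus and the filer with 20 spindles are hypothetical configurations; only the FDDI rate (100 Mbit/s, i.e. 12.5 MB/s) is a real figure.

```python
def random_io_mb_s(spindles: int, wire_mb_s: float,
                   ops_per_spindle: float = 100.0,
                   io_bytes: int = 4096) -> float:
    """Throughput for small random I/O: the lesser of what the disks can
    deliver (spindles x ops per second x I/O size) and what the wire carries."""
    disk_mb_s = spindles * ops_per_spindle * io_bytes / 1e6
    return min(disk_mb_s, wire_mb_s)


# Hypothetical workstation: 2 local spindles on a 20 MB/s SCSI bus.
direct = random_io_mb_s(spindles=2, wire_mb_s=20.0)
# Hypothetical filer: 20 spindles reached over FDDI (100 Mbit/s = 12.5 MB/s).
nas = random_io_mb_s(spindles=20, wire_mb_s=12.5)

print(f"direct attach: {direct:.2f} MB/s, NAS over FDDI: {nas:.2f} MB/s")
```

Under these assumptions neither configuration comes close to saturating its wire; the number of spindles sets the ceiling, so the slower network with more disks behind it wins.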
(3) The "file attached" approach of NAS allows less data to fly over the wires as compared with the "raw disk attached" approach of SCSI over fibre channel.
Consider a simple operation, like creating a new file. With a "file attached" protocol like NFS or CIFS, the client sends over a request to create a new file that includes the directory to put it in and the new name. It's going to be a total of maybe 400 bytes that need to go over the wire.
On the other hand, with a "raw disk attached" approach, you are going to have to read many blocks of data in order to find where on disk to put the new file, and to scan the directory to make sure the file name is not already there. And then you are going to have to write many blocks of data, to update the directory block that contains the new entry, to update the "modify date" of the directory inode itself, and to mark the new inode used in the inode allocation tables. If you had to grow the directory, it's even worse, because then you have to read the free block entries to find out where to put the new data, and write the free block entries to indicate that the newly allocated block is no longer free.
In the end, you could have many, many kilobytes of data going over the wire with a "raw disk attached" approach, as compared to the few hundred bytes for the "file attached" approach.
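Here is a rough Python tally of the file-create example, counting only the block reads and writes listed above. The 4 KB block size, the 8-block directory, and the absence of any client-side caching are assumptions for illustration; the 400-byte request size is the estimate from the text.

```python
BLOCK = 4096              # assumed file system block size, in bytes
NFS_CREATE_REQUEST = 400  # rough size of a file-level create request, per the text


def raw_create_bytes(dir_blocks: int, grow_directory: bool = False) -> int:
    """Bytes a block-level client moves over the wire to create one file
    (hypothetical model, no caching)."""
    reads = dir_blocks   # scan the directory to make sure the name is free
    reads += 1           # read the inode allocation table to find a free inode
    writes = 1           # directory block holding the new entry
    writes += 1          # directory inode ("modify date")
    writes += 1          # inode allocation table, marking the new inode used
    if grow_directory:
        reads += 1       # free-block map, to find room for a new directory block
        writes += 1      # free-block map, marking that block allocated
    return (reads + writes) * BLOCK


raw = raw_create_bytes(dir_blocks=8)
print(raw, raw / NFS_CREATE_REQUEST)
print(raw_create_bytes(dir_blocks=8, grow_directory=True))
```

With these assumptions the block-level create moves about 48 KB against 400 bytes, and growing the directory adds two more blocks.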
Now, I admit that I chose the worst-case example to talk about here! But even if you choose a much simpler example, like appending to the end of a file, the "file attached" approach allows you to simply send the data along with a tiny bit of information about what file and what offset to put it at. With "raw disk attached", you have to load block allocation tables to find a free spot on disk, update inode blocks to indicate the new "modify date", and update indirect blocks to indicate where the new data is being stored. Even with the simple example of appending to a large file, you could end up with three or four times the I/O flying over the wire.
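The append case can be tallied the same way. This Python sketch assumes 4 KB blocks, roughly 100 bytes of file handle, offset, and length in the file-level request, and one allocation-table write, one inode write, and one indirect-block write per append; whether the allocation table must first be read depends on client caching, so that is a parameter.

```python
BLOCK = 4096  # assumed file system block size, in bytes
HEADER = 100  # assumed size of file handle + offset + length in a write request


def file_attached_append_bytes(data_bytes: int) -> int:
    """File-level append: the data plus a small request header."""
    return data_bytes + HEADER


def raw_append_bytes(data_bytes: int, alloc_table_cached: bool = True) -> int:
    """Block-level append: whole data blocks plus metadata blocks
    (hypothetical model)."""
    data_blocks = -(-data_bytes // BLOCK)  # ceiling division
    meta = 0
    if not alloc_table_cached:
        meta += BLOCK  # read the block allocation table to find a free spot
    meta += BLOCK      # write the allocation table, marking the block used
    meta += BLOCK      # write the inode block (new "modify date")
    meta += BLOCK      # write the indirect block (where the new data lives)
    return data_blocks * BLOCK + meta


for cached in (True, False):
    ratio = raw_append_bytes(4096, cached) / file_attached_append_bytes(4096)
    print(f"4 KB append, allocation table cached={cached}: {ratio:.1f}x")
```

In this model the ratio depends on the append size: the fixed metadata writes dominate for small appends and are amortized over larger ones.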
Of course, all of this same I/O does eventually need to hit disk even with NAS. The point is, the I/O ends up going over the private wires coming out of the back of the NAS device, rather than having to go over the shared TCP/IP network (or shared FC network), so you are not loading down the shared resource.
Well -- I think that's enough gory detail for now!
Again -- just to be clear, I am NOT claiming that NAS will always outperform direct attached!
My point is just that many people are surprised that NAS could *ever* outperform direct attached, but when you look at industry trends around network performance, and when you consider what is actually going on as you communicate with the disk, you realize that it isn't really so surprising, after all, how well NAS can perform.
(Hey Yael! Are you the same Yael Hellmann who works at EMC?)
Dave