Well, I wouldn't go so far as to call it "the" clarification of Dave's position. It's certainly *my* position and my take on your question "why focus on the protocol?" I'll let Dave decide whether he wants to let it stand or chime in with something different.
Sorry if I didn't make that obvious in the original message.
Ray
-----Original Message-----
From: Adams, Christian [mailto:Adams_Christian@emc.com]
Sent: Monday, April 05, 1999 2:32 PM
To: 'Chen, Ray'; Adams, Christian; Hitz, Dave; willh@infi.net
Cc: toasters@mathworks.com; barsellt@infi.net
Subject: RE: NAC SAN
Hey Ray Chen -
Thanks for the clarification of Dave Hitz's position.
/Christian Adams EMC
-----Original Message-----
From: Chen, Ray [SMTP:ray.chen@netapp.com]
Sent: Sunday, April 04, 1999 10:19 AM
To: 'Adams, Christian'; Hitz, Dave; willh@infi.net
Cc: toasters@mathworks.com; barsellt@infi.net
Subject: RE: NAC SAN
Um, because the distinctive characteristic of a SAN is that clients can write directly to the storage devices, bypassing any server? This is possible only if a storage-level (e.g., block-level) protocol like SCSI runs on the wire. (Actually, I'm lying here. There are two ways of enabling this. One is to use a really simple protocol like SCSI. The other is to implement an industry-standard clustered filesystem with the disk drives/storage devices as an intelligent component of said filesystem -- a very complex protocol whose viability and practicality are issues in their own right. The rest of this mail assumes a block-level SAN protocol.)
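(To make the block-level vs. file-level distinction concrete, here's a rough sketch of what a client-side write looks like in each model. It's illustration only -- the device name and mount point are invented, and this isn't anybody's product code.)

    import os

    # SAN-style (block-level): the client opens the shared disk itself and
    # writes raw blocks. Nothing sits between the client and the device to
    # check whether those blocks actually belong to this client.
    fd = os.open("/dev/sdb", os.O_WRONLY)   # hypothetical shared Fibre Channel LUN
    os.lseek(fd, 100 * 512, os.SEEK_SET)    # seek to an arbitrary 512-byte block
    os.write(fd, b"\x00" * 512)             # overwrite it -- no server can veto this
    os.close(fd)

    # NAS-style (file-level): the client writes through a file protocol (an
    # NFS mount here). The server's filesystem chooses the blocks, enforces
    # permissions, and keeps its own structures consistent.
    with open("/mnt/filer/home/ray/notes.txt", "w") as f:
        f.write("hello from a client\n")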
As folks have pointed out, SAN has its advantages and disadvantages. It removes the server's backplane (unless you have a RAID controller in front of the drives :-) from the I/O path, eliminating a possible I/O bottleneck. But there is no server to protect the integrity of the data -- and even if there is, the protocol in a SAN is so low-level that the server has no effective way of detecting when a client has gone insane (maliciously or not) and is scribbling gibberish on the drives. So trust becomes a definite issue.
Personally, I see data sharing using SANs as making sense in small workgroup environments and in what I call "machine room clusters" -- a SAN in a machine room supporting a dedicated cluster. Typically you'd do the latter because scaling via a cluster is cheaper (and more available) than scaling via MP. But both are situations where you are willing to trust all the machines in the cluster because they are all firmly controlled by the group whose storage/data is being shared.
Ray Chen rcc@netapp.com
-----Original Message-----
From: Adams, Christian [mailto:Adams_Christian@emc.com]
Sent: Tuesday, March 30, 1999 10:14 AM
To: 'Dave Hitz'; willh@infi.net
Cc: toasters@mathworks.com; barsellt@infi.net
Subject: RE: NAC SAN
Hey Dave Hitz -
Why do you focus on existing protocols in your discussion? This indicates that the introduction of Fibre Channel with either encapsulated SCSI or Ethernet is the critical event in the creation of SAN or NAS. Further in your discussion, you draw a parallel to switches and routers, which indicates that you may be thinking in terms of shared versus segmented data flows. And yet in all of your discussion, you do not address the issues of management, integration, or value.
Loved your EMC bit, /Christian Adams EMC
-----Original Message-----
From: Dave Hitz [SMTP:hitz@netapp.com]
Sent: Saturday, March 27, 1999 1:41 PM
To: willh@infi.net
Cc: toasters@mathworks.com; barsellt@infi.net
Subject: Re: NAC SAN
Tom's statement below (to a [NetApp] SAN when we build one) brings up a familiar argument between my boss and me. My boss contends that NetApp is NOT capable of migrating to a "real SAN."

I've been delinquent in my duty of writing a NAS and SAN paper. I'm about halfway done. When I'm done, I'll post it here, probably in early form, so that you guys can critique it for me.
Very briefly, I believe that NAS and SAN are different but related technologies, and that the definitions are as follows:
NAS (Network Attached Storage)
    Storage accessed over TCP/IP, using industry-standard file sharing protocols like NFS, HTTP, and Windows Networking.

SAN (Storage Area Network)
    Storage accessed over a Fibre Channel switching fabric, using encapsulated SCSI.
I also think that both NAS and SAN are useful, and that -- in the long run -- both will be critical components in any large data infrastructure. I think there's a strong analogy to the TCP/IP networking world. Routers and switches are very similar in some ways, yet subtly different in others. In the mid-eighties, arguments raged about which was "better". (Back then switches were called bridges.) Today we understand that any large network infrastructure needs to include both.
Some people would quibble with my definitions. Some would argue that if you run NFS over TCP/IP over fibre channel, then that is a SAN. I figure that we didn't invent new names for NFS over 100bT, FDDI, ATM, or Gigabit -- what's so special about fibre channel? Others argue that a SAN is any network dedicated entirely to storage traffic. Does this mean a SAN turns into a NAS if you fire up a telnet session on it? A year ago these definitions were less settled, but today I think these are the definitions that are emerging.
With these definitions, the key difference is between file system protocols and raw disk protocols. With NAS, the filesystem runs on the storage system itself. (For NetApp, that's WAFL.) With SAN, the filesystem runs on the hosts attached to the storage system.
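(As a rough illustration of where the filesystem lives in each model -- the device name, mount points, and export below are hypothetical, and the commands are the generic Linux ones rather than any vendor's procedure:)

    import subprocess

    # SAN model: the host owns the filesystem. It has to lay one down on the
    # raw LUN and mount it locally before it can store a single file there.
    subprocess.run(["mke2fs", "/dev/sdb"], check=True)
    subprocess.run(["mount", "/dev/sdb", "/san-data"], check=True)

    # NAS model: the filesystem already runs on the storage system itself,
    # so the host just mounts the export and speaks NFS to it.
    subprocess.run(["mount", "-t", "nfs", "filer:/vol/vol0", "/nas-data"], check=True)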
People familiar with EMC will recognize this model. With an EMC Symmetrix, you get a big box of RAID-protected disks, and you can attach multiple hosts to that one box. Thus, I view SAN as "open EMC". On the one hand, SAN could be viewed as a threat to EMC. But on the other hand, EMC has a solid history of products -- software and hardware -- based on this storage model, which means that they ought to have a real head start on SAN. I think the SAN vendors have a great opportunity if they really can create "open EMC". After all, EMC is doing $4 billion a year in business with this model!
In any case, I think NAS has some serious advantages over SAN. One big issue with running the file system on the separate hosts is that it makes true data sharing very difficult. The on-disk byte formats are very different between UNIX and NT, and even between different flavors of UNIX. Given the differences between UNIX and NT filesystem semantics, it's no surprise that the on-disk format is different. With NAS, the filesystem sits right next to the data, so it can convert the on-disk format into an industry-standard protocol appropriate to the host that wants the data: NFS for UNIX, NT Shares for NT, and HTTP for web browsers.
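(Concretely -- and with invented hostnames and paths, since this is just a sketch of the multiprotocol idea -- the same file on the filer can be reached through whichever standard protocol the client already speaks:)

    import urllib.request

    # A UNIX host reads the file through an NFS mount of the filer's export.
    with open("/mnt/filer/vol0/reports/q1.txt", "rb") as f:
        via_nfs = f.read()

    # Any HTTP client fetches the very same file over HTTP; the filer's own
    # filesystem translates its on-disk format for each protocol.
    with urllib.request.urlopen("http://filer.example.com/vol0/reports/q1.txt") as resp:
        via_http = resp.read()

    assert via_nfs == via_http   # one copy of the data, two access paths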
It is technically possible to build a "global file system" that has a common on-disk format for multiple operating systems, thus allowing NAS-like sharing for SAN. However, I'm skeptical of that model. First of all, there are no such products today. And even when they become available, sharing won't be based on industry standards. A system will only be able to participate in sharing if it runs a special SAN file system from a particular vendor. My belief is that the lesson of 20 years of networking is that we build heterogeneous environments by creating open standards and allowing everyone to support them -- not by saying that anyone who wants to participate in sharing must buy code from one particular company.
Having said all this, though, NetApp does have plans to leverage SAN technology. We have announced an OEM agreement with Brocade. Here are the benefits we expect to achieve:
- Attach to lots of disks.
    A fibre channel loop is limited to 127 drives, but with a 16-port fibre channel switch, the limit goes to almost 2000 drives (a quick back-of-the-envelope count appears after this list). With additional switches, you can support an arbitrary number of drives.
- Share tape drives between multiple servers.
    I believe that backup applications are the killer application for SAN. When people buy expensive tape jukeboxes, they want to be able to attach them to multiple servers. With today's SCSI tapes that's a painful job. But with fibre channel and fibre channel switches, it's easy to connect a single tape to any number of hosts.
- Disaster Tolerant Solutions
    Fibre channel switches can do WAN tunneling to provide connections to remote disks without any changes to the server code. This makes it much easier to build remote mirrors for data replication and disaster protection.
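(The back-of-the-envelope for the drive count in the first bullet, using only the nominal figures quoted above -- real configurations give up at least one loop address to the switch port, so the usable total is somewhat lower:)

    DRIVES_PER_LOOP = 127    # nominal FC-AL address limit per loop, as cited above
    SWITCH_PORTS = 16        # one loop of drives hanging off each switch port

    print(SWITCH_PORTS * DRIVES_PER_LOOP)   # 2032 at the nominal limit -- on the order of 2000 drives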
In conclusion, I believe that NAS and SAN are both important technologies, and that any large data infrastructure in the future will include both. I believe that NAS focuses on issues associated with users, applications, and data sharing. I believe that SAN focuses on issues associated with disk and tape devices, connection to large numbers of them, and moving data back and forth between them. In the long run, any large environment will have both sets of issues and will need both types of solutions. In the short run, NAS makes good sense as a starting point, because it is here today and the standards it relies on have been solid for over a decade.
I'll go ahead and post the whole paper when I finish, but I still thought it might be useful to share my thoughts in this brief, half-baked form.
Dave
On Mon, 5 Apr 1999, Chen, Ray wrote:
> Sorry if I didn't make that obvious in the original message.
Guys, ever hear of the delete button? Please use it to trim your quotes. Most of us probably read the original message; for those who didn't, there is the archive. I suggest you quote at the beginning of your replies. That will motivate you to trim the quotes down to a minimum and encourage you to interlace your responses with the original text, so that even someone who didn't read the original has a point of reference. And if you forget to trim it down, it will cause readers to ignore your message, giving you an additional incentive to trim next time.
Tom