You're right about one thing... all things are not equal. You
raised some good points along the way... see below for my
annotations.
> -----Original Message-----
> From: Brian Tao [mailto:taob@risc.org]
> Sent: Thursday, August 10, 2000 2:33 PM
> To: Pesce, Nicholas (FUSA)
> Cc: keith@netapp.com; toasters@mathworks.com
> Subject: RE: Filer storage for databases, seriously? (Was: Re: NetApp questions)
>
>
> On Wed, 9 Aug 2000, Pesce, Nicholas (FUSA) wrote:
> >
> > I'm sorry Keith. But I've heard this argument before. NFS versus
> > direct attach storage? I'm going to have to vote for a good direct
> > attach solution. Why?
> >
> > NFS and CIFS have huge overhead.
>
> Sure, as someone else mentioned, all other things being equal,
> direct-attach storage is faster than network-attached storage. The
> logic (which I've argued before as well) is simple: a NAS box talks
> to its drives as DAS. Thus, the DAS must necessarily be "faster"
> (yeah, a vague term). For example, setting caching aside, it is not
> possible for a filer to pump 50MB/s of data to an NFS client if it can
> only read 30MB/s off its own drives.
One thing to remember with regard to caching: when you request a block on
disk, your local OS (with direct attach storage) will read that block and
read ahead several more blocks. Those read-ahead blocks get transferred
across your Fibre Channel link even if your application never uses them.
The filer, on the other hand, does the same read-ahead, but it only
transmits the block that was actually requested. If the application then
asks for the next block, both setups already have it in memory. Now, in a
database environment, where blocks are scattered fairly randomly across the
filesystem, that read-ahead traffic on direct attach storage is going to
kill the efficiency of the bandwidth to your storage, and that's your OS's
fault. With a filer, you can turn off read-ahead on the filer itself to
minimize the wasted work (see the example below).
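As a rough sketch only: on the Data ONTAP releases I've used, the knob is
the 'minra' volume option (the exact name and availability may vary with
your ONTAP version), set from the filer console something like this:

    filer> vol options oravol minra on    (oravol = a placeholder for your data volume)
    filer> vol options oravol

The first command turns on minimal read-ahead for that volume, and the
second just echoes the current option settings back so you can confirm it
took.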
>
> However, all things are not equal, at least in benchmarks that are
> possible in the real world. Yes, NFS and CIFS add overhead compared
> to SCSI-over-FibreChannel or what have you. However, that is offset
> by an optimized OS (Data ONTAP), by an efficient filesystem (WAFL), by
> read and write caching, by an optimized TCP/IP stack, etc. If you
> could port all that and run it to DAS, then you might have a fair
> comparison.
>
> > I think I would like to see a test where the disk sizes and number
> > were similar; I sincerely doubt the Netapp would do as well.
>
> Depends on the application, of course, but I've been surprised
> many times in the past when I thought for sure the Netapp would not be
> able to keep up. I have a 4x450-MHz E420R with a VxVM RAID-0 device,
> spread over 16 50GB 7200 rpm drives on two U2SCSI buses. The server
> also has a Gigabit Ethernet connection to an F740 with one shelf of
> 36GB 10000 rpm drives (5 data, 1 parity, 1 spare). The local
> filesystem is vxfs, mounted with delaylog and the largest allowable
> log area.
>
> I ran a few filesystem replication and backup/restore tests (this
> is our central tape server). The local filesystem handily beat the
> Netapp doing large sequential reads and writes (120MB/sec vs.
> 22MB/sec)... no surprise there. File deletions were a little closer
> (~2500 unlinks/sec on vxfs, ~2000 unlinks/sec on the Netapp). In all
> other tests, the Netapp was as fast or faster (sometimes by a large
> margin) than local filesystem. The Netapp seems to especially shine
> when you have multiple processes reading and writing to all points on
> the filesystem. vxfs does not appear to handle it as gracefully with
> dozens or hundreds of concurrent access requests.
This is an apples to oranges test.
Sure, streaming to and from RAID 0 will always kick ass. But who really
runs RAID 0 these days? I'd guess well under 0.1% of applications can
justify it, since it's only suitable when the data isn't critical.
RAID 0+1 would be a slightly fairer comparison, since the filer doesn't
have a RAID 0 mode; a sketch of that setup follows below.
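Assuming VxVM is still doing the striping and mirroring (the disk group and
volume names here are made up, and the attribute names shift a bit between
VxVM releases), the RAID 0+1 volume for such a test could be built roughly
like this:

    # striped-then-mirrored (RAID 0+1) volume; names are placeholders
    vxassist -g benchdg make benchvol 200g layout=mirror-stripe
    mkfs -F vxfs /dev/vx/rdsk/benchdg/benchvol

Then mount it with the same vxfs options as the local test, so the only
variable left is the RAID layout.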
> I re-ran some of the same tests with a Veritas RAID-5 volume (to
> be fair to the Netapp), but I stopped after the first couple. There
> is no contest at that point. Veritas software RAID-5 is dog-slow (I
> think I saw bursts of 8MB/sec sequential writes). Turn on a Veritas
> snapshot, and writes to the snapped filesystem go even further into
> the toilet. The performance degradation is cumulative with the number
> of snapshots. There is no such penalty on the Netapp.
Ok, this is apples to apples.
> One caveat I should mention, since it bit us in the past: file
> locking performance. We have one application that, when running on
> the same type of hardware as above (E420R with those drives), spews
> forth 150,000 syscalls per second, according to Solaris' "vmstat".
> 80% of those calls are fcntl() locks/unlocks to various database files
> on disk. Poor programming practice aside, this application runs very
> slowly over NFS. It simply cannot match in-kernel file locking when
> you're dealing with a local filesystem. Besides that one exceptional
> application, we run Netapps for everything else (including Oracle).
Ok, given the environment you just described, you could enable an
undocumented option of the Solaris mount_nfs command, 'llock'. This tells
the NFS client not to use NLM for file locking and to handle the locks
locally instead, which essentially gives you in-kernel locking. The caveat
is that you can't safely share the filesystem with other clients, but you
can't do that with today's direct attach storage either. I usually
recommend using the llock option in a Solaris/Oracle/filer environment;
something like the mount line below.
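Just as a sketch (the export path, mount point, and the other mount options
are typical choices I made up for the example, not gospel):

    # paths are placeholders; llock is the option that matters here
    mount -F nfs -o hard,intr,vers=3,proto=tcp,llock filer:/vol/oradata /u02/oradata

or the equivalent /etc/vfstab entry:

    # paths are placeholders
    filer:/vol/oradata  -  /u02/oradata  nfs  -  yes  hard,intr,vers=3,proto=tcp,llock

Tune everything besides llock for your own environment.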
Aaron