I've had several filers in production for years, but in all that time I never used NFS, just CIFS. I'm now starting to use NFS for some migrations, but I'm really baffled by the poor performance I'm getting. Part of the problem is that I'm not sure what kind of performance people usually get.
When I started pulling data off my 940s I was only getting around 4-5MB/s. I'm grabbing data by way of GNU tar on Solaris. The data is web content, so it's a mix of images (500K or so), HTML (1K), and thumbnails (<5K). No matter what I do, I can't boost performance above that 5MB/s mark. I've tried NFSv2 and NFSv3, TCP and UDP, 8K and 32K transfer sizes, Gigabit and FastEther clients, Solaris 9/UltraSPARC and Solaris 10/AMD64, and so on, with no effect.
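For concreteness, the mounts I've been cycling through look roughly like this; the filer name, export, and mount point are stand-ins, not my real paths:

mount -F nfs -o vers=3,proto=tcp,rsize=32768,wsize=32768 filer:/vol/web /mnt/web
mount -F nfs -o vers=2,proto=udp,rsize=8192,wsize=8192 filer:/vol/web /mnt/web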
Because the 940 is in production I decided to use a 760 for testing; although it's on FastEther, I figured I could do more testing there and check write performance too. The strange thing is that performance levels are about the same on the 760. I even used a large file (700MB) to test sequential access and see if performance improved, and it didn't. Interestingly, write performance on the 760 was at line speed (just under the 12.5MB/s limit of FastEther). So on the 760 I can write data twice as fast as I can read it.
In both cases I'm using either fast disk on the client (capable of 50MB/s or higher under random or sequential workloads) or reading straight to /dev/null to cut out the middleman. While I don't doubt that the Solaris NFS client is slower than I'd like, I can't believe that reads should be less than half the speed of writes.
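For what it's worth, the /dev/null read test is nothing fancier than a big sequential dd; the path here is a stand-in for wherever the 700MB test file lives:

time dd if=/mnt/web/bigfile of=/dev/null bs=1024k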
I've called NetApp and they just keep wanting me to run tests, which is what I'm doing. Searching NOW hasn't helped, and searching Google hasn't helped. Any ideas? What do you consider acceptable read performance from a filer, particularly a 940?
So far all I've learned from my tests is that NFSv2 over UDP at 8K is the fastest method, but it's still hideously slow.
Any ideas are greatly appreciated.
benr.
Just a short response that could be longer, but: we often see writes that are a lot faster than reads because of the cache. You can write to the NetApp at line speed because all the data goes into the cache and then syncs to disk at its own pace. However, reads of non-cached data *must* seek to the data.
As for general NFS performance, could you please post your output of 'options nfs'?
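From the filer console that's just 'options nfs'; the lines below are only an illustration of the sort of output to expect (option names and defaults vary by Data ONTAP release):

filer> options nfs
nfs.tcp.enable               on
nfs.udp.enable               on
nfs.udp.xfersize             32768
nfs.v2.enable                on
nfs.v3.enable                on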
cheers, Barry
Can you take a perfstat? Take a perfstat while the file is being written to the filer.
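The perfstat script from NOW runs from an admin host, something like this; the flags follow the usage the script prints, so double-check against your copy, and the filer name is a placeholder:

perfstat -f filer -t 2 -i 5 > perfstat.out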
And for a real quick test of read performance on the filer itself, try doing a dump to /dev/null from the filer's console. A quick check of ifstat can catch network interface problems (and netdiag as well). nfs_hist is a good tool for checking whether the filer is taking a long time to respond to NFS calls; if nfs_hist shows the filer responding quickly, then the network may be at fault (I suspect the network, but that's just a hunch right now). What release are you running? 6.5.x has a whole slew of problems that can really bog down NFS performance, especially if you use DNS names; a slow DNS server can cause many problems for NFS access in 6.5. Checking nfsstat -d will show you lots of detailed NFS statistics that may be of no use to you, but may at least keep you entertained while troubleshooting the problem.
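To spell those out, the checks look roughly like this from the filer console; the volume name is a placeholder, 'null' as the dump destination is what makes dump a pure read-speed test, and nfs_hist may need 'priv set advanced' first:

filer> dump 0f null /vol/webvol
filer> ifstat -a
filer> netdiag
filer> nfs_hist
filer> nfsstat -d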
Hope that helps, -Blake
Ben> I've had several filers in production for years but in all that
Ben> time never used NFS, just CIFS. I'm now starting to use NFS for
Ben> some migrations but I'm really baffled by the poor performance
Ben> I'm getting. Part of the problem is that I'm not sure what kind
Ben> of performance people usually get.

Ben> When I started pulling data off my 940's I was only getting
Ben> around 4-5MB/s. I'm grabbing data by way of GNU Tar on Solaris.
I just did some tests on my F940 here, talking to a Solaris 8 box over Gigabit. I had a directory called 'foo' which is 73MB in size, with 2200+ directories and 2300+ files. None of the files was particularly big, I think.
Here are my runs with tar and time:
# time tar cf - foo > /dev/null
0.32u 1.65s 0:03.09 63.7%
Pretty fast; 3 seconds on 73MB implies roughly 24MB/s of throughput. But wait... GNU tar can optimize writes to /dev/null and skip doing the real work...
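(As an aside: one common way to defeat that shortcut while still throwing the data away is to stick a cat in the pipeline, since tar then writes to a pipe rather than to /dev/null itself:

# time tar cf - foo | cat > /dev/null

In the runs below I just wrote a real file to /tmp instead.)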
# time tar cf - foo > /tmp/foo.tar
0.32u 3.27s 0:12.25 29.3%
OK, now it took 12 seconds, which works out to about 6MB/s. Not great, hmmm...
# time tar cf - foo > /tmp/foo.tar
0.24u 2.42s 0:03.85 69.0%
# time tar cf - foo > /tmp/foo.tar
0.31u 2.11s 0:03.18 76.1%
# time tar cf - foo > /tmp/foo.tar
0.26u 2.09s 0:03.27 71.8%
# time tar cf - foo > /tmp/foo1.tar
0.30u 2.19s 0:03.08 80.8%
Ah, but now look what happens. It's back to 3 seconds, or roughly 24MB/s of throughput over the wire. It looks like the NetApp cache has kicked in here. Which is good, since it shows that the network isn't the bottleneck here.
So I repeated my tests with another directory, bar, with 500MB total size, 790 directories, and 1000 files. So the files are obviously bigger in general.
# time tar cf - bar > /tmp/foo.tar
0.64u 17.61s 0:31.51 57.9%
# time tar cf - bar > /tmp/foo.tar
0.62u 14.24s 0:16.01 92.8%
# time tar cf - bar > /tmp/foo.tar
0.60u 14.80s 0:17.29 89.0%
This time I was seeing around 30MB/s for the cached runs, and around 15MB/s for the non-cached initial load. Not too bad.
So for your setup, I suspect that your network is either heavily loaded, or that you have a bad network cable, or that there's a switch in the middle which is swamped or just isn't up to snuff.
I'm also running 7.0.1 on the NetApp. The volume which held the data was mounted via TCP onto an E450 with 4GB of RAM and 4x 450MHz CPUs, which are certainly not the fastest.
Ben> The data is web content, so its a mix of images (500K or so),
Ben> HTML (1K), and Thumbnails (<5K). No matter what I seem to do I
Ben> can't boost performance above that 5MB/s mark. I've tried NFSv2
Ben> and NFSv3, TCP and UDP, 8K and 32K, Gigabit clients and FastEther
Ben> clients, Solaris9/UltraSPARC and Solaris10/AMD64, so on and so
Ben> forth, with no effect.
This all points me to suspecting that your network hardware or cabling has problems. What does 'netstat -ni' say on both ends? And can you set up a direct connection between the filer and a client to see if that runs better? Get the rest of the network out of the loop.
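On the Solaris side the columns to watch are the error and collision counters. The output below is made-up sample data just to show the shape; interface names and counts are placeholders:

# netstat -ni
Name  Mtu  Net/Dest     Address       Ipkts   Ierrs Opkts   Oerrs Collis Queue
lo0   8232 127.0.0.0    127.0.0.1     1234    0     1234    0     0      0
ce0   1500 10.0.0.0     10.0.0.5      99999   0     88888   0     0      0

Nonzero Ierrs/Oerrs/Collis counts that keep climbing usually mean a duplex mismatch or a bad cable.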
Good luck,
John

John Stoffel - Senior Staff Systems Administrator - System LSI Group
Toshiba America Electronic Components, Inc. - http://www.toshiba.com/taec
john.stoffel@taec.toshiba.com - 508-486-1087