You have some NFS tuning to do on the TCP side; UDP is fine. This isn't your root problem, but it can't help. Check the output regarding TCP receive buffer sizes.
Client vancouver has network problems: NFS retransmits. Check the output. It doesn't harm the filer, but that client is impacted.
You're fighting performance vs. storage space. You have a number of 18G and 36G drives. Vol0 is laid out properly given this mix: you don't mix sizes within a given RAID group. However, disks 16, 18, and 19 are on their own as the RAID group of 36G drives for this volume, and they are being beaten on pretty hard.
I see your cache hit numbers, and they would make you wonder whether you need a larger filer or more cache RAM... but that might not be it. You have 100% disk utilization as a fact of life on the system, yet you are mostly doing disk reads, and NOT heavy ones: nine to 12MB/sec max. That's danged odd.
Here's your problem.
Vol0.
The 18G drives are about 65% busy, 95% reads. The 36G drives are 83% busy, again mostly reads.
Vol1 is quiet as a church on bowling night.
Solutions: Move data being accessed from vol0 to vol1, and make use of the unutilized performance you have there, if these 60 seconds of history reflect normal data activity. Likewise, move some of the data from vol1 to vol0, because you never want volumes over 80% full. ;)
Rebuild your volumes.
Make vol0 all of the 18G drives, 11 data + 1 parity: raw about 204GB. Make vol1 all of the 36G drives, 6 data + 1 parity: raw about 207GB.
If you did this, your volume sizes would be larger than they are today, because you're not wasting space on mixed-size RAID groups of 18 and 36G drives (vol1 is like this), and you end up with a LARGER number of spindles to parallelize your data access across in vol0, with a single volume of more 18G drives.
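For what it's worth, the capacity and spindle arithmetic behind those two layouts can be sketched like this. This is a back-of-the-envelope sketch using nominal drive sizes, so the formatted capacities the filer actually reports (the ~204GB and ~207GB above) will come out a bit different:

```python
# Back-of-the-envelope RAID group sizing. Nominal (unformatted)
# drive sizes are used here, so real ONTAP numbers will differ.

def raid_group(data_disks, parity_disks, disk_gb):
    """Return (usable_gb, total_spindles) for one RAID group."""
    return data_disks * disk_gb, data_disks + parity_disks

# Proposed vol0: all the 18G drives, 11 data + 1 parity.
vol0_gb, vol0_spindles = raid_group(11, 1, 18)

# Proposed vol1: all the 36G drives, 6 data + 1 parity.
vol1_gb, vol1_spindles = raid_group(6, 1, 36)

print(vol0_gb, vol0_spindles)  # 198 GB nominal across 12 spindles
print(vol1_gb, vol1_spindles)  # 216 GB nominal across 7 spindles
```

The point of the exercise: every data disk you fold into vol0's single group is another spindle your reads can fan out across.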
Guaranteed, 100% of your performance problem is that itty bitty second RAID group in vol0.
Unfortunately, you will have to destroy the village to save it, but once you take the downtime to do so, you might find years of worthwhile performance left in your 720.
Of course, you get to play God at work and save a ton of cash, making things better with no money spent. Yes, a larger system might reduce the cache misses, but 12MB/sec is WAY less than a properly configured F720 can dish out. I'd say you could easily triple that output if you did this.
Is a 12-drive RAID group OK?? I'd say YES. ONTAP is wayyyy more resilient than it used to be. You have plenty of drives left, if you like, to enable raid_dp as well, which will give a 10-data/2-parity group the basic reliability of two separate 5+1 RAID groups, 'cept you have a larger stripe to make your reads against -- which seems to be your largest type of data activity according to this perfstat.
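To put rough numbers on that trade-off (just a sketch of the intuition, not a real failure-rate calculation): a 10-data/2-parity group spends the same fraction of spindles on parity as two 5+1 groups do, but presents one data stripe twice as wide for those big reads.

```python
# Compare parity overhead and stripe width: one 10+2 RAID-DP group
# versus two separate 5+1 RAID 4 groups (12 spindles either way).

def overhead(data, parity):
    """Fraction of spindles spent on parity."""
    return parity / (data + parity)

raid_dp = overhead(10, 2)            # one 10-data/2-parity group
two_raid4 = overhead(5 * 2, 1 * 2)   # two 5-data/1-parity groups, combined

print(f"parity overhead: {raid_dp:.1%}")  # 2 of 12 disks either way

# The RAID-DP layout gives one 10-disk-wide data stripe instead of
# two 5-disk stripes, so a single large sequential read can fan out
# across twice as many spindles.
stripe_dp, stripe_raid4 = 10, 5
```

Same space cost, wider stripe: that's why the single big group wins for a read-heavy workload like this one.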
What size T-shirt do I wear? XXL. :)
Jeff has some great suggestions here, but one you maybe haven't thought of is to add yet another disk shelf (if the 720 supports it) with more disks. Then you can shuffle data around without a total dump/restore (burn-down-the-village) scenario. You'll still need some downtime just to do the rebalancing, but with more disks you'll be able to spread the load better.
And these days, FC7/8 shelves with 36GB disks should be pretty cheap from a reseller.
Take this opportunity to add some space, your users will love you.
John Stoffel - Senior Staff Systems Administrator - System LSI Group
Toshiba America Electronic Components, Inc. - http://www.toshiba.com/taec
john.stoffel@taec.toshiba.com - 508-486-1087