You have some NFS tuning to do on TCP; UDP is fine. This isn't your
root problem, but it can't be helping. Check the output regarding TCP
receive buffer sizes.
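As a sketch only (the exact knob names depend on the client OS and
ONTAP version, so treat these as assumptions to verify): on a Solaris
client you'd raise the TCP receive window with ndd, and the filer has
an option for its NFS-over-TCP receive window:

    # Solaris client: raise the TCP receive window
    ndd -set /dev/tcp tcp_recv_hiwat 65536

    # Filer: NFS-over-TCP receive window (option name from memory, verify)
    options nfs.tcp.recvwindowsize 65536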
Client vancouver has network problems and is generating NFS
retransmits; check the output. It doesn't harm the system, but that
client is impacted.
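You can confirm that from the client itself with nfsstat (flag
spelling differs slightly between Solaris and Linux):

    # On vancouver: look at the RPC retrans and badxid counters
    nfsstat -rc

retrans climbing while badxid stays near zero points at a lossy
network; retrans AND badxid climbing together would point back at a
slow server instead.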
You're fighting performance vs. storage space. You have a number of
18G and 36G drives. Vol0 is laid out properly given this mix; you
don't mix sizes within a given raid group. However, disks 16, 18, and
19 are off on their own as the raid group of 36G drives for this
volume, and they are also being beaten on pretty hard.
I see that your cache hits aren't 100%, which would make you wonder if
you need a larger filer or more cache RAM.. but that might not be it.
You have 100% disk utilization as a fact of life on the system, but
you are only doing disk reads for the most part, and NOT heavy ones.
Nine to 12MB/sec max. That's danged odd.
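All of those numbers show up in one place on the filer; a one-second
sysstat is where I'd look (real ONTAP command, though the exact
columns vary by release):

    # On the filer: NFS ops, net and disk throughput, cache age,
    # cache hit %, and disk utilization, sampled once a second
    sysstat -x 1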
Here's your problem: vol0.
The 18G drives are about 65% busy, 95% reads.
The 36G drives are 83% busy, again mostly reads.
Vol1 is quiet as a church on bowling night.
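Those per-disk busy numbers are in the perfstat, but you can watch
them live with statit from advanced privilege (real commands; the ut%
column is the per-disk utilization):

    priv set advanced
    statit -b        # begin collecting per-disk statistics
    # ...let 60 seconds of representative load go by...
    statit -e        # end collection and print; check ut% per disk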
Solutions:
Move the data being accessed from vol0 to vol1 and make use of the
untapped performance you have there, if this 60 seconds of history is
normal data activity. Likewise, move some of that data from vol1 to
vol0, because you never want volumes over 80% full. ;)
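For the actual shuffling, ndmpcopy keeps the copy on the filer instead
of dragging it through a client; depending on your ONTAP version it
runs from the console or from an admin host, and the paths here are
made up for illustration:

    # Copy a hot tree from vol0 to vol1 (hypothetical paths)
    ndmpcopy /vol/vol0/busystuff /vol/vol1/busystuff
    # Then repoint the exports/mounts and delete the original.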
Rebuild your volumes.
Make vol0 all of the 18G drives, 11 data + 1 parity. Raw, about 204GB.
Make vol1 all of the 36G drives, 6 data + 1 parity. Raw, about 207GB.
If you did this, your volume sizes would be larger than they are today,
because you're not wasting space on mixed-size raid groups of 18G and
36G drives (vol1 is like this today), and you end up with a LARGER
number of spindles to parallelize your data access across in vol0,
with a single volume of nothing but 18G drives.
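From memory, the traditional-volume syntax is along these lines;
verify it against your ONTAP release before typing any of it, because
vol destroy eats the data:

    # DESTRUCTIVE: back everything up first
    vol destroy vol0
    vol destroy vol1
    vol create vol0 -r 12 12@18   # one 11 data + 1 parity group of 18G drives
    vol create vol1 -r 7 7@36     # one 6 data + 1 parity group of 36G drives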
Guaranteed, 100% of your performance problem is that itty-bitty second
raid group of 36G drives in vol0.
Unfortunately you will have to destroy the village to save it, but
once you take the downtime to do so, you might find years of
worthwhile performance left in your F720.
Of course, you get to play God at work and save a ton of cash, making
things better with no money spent. Yes, a larger system might reduce
the missed cache hits, but 12MB/sec is WAY less than a properly
configured F720 can dish out. I'd say you could easily triple that
output if you did this.
Is a 12-drive raid group OK?? I'd say YES. ONTAP is wayyyy more
resilient than it used to be. You have plenty of drives left over, if
you like, to enable raid_dp as well, which gives a 10 data + 2 parity
group the basic reliability of two separate 5+1 raid groups, except
you get a larger stripe to run your reads against, and reads seem to
be your biggest type of data activity according to this perfstat.
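If you go that way, it should just be a volume option, but raid_dp
needs a new enough ONTAP, so check that your release supports it on a
700-series box first:

    # Convert the volume's raid groups to double parity (verify support)
    vol options vol0 raidtype raid_dp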
What size T-shirt do I wear? XXL. :)