I have not opened a case with NetApp yet, but I probably will if no one has any good ideas; I like to pick people's brains before going official. Thanks for any input.
A few months ago we moved a file share off a Windows server onto our FAS3040 NetApp running 7.2.4 and shared it out via CIFS. It contains software install files and scripts, and depending on scheduled jobs it can get hit pretty hard, pushing out approximately 1 Gbit/sec. That load has been drastically affecting the service times of our other shares on that filer, and it's mainly the response-sensitive NFS shares, such as mail and web files, that suffer the most. It doesn't really seem to be a disk bottleneck: the disk reads/sec in sysstat are usually only half of what the filer is pushing out to the network, so I assume it's serving some of the data from cache.

The CIFS software install share can get hit by anywhere from 1 to 60+ CIFS clients, each reading files on and off for hours at a time, or sometimes by hundreds of clients at once fetching a smaller set of files (such as when updating one software package across a large set of PCs). I've been able to reproduce the slowdown with just 4 CIFS clients on gigabit downloading a large file from the share. Sometimes it causes only a modest slowdown in NFS response time, but sometimes email messages being moved between folders will stall for 8 seconds or much more, which is pretty much unacceptable.

I don't think it's a bottleneck in my core network, because I've done tests where the slow NFS client is on the same switch as the filer, which is connected via two gigabit links using LACP. Also, in the normal situation where the slowdown is encountered, the mail (NFS) traffic flows through a different gigabit uplink than the hungry CIFS clients use.
Goal: reduce the impact of greedy clients (primarily known ones, but hopefully unexpected ones too) on the response time of the rest of the filer's clients. I don't care if the CIFS software share must accept slower data rates, and I'd rather not just avoid the problem; I want to learn what I can do to prevent my filer from being held hostage by greedy clients. I do have another 3040 I could move the share to, but that filer also has volumes that would be affected in the same way, and I'd rather not concede defeat and go back to hosting the share on a dedicated Windows server. I can try different code versions in a test environment if I need to, but I'd like to think this kind of situation has come up before and has a solution at hand.
I've played around with na_priority, setting the mail and website volumes to high or veryhigh priority and the software share to low or verylow, but that hasn't made a measurable difference. I'm not sure what to tweak or check next.
Here is an example from sysstat when I am simulating the slowdown condition with 4 CIFS clients on gigabit fetching the same file.
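These are the default sysstat columns (no -x); a capture like the following produces them, with the interval in seconds as the argument:

filer> sysstat 1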
 CPU   NFS  CIFS  HTTP   Net kB/s       Disk kB/s      Tape kB/s   Cache
                          in      out    read   write   read write    age
  6%  2058   167     0    751    1543    2196       0      0     0     11
  6%  2590   164     0    699    2238    2904      32      0     0     11
 10%  2183   223     0   1241    4471    5072   17872      0     0     11
 11%  3299   799     0   1577   22194    4935    1183      0     0     11
 22%  3298  3072     0   3005  107869    9128      24      0     0     11
 18%  2532  1986     0   2270   87651    2078       0      0     0     11
 18%  2198  2200     0   1696  105941    8032       8      0     0     11
 16%  3597  1650     0   1890   84691    3528      24      0     0     11
 23%  4946  2216     0   2604  112741   14664       0      0     0     11
 22%  4075  2041     0   2324  100380   21568       0      0     0     11
 CPU   NFS  CIFS  HTTP   Net kB/s       Disk kB/s      Tape kB/s   Cache
                          in      out    read   write   read write    age
 21%  3272  2246     0   2862  115380    4688      24      0     0     11
 21%  4117  2092     0   2686  109165    3864       8      0     0     11
 26%  4188  2136     0   3436  115081   21900       0      0     0     11
 ......(skip)
 30%  7487  1773     0   4261   93385   10156    3328      0     0      6
 25%  4566  1900     0   3339   96655   13764    9808      0     0      7
 24%  2965  2202     0   2477  111493   11772    5475      0     0      8
 23%  5256  1986     0   3093  102409   10508      24      0     0      8
 19%  2979  2068     0   1810  102282    9926       0      0     0      8
 20%  3164  2323     0   2301  111209    1560       8      0     0      8
 23%  7082  2165     0   2322  103816    2292      24      0     0      8
 22% 11780  1158     0   2763   55501    1760       0      0     0      8
 20% 12032   675     0   3820   36504    2452       0      0     0      8
 CPU   NFS  CIFS  HTTP   Net kB/s       Disk kB/s      Tape kB/s   Cache
                          in      out    read   write   read write    age
 23% 16269  1122     0   3914   54034    4460      24      0     0      6
 18%  8991  1030     0   2739   48400    4568       8      0     0      6
 10%  3903   237     0   1346    4494    3828       0      0     0      6
 11%  3912   219     0   1623    4301    3808    6508      0     0      6
  8%  2402   224     0    868    2027    2744    8712      0     0      6
Adam McDougall wrote:
Goal: reduce the impact of greedy clients (primarily known ones, but hopefully unexpected ones too) on the response time of the rest of the filer's clients. I don't care if the CIFS software share must accept
Adam -
I'd suggest taking a look at FlexShare (available since 7.2.x at no additional cost), which was developed for exactly this problem.

It ONLY kicks in when there is contention for resources (e.g. CPU, memory).

Prioritise the NFS workloads to high, and either leave the CIFS workload as is or set it to low.
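A minimal sketch of the setup, assuming volumes named mailvol and winvol (substitute your own volume names):

filer> priority on
filer> priority set volume mailvol level=high
filer> priority set volume winvol level=low
filer> priority show volume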
Regards,
Pat
Exactly what FlexShare is for... just keep in mind that disk iops *are* restricted based on the prioritization regardless of the load (it's a non-work-conserving queue), so monitor the appropriate statistics (I'm research inhibited at the moment, but prioqueue:usr_wait_msecs is close) to make sure things aren't waiting unnecessarily if there is still bandwidth to disk available.
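Something along these lines should dump the queue counters so you can watch the wait times (advanced privilege required; the exact counter names may differ from my guess above):

filer> priv set advanced
filer*> stats show priorityqueue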
--greck
As long as the CIFS share and the NFS exports are not in the same volume ... FlexShare may be perfect for your needs.
With respect to what Greck was talking about ... in order to determine if disk iops are being limited by FlexShare, you'll want to use "stats" and observe the priorityqueue object, and pay attention to:

priorityqueue:(default):usr_read_limit_hit:0   <-- user disk iops
priorityqueue:(default):sys_read_limit_hit:0   <-- system disk iops
If those values become non-zero, you'll want to increase the global "io_concurrency" (defaults to 8, max is 1024):

filer> priority set io_concurrency=<some number>
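For example, to double the default (16 is purely an illustrative value; tune it against the limit-hit counters above rather than picking a number blindly):

filer> priority set io_concurrency=16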
BTW, you need to be in advanced priv for this object:

filer> priv set advanced
filer*> stats start -I foo priorityqueue
(wait 30 seconds or so)
filer*> stats stop -I foo
Pat Breen wrote:

I'd suggest taking a look at FlexShare (available since 7.2.x at no additional cost), which was developed for exactly this problem.
As I understand it, FlexShare is the same thing as na_priority, which I already tried with no obvious results. I wondered if I might need to restart CIFS or anything else to activate the changes; I "enabled" priority and set some priorities. "win" is the software share I spoke of.
priority show volume
Volume    Priority  Relative          Sys Priority
                    Service Priority  (vs User)
home      on        High              Low
mail      on        VeryHigh          Low
scratch   on        VeryLow           Low
sites     on        High              Low
win       on        VeryLow           Medium
Adam McDougall wrote:
Here is an example from sysstat when I am simulating the slowdown condition with 4 CIFS clients on gigabit fetching the same file.
Are the NFS and CIFS clients on the same 1 Gbit link?

I see that you are pushing over 100 MB/s over the network, and if that is all on one link, that seems to me to be the reason for the slow response times. My advice would be to put the NFS and CIFS traffic on different 1 Gbit links.
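A quick way to confirm whether one link is the choke point is the per-interface counters; a sketch, where e0a is a placeholder for your actual interface name:

filer> ifstat -a
filer> ifstat e0a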