We found that symlinks were causing 60% of the CPU load on our F740.
-------------------------- Sent from my BlackBerry Wireless Handheld (www.BlackBerry.net)
-----Original Message----- From: Jean-Christophe Smith jsmith@vitalstream.com To: toasters@mathworks.com toasters@mathworks.com CC: jsmith@publichost.com jsmith@publichost.com Sent: Thu Oct 25 13:51:06 2001 Subject: F720 CPU Maxing out
We have a NetApp F720 doing both NFS and CIFS using mixed-mode security. It has 256 MB of RAM. During parts of the day, the CPU and RAM spike to 100% utilization. We run a lot of sendmail, Windows Media, Apache, and IIS traffic over the filer.
I have two questions:
1. Is there a way we can add more RAM, perhaps 512 MB or a gig? The docs say we are maxed out at 256.
2. Is it possible that the extra CPU overhead of the NFS/CIFS user mapping is causing much of the RAM and CPU starvation? Perhaps going to a simpler security model would free up a lot of CPU cycles.
Any suggestions would be appreciated.
Jean-Christophe Smith VitalStream jsmith@vitalstream.com
What did you do to figure out that symlinks were using so much of the CPU load?
I've got a 720 that's becoming very unresponsive under high loads. It doesn't get up to 100%, but it does hit 85-90%. This is during times of high writes to one volume, on the order of 5-10 Megs/sec.
The weirdest part is that when it gets really slow, I lose quite a few large pings to the filer from the affected machine. This makes me worry about my network as well. I've had duplex mismatches cause the same kinds of problems before, but I've checked duplex on all of my ports.
Is it normal for high writes to bog down a filer this much? Could it be symlinks causing my woes? Any tips for solving this problem?
I've turned on per-host NFS stats logging; any tips on decoding this information? I assume the machine with the highest number of writes is the offending host that's slamming my filer?
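As a rough sketch of the kind of post-processing you could do on those per-host stats: assuming you've captured them to a text file where each line holds a client address followed by operation counts (the real output layout depends on your ONTAP version, so the field numbers below are an assumption to adapt), a little awk can rank the busiest writers:

```shell
# Hypothetical input format per line: "<client-ip> <total-ops> <reads> <writes>".
# Adjust the awk field numbers ($4 = writes, $1 = client) to match
# what your filer actually prints.
top_writers() {
    awk '{ print $4, $1 }' "$1" | sort -rn | head -5
}
```

Whichever client floats to the top of that list is your prime suspect.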
Thanks for any info you may be able to provide.
- Eli
Michael Rogan wrote:
We found that symlinks were causing 60% of the CPU load on our F740.
I've got a 720 that's becoming very unresponsive under high loads. It doesn't get up to 100%, but it does hit 85-90%. This is during times of high writes to one volume, on the order of 5-10 Megs/sec.
The unresponsiveness to pings seems odd, but those writes are probably about as much as you'll get out of the 720. Looking at my 760s in the past, they had exhausted their CPU at just over 20 MB/s of writes (those were full-frame packets over a non-jumbo'ed Gig-II). Having been at a cash-strapped dot-com for some time now, I haven't had the luxury of pursuing new hardware options much, but does anyone know what kind of performance benefit is picked up with ZCS vs. BCS volumes?
Could it be symlinks causing my woes? Any tips for solving this problem?
No matter how many symlinks you have, or how inefficient they are, they won't affect write performance (unless, of course, your writes are competing with lookups). Once the file is open, it doesn't matter how you got there in the filesystem.
..kg..
kevin graham wrote:
I've got a 720 that's becoming very unresponsive under high loads. It doesn't get up to 100%, but it does hit 85-90%. This is during times of high writes to one volume, on the order of 5-10 Megs/sec.
The unresponsiveness to pings seems odd, but those writes are probably about as much as you'll get out of the 720. Looking at my 760s in the past, they had exhausted their CPU at just over 20 MB/s of writes (those were full-frame packets over a non-jumbo'ed Gig-II). Having been at a cash-strapped dot-com for some time now, I haven't had the luxury of pursuing new hardware options much, but does anyone know what kind of performance benefit is picked up with ZCS vs. BCS volumes?
Is this a memory bottleneck? I think I only have 256 mb in my filers, should I upgrade for better performance?
Could it be symlinks causing my woes? Any tips for solving this problem?
No matter how many symlinks you have, or how inefficient they are, they won't affect write performance (unless, of course, your writes are competing with lookups). Once the file is open, it doesn't matter how you got there in the filesystem.
The sluggishness shows up on lookups and reads. Because all of our developers source paths on that filer in their Unix shell profiles (/usr/local/ and home directories, for instance), their shells get REALLY, REALLY slow all of a sudden. I think I may need to rethink my storage architecture if heavy reads/writes are going to hurt the filer's performance this much.
Basically, I guess I should balance things between my two filers so that the one with the highest traffic is not the one hosting the home directories and /usr/local.
This is on my older F720, and I just got an F740. Is it much of an upgrade in CPU? Can I just swap heads and have a faster filer?
- Eli
..kg..
Is this a memory bottleneck? I think I only have 256 mb in my filers, should I upgrade for better performance?
It's CPU for the writes. I presume ONTAP has a lookup cache a la Solaris's DNLC, so more memory should be a boon to the slow lookup times. However, I've never seen sanctioned or non-sanctioned aftermarket CPU/RAM upgrades for any filer (except CPU swaps in the old x86 boxes, IIRC).
The sluggishness shows up on lookups and reads. Because all of our developers source paths on that filer in their Unix shell profiles (/usr/local/ and home directories, for instance), their shells get REALLY, REALLY slow all of a sudden.
One approach would be to increase directory caching on the client side. Not very elegant or appliance-like, but it might help. You could also try CacheFS, particularly for that /usr/local mount. Automating CacheFS setup with automount isn't the prettiest, but doing this for web content breathed life back into my production NFS servers.
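For the archives, here's roughly what that looks like on a Solaris client. The cache directory and filer export paths below are made-up placeholders; check cfsadmin(1M), mount_cachefs(1M), and your automounter docs before copying anything.

```shell
# Create the local cache once (cache path is a placeholder):
cfsadmin -c /var/cache/fs

# One-off mount of /usr/local through the cache
# (filer:/vol/vol0/local is a hypothetical export):
mount -F cachefs -o backfstype=nfs,cachedir=/var/cache/fs \
    filer:/vol/vol0/local /usr/local

# Or the equivalent direct automount map entry:
# /usr/local  -fstype=cachefs,backfstype=nfs,cachedir=/var/cache/fs  filer:/vol/vol0/local
```

Reads get served from local disk after the first hit, which is exactly what you want for mostly-static trees like /usr/local.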
This is on my older f720, and I just got a f740. Is it much of an upgrade in CPU?
It's been forever since I've looked at specs, but I believe each step up in the 700s included a marginal CPU clock-speed bump, double the RAM (256 MB F720, 512 MB F740, 1024 MB F760), and more PCI (each step down from the 760 progressively carved up the system board, with a different position for the edge connector).
Can I just swap heads and have a faster filer?
This should work fine (esp given the 700-800 head swap someone just outlined on the list), but I'd wait for a definitive answer from someone bestowed with more clue than myself before trying in production.
..kg..
On Fri, Oct 26, 2001 at 01:27:59PM -0400, kevin graham wrote:
This is on my older f720, and I just got a f740. Is it much of an upgrade
in CPU?
It's been forever since I've looked at specs, but I believe each step up in the 700s included a marginal CPU clock-speed bump, double the RAM (256 MB F720, 512 MB F740, 1024 MB F760), and more PCI (each step down from the 760 progressively carved up the system board, with a different position for the edge connector).
If I recall correctly, the CPU & memory differences are:
                    F720   F740   F760
                    ----   ----   ----
  CPU 21164A (MHz)   400    400    600
  RAM (MB)           256    512   1024
  NVRAM (MB)           8     32     32
There's also a difference in the number of PCI slots, etc. From examination (when I got my first F7x0 :), the motherboards look effectively the same except for the number of PCI slots soldered on and the fit-out in terms of CPU speed, RAM, and NVRAM installed. I suspect the F8x0s are the same; that makes sense from a manufacturing point of view.
The thing that's probably hurting your F720 the most is the crippled amount of NVRAM. There are various ways you can observe this. One simple way is to look at the output of 'sysstat 1' on the console while the filer is under load. If the disks are constantly being written to (without gaps of a few seconds), that can indicate your NVRAM is being filled and flushed all the time. Other commands for analysing this are covered in the advanced NetApp training courses, and your local SE should be able to help you here too.
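If you'd rather not eyeball it, the same check can be scripted over a captured 'sysstat 1' transcript. The column position of the disk-write figure varies by Data ONTAP version, so the field number below ($6) is an assumption to adjust; the logic just flags whether the disks ever got a breather between samples.

```shell
# Exits 0 if the disk-write column is nonzero in every sample past the
# two header lines (i.e. the disks never went idle -- a sign NVRAM is
# filling and flushing continuously), nonzero otherwise.
# $6 is an assumed column; match it to your sysstat layout.
constant_writes() {
    awk 'NR > 2 && $6 == 0 { gaps++ } END { exit (gaps > 0) }' "$1"
}
```

A healthy filer should show idle gaps between consistency points; back-to-back writes with no gaps point at the 8 MB NVRAM as the bottleneck.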
Unfortunately, unlike earlier systems (e.g., the F330), which could take an NVRAM upgrade (e.g., from 2 MB to 8 MB), NetApp doesn't support upgrading the NVRAM (or RAM) in the F720. I don't know whether ONTAP would recognise and support an F740 NVRAM card in an F720 (it probably would :), but NetApp wouldn't support that. What you can do is replace the F720 head with an F740 head (a five-minute swapover without any software changes) and you'd notice an immediate difference.
Hope that helps, Luke.