Hi,
We've had great luck with BSDI and our F720s, so we decided to spec it for a Sun customer.
There are 2 machines, Thor and Sif. Thor is an Ultra 1, 170E running 5.5.1, pretty patched up. Sif is an Ultra 1, 170E running 5.7, pretty patched up.
Both systems currently have 2 Multi-pack 12s with 10 disks in each. The systems serve a single web site, DNS load balanced, which usually works decently (LB salespeople need not apply; the customer doesn't want it).
We loaded the content onto the F720, and unmounted ALL the disks of Thor, and told it to use the Filer.
Immediately the load rose on the box, and in fact it has gotten as bad as:
    last pid:  1040;  load averages: 48.67, 35.91, 22.98             18:04:24
    315 processes: 251 sleeping, 63 running, 1 on cpu
    CPU states:  0.0% idle, 28.8% user, 71.2% kernel,  0.0% iowait,  0.0% swap
    Memory: 497M real, 142M free, 175M swap, 1235M free swap

      PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
      777 www       34    0 2544K 1896K run     0:01  0.22%  1.65% httpd
     1030 karupspc  24    0 3232K 2880K run     0:00  0.22%  1.65% ttsgvalidate.c
      556 www       34    0 2568K 1936K sleep   0:02  0.20%  1.53% httpd
      494 www       34    0 2544K 1896K run     0:01  0.20%  1.53% httpd
      618 www       34    0 2544K 1896K sleep   0:02  0.18%  1.40% httpd
      800 www       34    0 2544K 1896K sleep   0:01  0.18%  1.40% httpd
      762 www       34    0 2552K 1904K sleep   0:00  0.18%  1.40% httpd
      926 www       34    0 2544K 1904K run     0:00  0.17%  1.27% httpd
While at the same time:
    last pid:  8293;  load averages:  1.02,  1.51,  1.59             18:04:57
    312 processes: 310 sleeping, 1 zombie, 1 on cpu
    CPU states: 55.5% idle, 11.4% user, 22.7% kernel, 10.4% iowait,  0.0% swap
    Memory: 512M real, 7888K free, 139M swap in use, 1268M swap free

      PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
     8292 root       1  50    0 2384K 1648K cpu     0:00  1.51% top
     7903 www        1  58    0 2712K 1856K sleep   0:01  0.79% httpd
     8176 www        1  58    0 2696K 1840K sleep   0:00  0.77% httpd
     8005 www        1  58    0 2688K 1824K sleep   0:00  0.44% httpd
     7838 www        1  58    0 2688K 1816K sleep   0:01  0.43% httpd
     8002 www        1  58    0 2712K 1856K sleep   0:00  0.42% httpd
     8071 www        1  48    0 2688K 1824K sleep   0:00  0.34% httpd
     8066 www        1  49    0 2688K 1824K sleep   0:00  0.34% httpd
     8092 www        1  58    0 2712K 1856K sleep   0:00  0.33% httpd
Sif was doing that.
Is this expected? A LOT more kernel usage, a LOT higher load average. It ebbs and flows, but it hasn't gotten below 5 since we started. 5 didn't bother me. Almost 50 does.
On BSDI, I know I have a daemon called nfsiod:
Nfsiod runs on an NFS client machine to service asynchronous I/O requests to its server. It improves performance but is not required for correct operation.
I did notice quite an improvement with it. Is there anything I need to do on the Sun or the Filer to get better performance? Any ideas why I might have degraded the performance so badly?
Thanks, Tuc/TTSG
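To make the nfsiod idea concrete, here is a minimal Python sketch of the general technique, not BSDI's or Sun's actual implementation. It assumes a hypothetical `slow_read()` standing in for one NFS read round trip, with a made-up 50 ms latency: a small pool of helper threads lets several reads be outstanding at once, so the caller waits for roughly one round trip per batch instead of one per block.

```python
import time
from concurrent.futures import ThreadPoolExecutor

RTT = 0.05  # hypothetical round-trip time per request, in seconds

def slow_read(block: int) -> bytes:
    """Hypothetical stand-in for one synchronous NFS READ round trip."""
    time.sleep(RTT)
    return bytes([block % 256]) * 8

def read_serial(blocks):
    # No async helpers: every read waits for the previous one to finish.
    return [slow_read(b) for b in blocks]

def read_async(blocks, workers=4):
    # A pool of helpers (the nfsiod idea): reads overlap, so the total time
    # is roughly len(blocks) / workers round trips instead of len(blocks).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_read, blocks))

if __name__ == "__main__":
    blocks = list(range(8))
    for fn in (read_serial, read_async):
        start = time.perf_counter()
        data = fn(blocks)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s for {len(data)} blocks")
```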
Thor has a lot more processes in iowait, probably disk wait, which is why its load is higher: those processes are counted as running. Assuming the two machines are otherwise configured identically, you should try upgrading Thor to 5.7. Alternatively, check that the duplex setting on Thor is correct.
Bruce
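Bruce's explanation hinges on what the load average counts. Below is a sketch of the classic exponentially damped average used by BSD-derived kernels, sampled every 5 seconds. Other kernels, Solaris included, may differ in detail, and which process states count as "active" (runnable only, or also short-term disk wait) varies by OS, so `n_active` is an assumption. It also shows why a sudden jump produces a 1-minute > 5-minute > 15-minute pattern like Thor's (48.67, 35.91, 22.98).

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples in the classic BSD scheme

def decay(minutes: float) -> float:
    """Fraction of the old average kept at each sample for a given window."""
    return math.exp(-SAMPLE_INTERVAL / (60.0 * minutes))

def update(load: float, n_active: int, minutes: float) -> float:
    """One step of the exponentially damped moving average."""
    e = decay(minutes)
    return load * e + n_active * (1.0 - e)

def simulate(samples, windows=(1, 5, 15)):
    """Feed per-sample counts of active processes; return the final averages."""
    loads = [0.0] * len(windows)
    for n in samples:
        loads = [update(load, n, m) for load, m in zip(loads, windows)]
    return tuple(loads)

if __name__ == "__main__":
    # Hypothetical workload: 10 quiet minutes (about 1 active process),
    # then 10 minutes with about 60 processes runnable or blocked.
    quiet = [1] * 120  # 120 samples x 5 s = 10 minutes
    busy = [60] * 120
    print("load averages: %.2f, %.2f, %.2f" % simulate(quiet + busy))
```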
TTSG> There are 2 machines.. Thor and Sif. Thor is an Ultra 1, 170E
TTSG> running 5.5.1, pretty patched up. Sif is an Ultra 1, 170E
TTSG> running 5.7, pretty patched up.
Try upgrading the 5.5.1 system to 2.6 + patches. 2.6 did a lot of work on the I/O subsystem. We've seen wonderful improvements on our various backup servers when we made these changes. Not directly related, but close.
John
John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
stoffel@lucent.com - http://www.lucent.com - 978-952-7548
john.stoffel@ascend.com - http://www.ascend.com
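The thread mixes two numbering schemes: `uname -r` reports the SunOS release (5.5.1, 5.7), while Sun marketed those as Solaris 2.5.1 and Solaris 7, so John's "2.6" is SunOS 5.6, the release between Thor's and Sif's. A small helper that does the mapping, as a sketch:

```python
def solaris_name(sunos_release: str) -> str:
    """Map a SunOS 5.x release string (as from `uname -r`) to its Solaris name.

    SunOS 5.0 through 5.6 shipped as Solaris 2.0 through 2.6; from SunOS 5.7
    on, Sun dropped the "2." and called the release Solaris 7, 8, and so on.
    """
    major, _, rest = sunos_release.partition(".")
    if major != "5" or not rest:
        raise ValueError(f"not a SunOS 5.x release: {sunos_release!r}")
    minor = int(rest.split(".")[0])
    return f"Solaris {rest}" if minor >= 7 else f"Solaris 2.{rest}"

if __name__ == "__main__":
    for release in ("5.5.1", "5.6", "5.7"):
        print(release, "->", solaris_name(release))
```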
Not sure if I made this clear in the first email.
Thor was put on the filer, and Sif was not. The 5.7 system HAS NOT been on the filer yet. So you're saying I should try the 5.7 system, and if everything is fine, then upgrade the OS on Thor? I guess that should be the next step.
Tuc/TTSG
On Wed, 20 Sep 2000, TTSG wrote:
> We loaded the content onto the F720, and unmounted ALL the disks of Thor, and told it to use the Filer.
What does `sysstat 2` show on the F720? If the load is low, the problem is with Thor. If the load is high, the filer might be resource bound. If in the middle, <shrug>. A single F720, though, should be able to handle anything an Ultra 1 could throw at it without breaking a sweat.
What network pieces are between Thor and the F720?
NFS v2 or v3? If v3, over TCP or UDP?
Until next time...
The Mathworks, Inc.                          508-647-7000 x7792
3 Apple Hill Drive, Natick, MA 01760-2098    508-647-7001 FAX
tmerrill@mathworks.com                       http://www.mathworks.com
> What does `sysstat 2` show on the F720?
Maybe 8-12% CPU and 400-600 NFS ops/s. (I haven't gotten authorization to put the machine back on the filer for a bit, since it really does a number on the web site they run.)
> If the load is low, the problem is with Thor.
I tried it on Sif. Sif went nuts with file locking: CGIs that need to lock a file seem to take FOREVER (7-10 seconds). If I change the CGI to point to a local file store, the load goes from maybe 2.x to 6-7.x, and the CPU goes to 100% with increased user and kernel time.
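One way to put numbers on the locking slowdown is to time lock/unlock cycles on a file on the filer mount and on a local disk. This is a sketch assuming the CGIs use POSIX `fcntl`-style locks (the thread doesn't say which call they use); the paths you pass in are up to you. On an NFS v2/v3 client these locks normally go through the lock manager (lockd/statd) rather than staying local, which is one plausible source of the delay.

```python
import fcntl
import os
import sys
import time

def time_lock(path: str, rounds: int = 20) -> float:
    """Average seconds to take and release an exclusive fcntl lock on `path`."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
            fcntl.lockf(fd, fcntl.LOCK_UN)  # release it again
        return (time.perf_counter() - start) / rounds
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Usage: python locktime.py <file on NFS mount> <file on local disk>
    for p in sys.argv[1:]:
        print(f"{p}: {time_lock(p) * 1000:.2f} ms per lock/unlock")
```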
> If the load is high, the filer might be resource bound. If in the middle, <shrug>. A single F720, though, should be able to handle anything an Ultra 1 could throw at it without breaking a sweat.
That's what I figured. I need to throw 2 of them at it, serving 7-8 Mb/s of content out the front end.
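For scale, assuming "7-8 Mb/s" means megabits per second and that DNS load balancing splits it roughly evenly between the two Ultras, each server pushes about half a megabyte per second, which is also the order of what it would read from the filer if nothing were cached locally:

```python
def per_server_bytes_per_sec(total_mbit_per_sec: float, servers: int) -> float:
    """Bytes/s each web server sends, assuming an even split (an assumption)."""
    if servers <= 0:
        raise ValueError("need at least one server")
    # 1 Mb/s = 1,000,000 bits/s; 8 bits per byte.
    return total_mbit_per_sec * 1_000_000 / 8 / servers

if __name__ == "__main__":
    for total in (7.0, 8.0):
        each = per_server_bytes_per_sec(total, servers=2)
        print(f"{total:.0f} Mb/s over 2 servers: about {each / 1000:.0f} kB/s each")
```

That per-server figure can be compared against the network kB/s columns `sysstat` prints on the filer.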
> What network pieces are between Thor and the F720?
Thor -> 4 ft network cable -> Cisco 2924 switch -> 8 ft network cable -> Cisco 5505 -> 6 inch network cable -> RJ45/Punchdown block -> 200ft network cable -> RJ45/Punchdown block -> 6 inch network cable -> RJ45/Punchdown block -> 24 ft network cable -> RJ45/Punchdown block -> 8 ft network cable -> Cisco 2924 -> 8 ft network cable -> F720
Sif has a 6 ft cable, then the rest is the same.
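A quick sanity check on the physical layer: twisted-pair Ethernet allows at most 100 m of cabling between active devices, and the punchdown blocks are passive, so their pieces add up into one run. This sketch uses the lengths from Thor's path above (Sif differs only in its first cable, 6 ft instead of 4 ft):

```python
FT_TO_M = 0.3048  # exact definition of the international foot
LIMIT_M = 100.0   # max twisted-pair run between active Ethernet devices

# Thor's path split at active devices (NIC, switches, filer).
THOR_SEGMENTS_FT = {
    "Thor -> Cisco 2924": [4],
    "Cisco 2924 -> Cisco 5505": [8],
    "Cisco 5505 -> Cisco 2924 (4 punchdown blocks)": [0.5, 200, 0.5, 24, 8],
    "Cisco 2924 -> F720": [8],
}

def check_segments(segments_ft):
    """Return {segment: (length in metres, within the 100 m limit?)}."""
    report = {}
    for name, pieces in segments_ft.items():
        metres = sum(pieces) * FT_TO_M
        report[name] = (round(metres, 1), metres <= LIMIT_M)
    return report

if __name__ == "__main__":
    for name, (metres, ok) in check_segments(THOR_SEGMENTS_FT).items():
        print(f"{name}: {metres} m {'OK' if ok else 'TOO LONG'}")
```

The longest run comes out around 71 m, comfortably under the limit, so cable length alone is unlikely to be the issue; it doesn't rule out a bad termination at one of the four punchdown blocks, or the duplex mismatch Bruce mentioned.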
> NFS v2 or v3? If v3, over TCP or UDP?
ACK, how do I know/tell? I just do a `mount -o soft IP:/vol/vol1 /mnt`.
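On Solaris, `nfsstat -m` lists each NFS mount with a `Flags:` line that includes `vers=` and `proto=`, which answers the v2/v3 and TCP/UDP question directly. The sample text below is illustrative only (made-up address and values, and the exact layout varies between releases); the parser just picks out those two keys per mount point:

```python
SAMPLE = """\
/mnt from 10.0.0.5:/vol/vol1
 Flags:         vers=3,proto=udp,sec=sys,soft,intr,rsize=32768,wsize=32768
 Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
"""

def parse_nfsstat_m(text: str) -> dict:
    """Map each mount point to {'vers': ..., 'proto': ...} from `nfsstat -m` text."""
    mounts = {}
    current = None
    for line in text.splitlines():
        if " from " in line and not line.startswith((" ", "\t")):
            # Unindented header line: "<mount point> from <server>:<path>"
            current = line.split(" from ", 1)[0].strip()
            mounts[current] = {}
        elif line.strip().startswith("Flags:") and current is not None:
            flags = line.split(":", 1)[1].strip()
            for item in flags.split(","):
                key, sep, value = item.partition("=")
                if sep and key in ("vers", "proto"):
                    mounts[current][key] = value
    return mounts

if __name__ == "__main__":
    print(parse_nfsstat_m(SAMPLE))
```

On a live client you would feed it the text `nfsstat -m` actually prints.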
Thanks...
Tuc/TTSG