Hrm, I am also seeing odd CPU issues with 7.0.0.1 on some new deployments. With only a few systems we are constantly over 75% CPU, and a lot of the time steadily well into the 90s. I haven't seen anything related on NOW yet.
-----Original Message-----
From: owner-toasters@mathworks.com on behalf of Langborg Tom
Sent: Thu 5/26/2005 6:10 AM
To: 'Jay@Williamsons.org'; toasters@mathworks.com; am-utils@am-utils.org
Subject: SV: amd / netapp issue?
There were some problems with Red Hat 7.3 and NFS. That Red Hat bug brought down my Tru64 file server. To fix it you need to update the kernel; we now run 2.4.20-28 on the old Red Hat 7.3 systems. I also have a problem with ONTAP 7.0 running at 100% CPU. We think an NFS logic problem crashed my NetApp, so we have now upgraded to 7.0.1R1 and hope it works.
/tom
-----Original Message-----
From: Jay Williamson [mailto:jay_williamson@yahoo.com]
Sent: 26 May 2005 14:20
To: toasters@mathworks.com; am-utils@am-utils.org
Subject: amd / netapp issue?
Hi. I'm sending this to am-utils and toasters since I'm not entirely sure where the problem is. These systems (redhat73-1, redhat73-2, etc.) are in a cluster. These errors all occur around the same time on all the clients (frequently running different jobs) but are sometimes spread out across 10-30 minutes.
My first guess is simply that the netapps are too busy to answer the clients.
*If* that's the case, then what can I change on the netapp or the client-end to mitigate this problem? What other options are there?
From the /var/log/amd on clients:
redhat73-1: May 25 13:48:01 redhat73-1 amd[965]/error: get_nfs_version: failed to contact portmapper on host "netapp1": RPC: Timed out
redhat73-2: May 25 13:48:07 redhat73-2 amd[965]/error: get_nfs_version: failed to contact portmapper on host "netapp2": RPC: Timed out
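To check whether the timeouts really cluster in time across clients, the amd logs can be tallied per filer and per minute. The following is only a rough sketch: the regular expression is modeled on the two lines quoted above and may not match other am-utils versions, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Modeled on the two amd log lines quoted above (assumption: other
# am-utils versions may format this message differently).
LINE_RE = re.compile(
    r'(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<client>\S+)\s+'
    r'amd\[\d+\]/error:\s+get_nfs_version: failed to contact portmapper '
    r'on host "(?P<server>[^"]+)": RPC: Timed out'
)

def portmapper_timeouts(lines):
    """Yield (timestamp, client, server) for each portmapper timeout line."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield m.group("ts"), m.group("client"), m.group("server")

def count_by_server_minute(lines):
    """Count timeouts per (server, 'Mon DD HH:MM') bucket."""
    counts = Counter()
    for ts, _client, server in portmapper_timeouts(lines):
        minute = ts[:-3]  # drop the ":SS" suffix
        counts[(server, minute)] += 1
    return counts
```

Feeding it the concatenated /var/log/amd files from all clients, e.g. `count_by_server_minute(open("amd-all.log"))` (hypothetical file name), gives a quick view of which filer and which minutes are affected, which can then be compared against the filer's CPU load at those times.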
clients are dual Xeons running
  redhat 7.3
  am-utils-6.0.7-4
  2.4.20-28.7smp
  mount options=(rw,hard,intr,grpid,retrans=30,timeo=30,retry=10,dev=00000010,vers=3,proto=tcp)
  # Interesting note: We actually set retry=10000 but the rh7.3 systems use 10 instead
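The retry=10000 versus retry=10 discrepancy is the kind of thing that is easy to miss by eye. A small sketch of how requested and effective mount options could be compared follows; the helper names are hypothetical, and the "requested" string contains only the one option mentioned above, since the full amd map entry is not shown here.

```python
def parse_mount_options(opts):
    """Split a comma-separated mount option string into a dict.

    Flags without '=' (rw, hard, intr, ...) map to True.
    Surrounding parentheses, as printed above, are ignored.
    """
    result = {}
    for item in opts.strip().strip("()").split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result

def option_mismatches(requested, effective):
    """Return {option: (requested, effective)} where the two differ."""
    return {
        key: (value, effective.get(key))
        for key, value in requested.items()
        if effective.get(key) != value
    }

# Effective options as reported on the rh7.3 clients.
effective = parse_mount_options(
    "(rw,hard,intr,grpid,retrans=30,timeo=30,retry=10,"
    "dev=00000010,vers=3,proto=tcp)"
)
# Only the option known to have been requested differently.
requested = parse_mount_options("retry=10000")

print(option_mismatches(requested, effective))
```

Run against the real amd map options and the options the client reports for the mount, this would flag any other values the client silently changed.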
netapps are 960/980 class systems
  ontap=6.5.2R1P13
  They are typically very busy (100%) when the errors occur

Network switches show no errors
NIC cards on client and netapps show nothing
Nothing in the messages file in clients nor on the netapps.
Thanks in advance and I'll summarize if I get some good input.
- Jay
Daniel> Hrm, I am also seeing odd CPU issues with 7.0.0.1 on some new
Daniel> deployments. With only a few systems we are constantly over
Daniel> 75% CPU, and a lot of the time steadily well into the 90s.
Daniel> I haven't seen anything related on NOW yet.
They've released 7.0.1R1 and we've been encouraged to upgrade. We were running 7.0.0.1P6 before that; 7.0.1R1 fixes a bunch of quite serious bugs in the 7.0.0.1 series.
We're mostly an NFS shop, with the standard Sun automounter, but we haven't seen any other issues yet.
Crossed-fingers, since we're going live tonight with a new server. Whee!
John