Hi. I'm sending this to both am-utils and toasters since I'm not entirely sure where the problem is. These systems (redhat73-1, redhat73-2, etc.) are in a cluster. The errors occur at around the same time on all the clients (which are frequently running different jobs), but are sometimes spread out across 10-30 minutes.
My first guess is simply that the netapps are too busy to answer the clients.
*If* that's the case, then what can I change on the netapp or the client-end to mitigate this problem? What other options are there?
From the /var/log/amd on clients:
redhat73-1:
May 25 13:48:01 redhat73-1 amd[965]/error: get_nfs_version: failed to contact portmapper on host "netapp1": RPC: Timed out

redhat73-2:
May 25 13:48:07 redhat73-2 amd[965]/error: get_nfs_version: failed to contact portmapper on host "netapp2": RPC: Timed out
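To check how tightly these errors cluster across clients, something like the rough Python sketch below could group them into time windows. It assumes the log format is exactly as in the lines quoted above. The function names, the 60-second window and the placeholder year (the log lines carry no year) are my own assumptions and are not part of am-utils.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the amd error lines quoted above (assumed format). A leading
# "hostname: " label, if present, is ignored because we use search().
LINE_RE = re.compile(
    r'(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) '
    r'(?P<client>\S+) amd\[\d+\]/error: get_nfs_version: '
    r'failed to contact portmapper on host "(?P<server>[^"]+)": RPC: Timed out'
)

def parse_line(line, year=2000):
    """Return (timestamp, client, server), or None if the line does not match.

    'year' is a placeholder: syslog-style timestamps do not include one.
    """
    m = LINE_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    return ts, m.group("client"), m.group("server")

def bucket_errors(lines, window_seconds=60, year=2000):
    """Group (client, server) pairs by the start of their time window."""
    buckets = defaultdict(set)
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue  # skip anything that is not a portmapper timeout
        ts, client, server = parsed
        # Round the timestamp down to the start of its window.
        seconds_of_day = ts.hour * 3600 + ts.minute * 60 + ts.second
        start = ts - timedelta(seconds=seconds_of_day % window_seconds)
        buckets[start].add((client, server))
    return dict(sorted(buckets.items()))

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): cat amd-logs-from-all-clients | python bucket.py
    for start, pairs in bucket_errors(sys.stdin).items():
        print(start.strftime("%b %d %H:%M:%S"), len(pairs), sorted(pairs))
```

Feeding it the concatenated /var/log/amd files from all the clients would show whether each burst hits one netapp or all of them at once.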
Clients are dual Xeons running:
  redhat 7.3
  am-utils-6.0.7-4
  kernel 2.4.20-28.7smp
  mount options=(rw,hard,intr,grpid,retrans=30,timeo=30,retry=10,dev=00000010,vers=3,proto=tcp)
  # Interesting note: We actually set retry=10000 but the rh7.3 systems use 10 instead
Netapps are 960/980 class systems:
  ontap=6.5.2R1P13
  They are typically very busy (100%) when the errors occur
Network switches show no errors.
NICs on the clients and netapps show nothing.
Nothing in the messages file on the clients or on the netapps.
Thanks in advance and I'll summarize if I get some good input.
- Jay