Has anyone else experienced performance problems using UDP NFSv3 between Solaris 2.7 and an F760 (running 5.3.5R2)?
Using v3 I get the following errors intermittently:
Jul 27 20:41:07 bullwinkle unix: NFS server 10.20.10.10 not responding still trying
Jul 27 20:41:07 bullwinkle unix: NFS server 10.20.10.10 ok
Jul 27 20:42:06 bullwinkle unix: NFS server 10.20.10.10 not responding still trying
Jul 27 20:42:06 bullwinkle unix: NFS server 10.20.10.10 ok
Jul 27 20:43:30 bullwinkle unix: NFS server 10.20.10.10 not responding still trying
Jul 27 20:43:30 bullwinkle unix: NFS server 10.20.10.10 ok
Jul 27 20:45:40 bullwinkle unix: NFS server 10.20.10.10 not responding still trying
Jul 27 20:45:40 bullwinkle unix: NFS server 10.20.10.10 ok
When I switched the mount to v2 (UDP), these errors went away and performance seemed to increase. NetApp documentation recommends v3, but it doesn't seem to work very well in my environment.
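(As an aside: a quick way to confirm which version, protocol, and transfer sizes the Solaris client actually negotiated is nfsstat -m. The mount point below is a placeholder, and the exact output format varies by Solaris release.)

```shell
# Show negotiated NFS options for a mount point (Solaris client).
# /oracle is a hypothetical mount point; substitute your own.
nfsstat -m /oracle
# Look for a Flags: line along the lines of
#   vers=3,proto=udp,...,rsize=32768,wsize=32768
```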
Since this filer is used for a DB, this is causing huge problems. Should I just leave it at v2, or is it worth trying to determine why v3 is performing poorly?
-Brian
foo wrote:
[snip]
We have had similar problems to those Jeff mentioned: basically, 32K blocks over UDP are in general not a good solution. We forced 8K blocks and performance improved considerably. In a way, the poor performance was caused by servers that were too powerful: they could send more data than the clients and routers could handle.
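The arithmetic behind this makes sense: a 32K NFS-over-UDP request is a single UDP datagram that gets sliced into many IP fragments at a 1500-byte Ethernet MTU, and losing any one fragment discards (and forces a retransmit of) the whole datagram. A rough sketch, ignoring UDP/RPC header overhead:

```shell
#!/bin/sh
# Fragments per NFS read/write at a 1500-byte MTU (IPv4 header = 20 bytes).
# Rough count only: UDP/RPC header overhead is ignored.
mtu=1500
payload=$((mtu - 20))   # IP payload carried by each fragment
for io in 32768 8192; do
  frags=$(( (io + payload - 1) / payload ))   # ceiling division
  echo "${io}-byte transfer -> ${frags} IP fragments (lose one, resend all)"
done
```

With roughly 23 fragments per 32K request versus 6 per 8K request, even a small loss rate hurts the 32K mounts far more. The 8K limit is typically forced with client mount options such as rsize=8192,wsize=8192 (exact syntax depends on the client OS).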
/Michael
The network at the moment is like this:
[760]
  |
Gig-E
  |
[Foundry FastIron]
  |
100Mb/s Ethernet
  |
[Foundry FastIron]
  |
Gig-E
  |
[Sun e4500]
The reason for the 100Mb in between is that, for the moment, the 4500 and the filer are in different parts of the building (power issues prevent having them in the same place), and the two locations are connected via 100Mb.
That said, I spent a lot of time looking at the switches and network to see if they were causing the problems. The fast Ethernet interface has never gone above around 30Mb or so, and there are no errors of any kind on any of the interfaces (no collisions, no align/fcs/giant/short errors, nothing *at all*). Everything is full-duplex, flow control is off on everything (any thoughts about that?), and all of the cabling was tested before installation. In short, the network looks right.
Eventually the filer and DB will be connected directly to each other, but I have trouble believing that this will solve the current problems considering there is currently no congestion in the network.
The DB has all of the recent patches, but no optimization has been done.
What am I missing?
-Brian
foo wrote:
[snip]
The network at the moment is like this:
[760]                 <- flow control needs to be turned ON
  |
Gig-E
  |
[Foundry FastIron]    <- flow control needs to be turned ON on this port
  |
100Mb/s Ethernet
  |
[Foundry FastIron]    <- flow control needs to be turned ON
  |
Gig-E
  |
[Sun e4500]           <- flow control needs to be turned ON
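Assuming a Data ONTAP release and FastIron firmware that support these knobs (the command names below are typical for these platforms but may differ on this hardware and software vintage), enabling flow control would look roughly like:

```shell
# On the filer console (Data ONTAP; the interface name e0 is an assumption):
ifconfig e0 flowcontrol full

# On each FastIron, per interface (Foundry CLI; syntax may vary by firmware):
#   configure terminal
#   interface ethernet 1/1
#   flow-control
```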
My guess is that the bottleneck is the 100BT link sandwiched between Gig-E links that are 10x faster on both sides: the FastIron can't forward packets out the 100Mb/s port as fast as it receives them on the 1000Mb/s port from the filer/e4500.

Usually it's done the other way around (100BT -- 1000SX pipe -- 100BT); the topology above is not 'appropriate' for a server network!

I'd guess there are lots of drops/retransmits at both ends ('netstat -s').
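Concretely, the counters worth watching on the Solaris client side (standard Solaris tools; exact counter names vary by release, so treat these as a sketch):

```shell
# UDP-level drops on the Solaris client:
netstat -s | grep -i udp     # watch udpInOverflows in particular

# RPC-level retransmits/timeouts for NFS on the client:
nfsstat -rc                  # watch retrans, badxid, timeout
```

A climbing udpInOverflows count alongside the "not responding" messages would point at the client or the 100Mb hop dropping reply traffic, which fits the Gig-E-into-100BT funnel described above.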
Note: this is my first guess based on the info in this thread.