 
            Has anyone who uses BudTool for backups gotten any details about the option to upgrade to NetWorker? I got some information from PDC, but other than the usual marketing hype about how wonderful everything is, the only substantial detail about upgrading and costs is:
"Companies that buy BudTool can upgrade to the NDMP Connection and a similarly configured NetWorker Server within one year after general availability, at no additional cost."
OK, what about existing customers? I looked on their web site and found about the same marketing fluff with no substance. Before many companies consider migrating, the very first question will always be "HOW MUCH??". That's my question too; we all try to avoid the fight of the budgets. ;p
----------- Jay Orr Systems Administrator Fujitsu Nexion Inc. St. Louis, MO
 
            We have been having some performance issues lately, and I was looking through some documentation that I got when I went to the NetApp 202 class. It says that the UDP window should be 32k except if you have an FDDI card or rampant packet loss. You set this with:
options nfs.udp.xfersize <value>
Has anyone had any experience setting this? Can we do it on a production filer and not crash? Can we set it back and still not crash if it does something horrid?
Paul Taylor Sr. Systems Engineer Connectria 215-841-5540
 
            I'm not an expert on this, but quick inspection shows that our production engineering filers are using 32kB UDP transfers.
John
 
            On Mon, 10 Jan 2000, Paul Taylor wrote:
options nfs.udp.xfersize <value>
Has anyone had any experience setting this? Can we do it on a production filer and not crash? Can we set it back and still not crash if it does something horrid?
I've never had any problems setting this value on-the-fly. On Solaris, use "nfsstat -m" to verify that the client is in fact using 32K block sizes (you will have to remount).
 
            Paul Taylor wrote:
We have been having some performance issues lately, and I was looking through some documentation that I got when I went to the NetApp 202 class. It says that the UDP window should be 32k except if you have an FDDI card or rampant packet loss. You set this with:
options nfs.udp.xfersize <value>
Has anyone had any experience setting this? Can we do it on a production filer and not crash? Can we set it back and still not crash if it does something horrid?
I had no trouble enabling it on a production filer (F760 cluster, in this case). It helped only NFS v3. In fact, our news farm runs NFS v2 on UDP because of horrible performance running any form of NFS v3 between Solaris 2.6 and the filers. I have one machine tightened to 512-byte transfers because of poor performance at bigger transfer sizes.
I'd welcome discussion on this, if anyone wants to take it to a new thread.
 
            tkaczma@gryf.net wrote:
On Tue, 11 Jan 2000, Michael S. Keller wrote:
I have one machine tightened to 512-byte transfers because of poor performance at bigger transfer sizes.
I think you really have to look into your network.
There's not much to check. It goes in one switch port and out another on the same switch. The interfaces show no errors. I do have the filer trunked (EtherChannel) and my news clients have hand-tuned MAC addresses to reduce contention, since the switch does "dumb" switching based on MAC addresses instead of loads.
 
            Can you elaborate on your MAC address trick?
Eyal.
 
            On Wed, 12 Jan 2000, Eyal Traitel wrote:
Can you elaborate on your MAC address trick?
I think what he means is that he has all the addresses nicely distributed over the number of trunking interfaces he has on the filer. I assume that by "dumb" he means MAC hashing, a method of distributing load among Ethernet interfaces based on the Ethernet address. Usually the last x bits are used, where 2^x ~= the number of interfaces in the trunk. I haven't found (not that I looked very hard) documentation on what happens in this scenario if one of the trunk links dies. If someone can point me to that excerpt of the standard/documentation, I'd appreciate it.
Please read on as I've missed Michael's response quoted below.
"Michael S. Keller" wrote:
There's not much to check. It goes in one switch port and out another on the same switch. The interfaces show no errors.
The interfaces on both sides show no errors? How about collisions? If one side shows collisions and the other doesn't then you have yourself a duplex mismatch. I'd check for the same on the client side.
I haven't found anyone yet that has convinced me that duplex negotiation works. I've seen it NOT work too many times and when it doesn't the performance turns worse and worse as congestion increases. I'd hard set the switch and the boxes (including the netapp) to full-duplex if in fact that is what your boxes support.
Make sure it isn't your clients that are having problems with large packets as well as the switch. There are certain options in ATM switches, if I'm recalling correctly, that will effectively reduce your maximum ethernet frame size. That could also put a damper on large packets as they use several maximum sized frames.
I'm not saying that it isn't a problem with NACs, but my experience tells me to look into the networking before jumping to any conclusions. My networking group swore that our performance problems were due to NACs. In fact, they were so entrenched in their ideas that they didn't believe me that NACs purposely "ignored" ICMP packets under some conditions. It turned out that we had a case of duplex mismatch between the NACs and the switches. The NAC was happily chugging along at full-duplex, which it autonegotiated, and the switch decided that half-duplex was good enough for some of the interfaces. Also, check the Ethernet card on the NAC. I have already replaced two of them; granted, we have about 20 NACs, but I still didn't expect it. See what kind of performance you're getting with just one interface enabled at a time, and then rotate through the interfaces.
Tom
 
            tkaczma@gryf.net wrote:
On Wed, 12 Jan 2000, Eyal Traitel wrote:
Can you elaborate on your MAC address trick?
I think what he means is that he has all the addresses nicely distributed over the number of trunking interfaces he has on the filer. I assume that by "dumb" he means MAC hashing, a method of distributing load among Ethernet interfaces based on the Ethernet address. Usually the last x bits are used, where 2^x ~= the number of interfaces in the trunk. I haven't found (not that I looked very hard) documentation on what happens in this scenario if one of the trunk links dies. If someone can point me to that excerpt of the standard/documentation, I'd appreciate it.
Correct. I have two interfaces per trunk. I have two trunks per filer quad card. I have one quad card per filer. The Cisco 5505 (Ethernet in this case, not ATM) supports a maximum of four interfaces per trunk. If an interface in a two-interface trunk dies, all traffic re-routes to the remaining interface. The algorithm in the 5505 XORs the last two bits of the sending and receiving MAC addresses to determine the port number. With a large number of clients, this tends to even out. With only four clients, I had to make a truth table of desired results, then change client MAC addresses to achieve the result.
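[Editor's sketch] The port selection described above can be tried out with a few lines of Python. This is only an illustration of the idea as stated in the post (XOR of the last two bits of the sending and receiving MAC addresses), not the switch's actual firmware logic. The MAC addresses are made up, and folding the two-bit result onto a two-link trunk with a modulo is an assumption, since the post does not say how the 5505 maps the result onto fewer than four links.

```python
from collections import Counter


def low_bits(mac: str, bits: int = 2) -> int:
    """Return the lowest `bits` bits of a MAC written as aa:bb:cc:dd:ee:ff."""
    last_octet = int(mac.split(":")[-1], 16)
    return last_octet & ((1 << bits) - 1)


def trunk_port(src_mac: str, dst_mac: str, links: int = 2) -> int:
    """Pick a trunk link by XORing the low two bits of both MAC addresses.

    The modulo that folds the result onto `links` ports is an assumption
    made for this sketch.
    """
    return (low_bits(src_mac) ^ low_bits(dst_mac)) % links


def spread(filer_mac: str, client_macs: list[str], links: int = 2) -> Counter:
    """Count how many clients land on each trunk link."""
    return Counter(trunk_port(mac, filer_mac, links) for mac in client_macs)


# Hypothetical addresses: one filer trunk and four hand-tuned clients.
FILER = "00:a0:98:00:00:10"
CLIENTS = [
    "08:00:20:00:00:00",
    "08:00:20:00:00:01",
    "08:00:20:00:00:02",
    "08:00:20:00:00:03",
]

if __name__ == "__main__":
    for mac in CLIENTS:
        print(mac, "-> link", trunk_port(mac, FILER))
    print(dict(spread(FILER, CLIENTS)))
```

With only a handful of clients, a table like this shows quickly whether the chosen addresses split evenly across the links or pile onto one.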
Please read on as I've missed Michael's response quoted below.
"Michael S. Keller" wrote:
There's not much to check. It goes in one switch port and out another on the same switch. The interfaces show no errors.
I haven't found anyone yet that has convinced me that duplex negotiation works. I've seen it NOT work too many times and when it doesn't the performance turns worse and worse as congestion increases. I'd hard set the switch and the boxes (including the netapp) to full-duplex if in fact that is what your boxes support.
Duplex is forced full at the switch and at the clients. I fixed that months ago. netstat shows no collisions at the clients or at the filers.
Make sure it isn't your clients that are having problems with large packets as well as the switch. There are certain options in ATM switches, if I'm recalling correctly, that will effectively reduce your maximum ethernet frame size. That could also put a damper on large packets as they use several maximum sized frames.
I'm not saying that it isn't a problem with NACs, but my experience tells me to look into the networking before jumping to any conclusions. My networking group swore that our performance problems were due to NACs. In fact, they were so entrenched in their ideas that they didn't believe me that NACs purposely "ignored" ICMP packets under some conditions. It turned out that we had a case of duplex mismatch between the NACs and the switches. The NAC was happily chugging along at full-duplex, which it autonegotiated, and the switch decided that half-duplex was good enough for some of the interfaces. Also, check the Ethernet card on the NAC. I have already replaced two of them; granted, we have about 20 NACs, but I still didn't expect it. See what kind of performance you're getting with just one interface enabled at a time, and then rotate through the interfaces.
I may try that, but not immediately.
 
            On Wed, 12 Jan 2000, Michael S. Keller wrote:
I may try that, but not immediately.
According to what you said, everything looks right. You've done a good job, and I am as dumbfounded as you are. BTW, do you use locking via NFS? Solaris has a pretty nasty NLM bug which surfaces when used against NetApps. Look into patch 106639 for this and other NFS goodies.
Tom
 
            tkaczma@gryf.net wrote:
On Wed, 12 Jan 2000, Michael S. Keller wrote:
I may try that, but not immediately.
According to what you said, everything looks right. You've done a good job, and I am as dumbfounded as you are. BTW, do you use locking via NFS? Solaris has a pretty nasty NLM bug which surfaces when used against NetApps. Look into patch 106639 for this and other NFS goodies.
Tom
I installed 106639-03. I also installed 106641-01 and 106882-01, mentioned in the notes for 106639-03. The host already had the other patches mentioned in 106639-03's notes. I rebooted the host after patch installation.
The output of `iostat -cxn 10|awk '$8 > 100.0 {print}'` follows. I use it to assess average service time of NFS mounts. v2/UDP/512-byte transfers perform generally better than settings with bigger transfer sizes or v3. 10.16.11[89].10 is the heavily-loaded filer. 10.16.11[89].15 is the lightly-loaded filer. This host uses it without contention from other hosts, yet it has more slow transfer times.
v2/UDP/512-byte transfers:
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.6 0.0 0.8 0.0 0.0 0.3 0.0 163.3 0 26 10.16.118.15:/vol/vol0/News/4/4
1.6 0.0 0.8 0.0 0.0 0.3 0.0 163.6 0 26 10.16.119.15:/vol/vol0/News/4/1
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
3.7 0.0 1.8 0.0 0.0 0.7 0.0 188.8 0 70 10.16.118.15:/vol/vol0/News/4/6
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.3 0.0 0.2 0.0 0.0 0.2 0.0 673.3 0 20 10.16.119.15:/vol/vol0/News/4/3
0.3 0.0 0.2 0.0 0.0 0.4 0.0 1352.2 0 41 10.16.118.15:/vol/vol0/News/4/6
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.9 0.0 0.5 0.0 0.0 0.2 0.0 202.6 0 18 10.16.118.15:/vol/vol0/News/4/4
1.3 0.0 0.7 0.0 0.0 0.8 0.0 626.3 0 81 10.16.118.15:/vol/vol0/News/4/6
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.7 0.0 0.8 0.0 0.0 0.5 0.0 313.7 0 53 10.16.118.15:/vol/vol0/News/4/4
3.2 0.0 1.6 0.0 0.0 0.5 0.0 163.5 0 52 10.16.119.15:/vol/vol0/News/4/5
3.9 0.0 1.9 0.0 0.0 0.5 0.0 116.3 0 45 10.16.119.15:/vol/vol0/News/4/7
4.8 0.0 2.4 0.0 0.0 0.5 0.0 108.9 0 52 10.16.118.15:/vol/vol0/News/4/8
I prepared more, but my mail client died, losing my composition window. Results for v2/UDP/default sizes and v2/UDP/default sizes are quite similar to the results below for v3/TCP/default sizes.
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.1 0.0 3.2 0.0 0.0 0.1 0.0 1454.4 0 15 10.16.119.15:/vol/vol0/News/4/1
0.1 0.0 3.2 0.0 0.0 0.2 0.0 1543.0 0 15 10.16.119.15:/vol/vol0/News/4/5
0.1 0.0 3.2 0.0 0.0 0.1 0.0 1455.6 0 15 10.16.119.15:/vol/vol0/News/4/7
0.2 0.0 6.4 0.0 0.0 0.3 0.0 1346.3 0 27 10.16.119.15:/vol/vol0/News/4/9
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.2 0.0 6.4 0.0 0.0 0.2 0.0 937.6 0 17 10.16.118.15:/vol/vol0/News/4/4
0.1 0.0 3.2 0.0 0.0 0.1 0.1 999.1 0 5 10.16.118.15:/vol/vol0/News/4/8
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.6 0.0 4.5 0.2 0.2 322.7 396.8 19 20 md4
0.0 0.6 0.0 4.5 0.0 0.2 0.0 396.7 0 20 md6
0.1 0.1 3.2 0.8 0.0 0.1 1.6 546.1 0 11 10.16.119.15:/vol/vol0/News/4/9
0.4 0.1 12.8 1.6 0.0 0.2 0.6 317.1 0 8 10.16.118.15:/vol/vol0/News/4/8
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.1 0.0 3.2 0.0 0.0 0.2 0.0 1548.6 0 15 10.16.119.15:/vol/vol0/News/4/3
0.3 0.0 9.6 0.0 0.0 0.1 0.0 447.0 0 13 10.16.119.15:/vol/vol0/News/4/5
0.1 0.0 3.2 0.0 0.0 0.3 0.0 3035.1 0 30 10.16.118.15:/vol/vol0/News/4/6
0.1 0.0 3.2 0.0 0.0 0.3 0.0 2703.4 0 27 10.16.118.15:/vol/vol0/News/4/8
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.1 0.0 3.2 0.0 0.0 0.3 0.0 3498.9 0 35 10.16.118.15:/vol/vol0/News/4/2
0.2 0.1 6.4 0.8 0.0 0.3 0.1 1051.8 0 32 10.16.118.15:/vol/vol0/News/4/8
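[Editor's sketch] The awk filter used in this post keeps rows whose eighth field (asvc_t, the average service time in milliseconds) exceeds 100. The same filtering, plus a per-device average, could be sketched in Python as below. The function names are invented for this example, the sample rows are copied from the v2/UDP/512-byte output in this message, and the parser assumes the 11-column `iostat -xn` layout shown here. Unlike the awk one-liner as posted, this version drops the repeated header lines rather than passing them through.

```python
from collections import defaultdict

ASVC_T_FIELD = 7   # zero-based index of asvc_t, i.e. awk's $8
DEVICE_FIELD = 10  # last column: device or NFS mount


def slow_rows(lines, threshold_ms=100.0):
    """Yield (device, asvc_t) for rows whose asvc_t exceeds the threshold."""
    for line in lines:
        fields = line.split()
        if len(fields) != 11:
            continue  # not an iostat -xn data row
        try:
            asvc_t = float(fields[ASVC_T_FIELD])
        except ValueError:
            continue  # header line: the column holds the text "asvc_t"
        if asvc_t > threshold_ms:
            yield fields[DEVICE_FIELD], asvc_t


def mean_asvc_by_device(lines, threshold_ms=100.0):
    """Average the slow samples per device to compare mounts at a glance."""
    samples = defaultdict(list)
    for device, asvc_t in slow_rows(lines, threshold_ms):
        samples[device].append(asvc_t)
    return {dev: sum(vals) / len(vals) for dev, vals in samples.items()}


SAMPLE = """\
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.6 0.0 0.8 0.0 0.0 0.3 0.0 163.3 0 26 10.16.118.15:/vol/vol0/News/4/4
1.6 0.0 0.8 0.0 0.0 0.3 0.0 163.6 0 26 10.16.119.15:/vol/vol0/News/4/1
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.9 0.0 0.5 0.0 0.0 0.2 0.0 202.6 0 18 10.16.118.15:/vol/vol0/News/4/4
""".splitlines()

if __name__ == "__main__":
    for device, mean in sorted(mean_asvc_by_device(SAMPLE).items()):
        print(f"{mean:8.1f} ms  {device}")
```

Feeding it the saved output of each test run (v2/UDP/512, v3/TCP, and so on) would give one comparable number per mount instead of a wall of rows.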
 
            I appear to have a problem with the lightly-loaded filer's quad 10/100 card or the cables connecting it to the switch. I should have a better idea about the cables by Monday.
I moved the lightly-loaded filer's traffic to its on-board 10/100 port (F760). iostat looked better. After installing the Solaris 2.6 patches mentioned yesterday that address poor NFS client performance with a NetApp filer, I changed the NFS mount options to "vers=3,proto=tcp".
See the output of iostat -cxn 10|awk '$8 > 100.0 {print}':
# iostat -cxn 10|awk '$8 > 100.0 {print}'
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.1 3.9 0.8 57.3 0.1 0.9 26.8 213.7 11 49 md4
0.1 3.9 0.8 57.3 0.0 0.7 0.0 169.9 0 48 md6
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.7 0.0 5.6 0.1 0.3 123.0 468.0 9 21 md4
0.0 0.7 0.0 5.6 0.0 0.3 0.0 468.0 0 21 md6
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 5.6 0.0 84.8 0.1 0.7 16.3 131.4 9 24 md4
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.1 0.6 0.8 4.5 0.0 0.1 48.5 154.3 3 6 md4
0.0 0.6 0.0 4.5 0.0 0.1 0.0 171.5 0 6 md6
0.0 0.4 0.0 6.4 0.0 0.1 0.7 165.0 0 7 10.16.119.10:/vol/vol0/logs
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 5.6 0.0 75.2 0.0 0.6 6.2 115.7 3 18 md4
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
^C#
Much better.
If I request an RMA on an older quad 10/100, will I receive a newer Intel-based board?
 
            This host continued to run fine throughout the weekend. I applied the same patches and NFS mount option changes to one of the news feeders (running bCandid's Cyclone product). It got stuck here:
Jan 14 18:06:13 news-feeder2 unix: NFS server 10.16.118.10 not responding still trying
on Friday. I rebooted it in the last hour. Both Solaris boxes used the same NFS server. I may have to drop TCP. I hope I can keep v3 and its larger transfer sizes.
 
            I had to move the news feeder back to NFS v2/UDP/8K. The reader continues to run fine with v3/TCP/32K.






