Hey, thanks for the info.
Here is the network path:
Source Filer GigE ----- GigE on 4006 Cisco switch 100/FastE ---- 100/FastE 3660 Router 100/FastE ==== 20M VLAN ==== 100/FastE 3600 Router 100/FastE ---- 100/FastE 2924 Cisco Switch 100/FastE ---- 100/FastE Target Filer
The VLAN carrier, UECOMM, tells us there is no traffic shaping on the VLAN they have given us.
Our network team has no traffic shaping configured on either the 3660 or the 3600.
See anything else in there causing an issue?
Aaron
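As a sanity check, the end-to-end ceiling of a path like the one above is simply its slowest segment. A minimal sketch in Python (segment names and rates transcribed from the diagram, purely illustrative, not measured values):

    # Each segment of the path with its nominal rate in Mbps.
    # Names and rates are read off the diagram above, not measured.
    segments = [
        ("Source Filer -> 4006 switch (GigE)", 1000),
        ("4006 switch -> 3660 router (FastE)",  100),
        ("3660 -> 3600 (carrier 20M VLAN)",      20),
        ("3600 -> 2924 switch (FastE)",         100),
        ("2924 switch -> Target Filer (FastE)", 100),
    ]

    # The end-to-end ceiling is the minimum segment rate.
    name, rate = min(segments, key=lambda s: s[1])
    print(f"Bottleneck: {name} at {rate} Mbps")
    # -> Bottleneck: 3660 -> 3600 (carrier 20M VLAN) at 20 Mbps

So whatever else is going on, nothing downstream of the carrier VLAN can deliver more than 20 Mbps.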
-----Original Message----- From: Michael van Elst [mailto:mlelstv@serpens.de] Sent: Friday, March 07, 2003 10:44 AM To: Hill, Aaron Subject: Re: Snapmirror throughput question
On Thu, Mar 06, 2003 at 04:40:11PM -0600, Hill, Aaron wrote:
Hi people,
What are the maximum data throughputs that snapmirroring people are seeing across a WAN? Is anyone seeing 15-20+ Mbit bandwidth usage from a snapmirror data stream?
I had a setup with three F840 clusters mirroring to a single F840 cluster over a 100 Mbps tunnel. The initial full copy saturated the tunnel; the subsequent incremental updates ran at a much lower level (4-8 Mbps) because there weren't that many changes to the data. When I paused the mirror for a while and then resumed it, the now much larger update again saturated the tunnel.
The filers all had GigE, plus a dedicated GigE each for the tunnel, so that snapmirror traffic didn't compete with regular accesses. The filers were connected to Catalyst 6509s that had a 100TX link to the tunnel routers.
I am a bit confused about what your "dedicated 20Mbit Ethernet link" is, exactly. If it is some kind of allocated bandwidth, or subject to traffic shaping, it could easily explain the poor utilization.
Greetings,
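For a sense of scale on those numbers, here is a rough back-of-the-envelope in Python. The dataset sizes and the 0.9 efficiency factor are assumptions for illustration, not figures from Michael's setup:

    def transfer_hours(gigabytes: float, link_mbps: float, efficiency: float = 0.9) -> float:
        # Rough time to move `gigabytes` over a `link_mbps` link.
        # `efficiency` approximates protocol overhead; 0.9 is an assumption.
        bits = gigabytes * 8 * 1000**3            # decimal GB -> bits
        return bits / (link_mbps * 1e6 * efficiency) / 3600

    # Made-up dataset sizes, for illustration only:
    print(f"500 GB full copy over 100 Mbps: {transfer_hours(500, 100):.1f} h")  # ~12.3 h
    print(f"500 GB full copy over  20 Mbps: {transfer_hours(500, 20):.1f} h")   # ~61.7 h
    print(f"5 GB incremental over 20 Mbps:  {transfer_hours(5, 20):.1f} h")     # ~0.6 h

This shows why an initial transfer, or a mirror resumed after a long pause, saturates the link for hours, while small incremental updates barely register.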
On Thu, Mar 06, 2003 at 06:31:18PM -0600, Hill, Aaron wrote:
See anything else in there causing an issue?
I would start looking at flow-control settings for the GigE on the 4006 and also verify the MTU sizes on each interface to rule out packet fragmentation.
But I still suspect that the 20 Mbps bandwidth limit has a significant effect. To make this more visible, you could enable traffic shaping on your own routers with a slightly smaller limit.
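Another ceiling worth ruling out on a WAN path is the TCP window: a single TCP connection can never move more than one window of data per round trip, so window / RTT caps throughput regardless of the link rate. A minimal sketch, assuming a classic 64 KB window with no window scaling and illustrative RTTs (measure the real RTT with ping):

    def tcp_ceiling_mbps(window_bytes: int, rtt_ms: float) -> float:
        # Maximum single-connection TCP throughput: window / round-trip time.
        return (window_bytes * 8) / (rtt_ms / 1000) / 1e6

    window = 64 * 1024          # assumed 64 KB window (no window scaling)
    for rtt in (5, 25, 50):     # illustrative RTTs in ms
        print(f"RTT {rtt:>2} ms -> ceiling {tcp_ceiling_mbps(window, rtt):6.1f} Mbps")
    # RTT  5 ms -> ceiling  104.9 Mbps
    # RTT 25 ms -> ceiling   21.0 Mbps
    # RTT 50 ms -> ceiling   10.5 Mbps

If the WAN RTT is in the tens of milliseconds, a default window alone could hold a snapmirror stream near or below the 20 Mbps mark.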
Hill, Aaron wrote:
See anything else in there causing an issue?
If you want to test your network path in depth, I would suggest pchar. You can find it on the net for free:
http://www.employees.org/~bmah/Software/pchar/
http://www.employees.org/~bmah/Talks/pchar-NGI-99-Slides.pdf
The tool is handy, but it takes a while to complete. Since you pass through a WAN, start it in the morning and check it in the afternoon; also consider modifying default options like -R (repetitions per hop).
For info, a sample output (no more than 15 hops can be treated):

    # pchar -v filer85
    pchar to filer85 (192.168.10.234) using UDP/IPv4
    Using raw socket input
    Packet size increments from 32 to 1500 by 32
    46 test(s) per repetition
    32 repetition(s) per hop
     0: 172.31.240.253 (bsun18)
        Partial loss:     0 / 1472 (0%)
        Partial char:     rtt = 1.388522 ms, (b = 0.000133 ms/B), r2 = 0.866755
                          stddev rtt = 0.006350, stddev b = 0.000008
        Partial queueing: avg = 0.000134 ms (1010 bytes)
        Hop char:         rtt = 1.388522 ms, bw = 60166.975881 Kbps
        Hop queueing:     avg = 0.000134 ms (1010 bytes)
     1: 172.31.250.253 (172.31.250.253)
        Partial loss:     0 / 1472 (0%)
        Partial char:     rtt = 0.217381 ms, (b = 0.000067 ms/B), r2 = 0.902658
                          stddev rtt = 0.002676, stddev b = 0.000003
        Partial queueing: avg = 0.000060 ms (1010 bytes)
        Hop char:         rtt = --.--- ms, bw = --.--- Kbps
        Hop queueing:     avg = -0.000074 ms (0 bytes)
     2: 192.168.10.234 (filer85)
        Path length:      2 hops
        Path char:        rtt = 0.217381 ms r2 = 0.902658
        Path bottleneck:  60166.975881 Kbps
        Path pipe:        1634 bytes
        Path queueing:    average = 0.000060 ms (1010 bytes)
    Start time: Fri Nov 22 12:04:33 2002
    End time:   Fri Nov 22 12:17:20 2002
You would check "Path bottleneck" in the summary output. As you can see from the Start time and End time, pchar needs about 13 minutes to compute just 2 hops.
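That runtime follows directly from the probe counts in the output header. A quick check in Python, using only figures shown above (46 tests per repetition, 32 repetitions per hop, and pchar's default 0.25 s inter-test gap from the usage below; the probes' own round trips add a little on top):

    # Figures from the pchar run above.
    tests_per_rep = 46     # packet sizes from 32 to 1472 in steps of 32
    reps_per_hop  = 32
    gap_seconds   = 0.25   # pchar's default -g value
    hops_probed   = 2      # hops 0 and 1 in the trace

    probes  = tests_per_rep * reps_per_hop * hops_probed
    minutes = probes * gap_seconds / 60
    print(f"{probes} probes, ~{minutes:.0f} minutes of inter-test gap alone")
    # -> 2944 probes, ~12 minutes, which matches the ~13 minutes
    #    between Start time and End time.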
And the usage:

    genghis:~# pchar
    Usage: pchar [-a analysis] [-b burst] [-c] [-d debuglevel] [-g gap]
                 [-G gaptype] [-h] [-H hops] [-I increment] [-m mtu] [-n]
                 [-p protocol] [-P port] [-q] [-R reps] [-s hop] [-S]
                 [-t timeout] [-T tos] [-v] [-V] [-w file] (-r file | host)
        -a analysis   Set analysis type (default is lsq)
                      lsq      Least sum of squares linear fit
                      kendall  Linear fit using Kendall's test statistic
                      lms      Least median of squares linear fit
                      lmsint   Least median of squares linear fit (integer computations)
        -b burst      Burst size (default = 1)
        -c            Ignore route changes
        -d debuglevel Set debugging output level
        -g gap        Inter-test gap in seconds (default = 0.25)
        -G gaptype    Inter-test gap type (default is fixed)
                      fixed    Fixed gap
                      exp      Exponentially distributed random
        -H hops       Maximum number of hops (default = 30)
        -h            Print this help information
        -I increment  Packet size increment (default = 32)
        -l host       Set origin address of probes (defaults to hostname)
        -m mtu        Maximum packet size to check (default = 1500)
        -M mode       Operational mode (defaults to pchar)
                      pchar    Path characterization
                      trout    Tiny traceroute
        -n            Don't resolve addresses to hostnames
        -p protocol   Network protocol (default is ipv4udp)
                      ipv4udp  UDP over IPv4
                      ipv4raw  UDP over IPv4 (raw sockets)
                      ipv4icmp ICMP over IPv4 (raw sockets)
                      ipv6icmp ICMPv6 over IPv6 (raw sockets)
                      ipv6tcp  TCP over IPv6
                      ipv6udp  UDP over IPv6
        -P port       Starting port number (default = 32768)
        -q            Quiet output
        -r file       Read data from a file (- for stdin)
        -R reps       Repetitions per hop (default = 32)
        -s hop        Starting hop number (default = 1)
        -S            Do SNMP queries per-hop
        -t timeout    ICMP timeout in seconds (default = 3)
        -T tos        Set IP type-of-service field (default = 0)
        -v            Verbose output
        -V            Print version information
        -w file       Write data to a file (- for stdout)
And last: don't worry about some rtts or bws being displayed as "--.---":
If memory serves me right, Support FPS wrote:
Hi Stephane--
Sorry for the delay...I was out of town and unable to read my email for a few days.
Second: below is a pchar output. I would like to know:
a) Is it normal that we don't have some information in the 2nd hop (rtt displayed as --.---)?
b) Is it normal that in the summary the Path char rtt is 0.222667 (the last rtt) and not 1.330503 + 0.222667 (the sum of the rtts)? Put another way: isn't it strange to see the overall rtt recorded as the smallest rtt obtained?
Comments in-line:
Here is the command and output:

    # ./pchar -v filer85
    pchar to filer85 (192.168.10.234) using UDP/IPv4
    Using raw socket input
    Packet size increments from 32 to 1500 by 32
    46 test(s) per repetition
    32 repetition(s) per hop
     0: 172.31.240.253 (bsun18)
        Partial loss:     0 / 1472 (0%)
        Partial char:     rtt = 1.330503 ms, (b = 0.000154 ms/B), r2 = 0.909658
                          stddev rtt = 0.005918, stddev b = 0.000007
        Partial queueing: avg = 0.000119 ms (773 bytes)
        Hop char:         rtt = 1.330503 ms, bw = 51893.838057 Kbps
        Hop queueing:     avg = 0.000119 ms (773 bytes)
     1: 172.31.250.253 (172.31.250.253)
        Partial loss:     0 / 1472 (0%)
        Partial char:     rtt = 0.222667 ms, (b = 0.000067 ms/B), r2 = 0.904762
                          stddev rtt = 0.002636, stddev b = 0.000003
        Partial queueing: avg = 0.000102 ms (773 bytes)
        Hop char:         rtt = --.--- ms, bw = --.--- Kbps
        Hop queueing:     avg = -0.000017 ms (0 bytes)
     2: 192.168.10.234 (filer85)
A funny thing happened here, in that the RTT along two hops was computed to be smaller than the RTT for just the first hop by itself. (In other words, 0.222667 < 1.330503.) A common cause for this is that the router at hop #1 (172.31.250.253) takes longer to generate an ICMP time-exceeded message than it does to forward a packet.
In the normal case, we compute the RTT for the second hop by taking the RTT for the first two hops and subtracting the RTT for the first hop alone. If we tried that here, we'd get a negative number (0.222667 - 1.330503 = -1.107836). Clearly this makes no sense. At one point, pchar used to print the negative RTT, but I got too many emails asking what a negative RTT meant. So I changed it to just print the "--.---" string instead.
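In code form, the differencing just described looks roughly like this (a sketch of the idea only, not pchar's actual source):

    def hop_rtt(partials_ms, hop):
        # Per-hop RTT as the difference of successive cumulative
        # (partial) RTT estimates; negative results print as '--.---'.
        if hop == 0:
            return f"{partials_ms[0]:.6f} ms"
        delta = partials_ms[hop] - partials_ms[hop - 1]
        return f"{delta:.6f} ms" if delta >= 0 else "--.---"

    # Partial RTTs from the trace above: 1.330503 ms (hop 0), 0.222667 ms (hops 0-1).
    partials = [1.330503, 0.222667]
    print("hop 0:", hop_rtt(partials, 0))   # 1.330503 ms
    print("hop 1:", hop_rtt(partials, 1))   # --.--- (0.222667 - 1.330503 < 0)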
        Path length:      2 hops
        Path char:        rtt = 0.222667 ms r2 = 0.904762
        Path bottleneck:  51893.838057 Kbps
        Path pipe:        1444 bytes
        Path queueing:    average = 0.000102 ms (773 bytes)
    Start time: Tue Nov 19 16:53:51 2002
    End time:   Tue Nov 19 17:06:36 2002
The RTT for the entire path is equal to the partial RTT for two hops, because this is a two-hop path. The fact that this is smaller than the RTT for the first hop doesn't really matter (again, remember that many routers can forward packets faster than they can generate ICMP messages).
In fact, the path RTT is always going to be equal to the partial path RTT from the source to the last hop (except for some pathological cases where we cannot compute this value).
Hope this helps!
Bruce.