Hill, Aaron wrote:
Hey, thanks for the info.

Here is the network path:

Source Filer GigE ----- GigE on 4006 Cisco switch 100/FastE ---- 100/FastE
3660 Router 100/FastE ==== 20M VLAN ==== 100/FastE 3600 Router 100/FastE
---- 100/FastE 2924 Cisco Switch 100/FastE ---- 100/FastE Target Filer

The VLAN carrier, UECOMM, tells us there is no traffic shaping on the VLAN
given to us.

Our network team has no traffic shaping configured on the 3660 or the 3600.

See anything else in there causing an issue?

Aaron

-----Original Message-----
From: Michael van Elst [mailto:mlelstv@serpens.de] 
Sent: Friday, March 07, 2003 10:44 AM
To: Hill, Aaron
Subject: Re: Snapmirror throughput question

On Thu, Mar 06, 2003 at 04:40:11PM -0600, Hill, Aaron wrote:
> Hi people,
>  What are the maximum data throughputs that Snapmirroring people are seeing
> across a WAN? Is anyone seeing 15-20+ Mbit bandwidth usage from a snapmirror
> data stream?

I had a setup with three F840 clusters mirroring to a single F840
cluster using a 100Mbps tunnel. The initial full copy saturated the
tunnel; the subsequent incremental updates ran at a much lower level
(4-8Mbps) because there weren't that many changes to the data. When I
paused the mirror for a while and then resumed it, the much bigger
update again saturated the tunnel.

The filers all had GigE and for the tunnel a dedicated GigE each,
so that snapmirror traffic didn't compete with regular accesses.
The filers were connected to Catalyst6509s that had a 100TX link
to the tunnel routers.

I am a bit confused what the "dedicated 20Mbit Ethernet link" of
yours is exactly. If that's some kind of allocated bandwidth
or subject to traffic shaping, it could easily explain the
bad utilization.


Greetings,
If you want to test your network path in depth, I would suggest *pchar*.
You can find it on the net (google) for free:
    http://www.employees.org/~bmah/Software/pchar/
    http://www.employees.org/~bmah/Talks/pchar-NGI-99-Slides.pdf
This tool is handy, but it takes a while to complete (as you pass through a WAN, start it in the morning and check it in the afternoon; also adjust default options like -R, repetitions per hop).

For info, here is a sample output (no more than 15 hops can be handled):
<<
# pchar -v filer85
pchar to filer85 (192.168.10.234) using UDP/IPv4
Using raw socket input
Packet size increments from 32 to 1500 by 32
46 test(s) per repetition
32 repetition(s) per hop
 0: 172.31.240.253 (bsun18)
    Partial loss:      0 / 1472 (0%)
    Partial char:      rtt = 1.388522 ms, (b = 0.000133 ms/B), r2 = 0.866755
                       stddev rtt = 0.006350, stddev b = 0.000008
    Partial queueing:  avg = 0.000134 ms (1010 bytes)
    Hop char:          rtt = 1.388522 ms, bw = 60166.975881 Kbps
    Hop queueing:      avg = 0.000134 ms (1010 bytes)
 1: 172.31.250.253 (172.31.250.253)
    Partial loss:      0 / 1472 (0%)
    Partial char:      rtt = 0.217381 ms, (b = 0.000067 ms/B), r2 = 0.902658
                       stddev rtt = 0.002676, stddev b = 0.000003
    Partial queueing:  avg = 0.000060 ms (1010 bytes)
    Hop char:          rtt = --.--- ms, bw = --.--- Kbps
    Hop queueing:      avg = -0.000074 ms (0 bytes)
 2: 192.168.10.234 (filer85)
    Path length:       2 hops
    Path char:         rtt = 0.217381 ms r2 = 0.902658
    Path bottleneck:   60166.975881 Kbps
    Path pipe:         1634 bytes
    Path queueing:     average = 0.000060 ms (1010 bytes)
    Start time:        Fri Nov 22 12:04:33 2002
    End time:          Fri Nov 22 12:17:20 2002
#
>>
You would check the Path bottleneck line in the summary output.
As you can see from the Start time and End time, pchar needed 13 minutes to characterize just 2 hops.
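As a rough illustration of where the "Hop char" bandwidth figures come from (this is a sketch of the idea, not pchar's actual source): pchar fits per-hop RTT as a linear function of packet size, and the fitted slope b (in ms per byte) implies a hop bandwidth of 8 / b bits per millisecond, which is the same as kbit/s.

```python
# Sketch, not pchar's code: the least-squares fit gives a slope b in
# milliseconds per byte; each byte costs b ms of serialization time,
# so the link moves 8 / b bits per millisecond, i.e. kbit/s.

def hop_bandwidth_kbps(b_ms_per_byte):
    """Bandwidth implied by a fitted slope of b milliseconds per byte."""
    return 8.0 / b_ms_per_byte  # bits/ms is numerically kbit/s

# Slope from hop 0 of the sample output above:
print(round(hop_bandwidth_kbps(0.000133)))  # -> 60150
```

The small gap between this and the 60166.975881 Kbps pchar reports is just the truncated slope printed in the output.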

And the usage:
<<
genghis:~# pchar
Usage: pchar [-a analysis] [-b burst] [-c] [-d debuglevel] [-g gap] [-G gaptype] [-h] [-H hops] [-I increment] [-m mtu] [-n] [-p protocol] [-P port] [-q] [-R reps] [-s hop] [-S] [-t timeout] [-T tos] [-v] [-V] [-w file] -r file | host]
    -a analysis    Set analysis type (default is lsq)
            lsq    Least sum of squares linear fit
            kendall    Linear fit using Kendall's test statistic
            lms    Least median of squares linear fit
            lmsint    Least median of squares linear fit (integer computations)
    -b        Burst size (default = 1)
    -c        Ignore route changes
    -d debuglevel    Set debugging output level
    -g gap        Inter-test gap in seconds (default = 0.25)
    -G gaptype    Inter-test gap type (default is fixed)
            fixed    Fixed gap
            exp    Exponentially distributed random
    -H hops        Maximum number of hops (default = 30)
    -h        Print this help information
    -I increment    Packet size increment (default = 32)
    -l host        Set origin address of probes (defaults to hostname)
    -m mtu        Maximum packet size to check (default = 1500)
    -M mode        Operational mode (defaults to pchar)
            pchar    Path characterization
            trout    Tiny traceroute
    -n        Don't resolve addresses to hostnames
    -p protocol    Network protocol (default is ipv4udp)
            ipv4udp        UDP over IPv4
            ipv4raw        UDP over IPv4 (raw sockets)
            ipv4icmp    ICMP over IPv4 (raw sockets)
            ipv6icmp    ICMPv6 over IPv6 (raw sockets)
            ipv6tcp        TCP over IPv6
            ipv6udp        UDP over IPv6
    -P port        Starting port number (default = 32768)
    -q        Quiet output
    -r file        Read data from a file (- for stdin)
    -R reps        Repetitions per hop (default = 32)
    -s hop        Starting hop number (default = 1)
    -S        Do SNMP queries per-hop
    -t timeout    ICMP timeout in seconds (default = 3)
    -T tos        Set IP type-of-service field (default = 0)
    -v        Verbose output
    -V        Print version information
    -w file        Write data to a file (- for stdout)

>>

And last: don't worry about some rtts or bandwidths being displayed as "--.---":
<<
If memory serves me right, Support FPS wrote:

Hi Stephane--

Sorry for the delay...I was out of town and unable to read my email for 
a few days.

> Second: below is a pchar output; I would like to know:
> a - is it normal that we don't have some information (rtt displayed as
> --.---) in the 2nd hop?
> b - is it normal that in the summary, the Path char rtt is 0.222667 (the
> last rtt) and not 1.330503 + 0.222667 (the sum of the rtts)?
> Another way to say it: isn't it strange to see the overall rtt being
> recorded as the smallest rtt obtained?

Comments in-line:

> here is the command and output :
> <<
> # ./pchar  -v filer85
> pchar to filer85 (192.168.10.234) using UDP/IPv4
> Using raw socket input
> Packet size increments from 32 to 1500 by 32
> 46 test(s) per repetition
> 32 repetition(s) per hop
>  0: 172.31.240.253 (bsun18)
>     Partial loss:      0 / 1472 (0%)
>     Partial char:      rtt = 1.330503 ms, (b = 0.000154 ms/B), r2 = 
> 0.909658
>                        stddev rtt = 0.005918, stddev b = 0.000007
>     Partial queueing:  avg = 0.000119 ms (773 bytes)
>     Hop char:          rtt = 1.330503 ms, bw = 51893.838057 Kbps
>     Hop queueing:      avg = 0.000119 ms (773 bytes)
>  1: 172.31.250.253 (172.31.250.253)
>     Partial loss:      0 / 1472 (0%)
>     Partial char:      rtt = 0.222667 ms, (b = 0.000067 ms/B), r2 = 
> 0.904762
>                        stddev rtt = 0.002636, stddev b = 0.000003
>     Partial queueing:  avg = 0.000102 ms (773 bytes)
>     Hop char:          rtt = --.--- ms, bw = --.--- Kbps
>     Hop queueing:      avg = -0.000017 ms (0 bytes)
>  2: 192.168.10.234 (filer85)

A funny thing happened here in that the RTT for the first hop alone was
computed to be larger than the RTT along the first two hops.  (In other
words, 1.330503 > 0.222667.)  A common cause for this is that the
router at hop #1 (172.31.250.253) takes longer to generate an ICMP
time-exceeded message than it does to forward a packet.

In the normal case, we compute the RTT for the second hop by taking the 
RTT for the first two hops and subtracting the RTT for the first hop 
alone.  If we tried that here, we'd get a negative number (0.222667 - 
1.330503 = -1.107836).  Clearly this makes no sense.  At one point, 
pchar used to print the negative RTT, but I got too many emails asking 
what a negative RTT meant.  So I changed it to just print the "--.---" 
string instead.
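The differencing step Bruce describes can be sketched like this (assumed behavior for illustration, not pchar's actual source):

```python
# Sketch of pchar's per-hop RTT reporting: the hop RTT is the difference
# of successive partial-path RTTs; when the difference comes out negative
# (ICMP generation slower than forwarding), print "--.---" instead.

def format_hop_rtt(partial_rtt_prev_ms, partial_rtt_curr_ms):
    hop_rtt = partial_rtt_curr_ms - partial_rtt_prev_ms
    if hop_rtt < 0:
        return "--.---"  # nonsensical negative RTT, suppressed
    return f"{hop_rtt:.6f}"

# Values from the output above: 0.222667 - 1.330503 is negative
print(format_hop_rtt(1.330503, 0.222667))  # prints "--.---"
```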

>     Path length:       2 hops
>     Path char:         rtt = 0.222667 ms r2 = 0.904762
>     Path bottleneck:   51893.838057 Kbps
>     Path pipe:         1444 bytes
>     Path queueing:     average = 0.000102 ms (773 bytes)
>     Start time:        Tue Nov 19 16:53:51 2002
>     End time:          Tue Nov 19 17:06:36 2002
>  >>

The RTT for the entire path is equal to the partial RTT for two hops, 
because this is a two-hop path.  The fact that this is smaller than the 
RTT for the first hop doesn't really matter (again, remember that many 
routers can forward packets faster than they can generate ICMP 
messages).

In fact, the path RTT is always going to be equal to the partial path
RTT from the source to the last hop (except for some pathological cases
where we cannot compute this value).

Hope this helps!

Bruce.
>>