CS Web: Hitch Hiker's Guide: External: Appliance: Network Administration: Network Configuration: TIP_587

Etherchannel: Definition, Support and Usage

Publication Date: October 1998

Introduction

Etherchannel (also referred to as "Fast Etherchannel" or simply "trunking") is a scheme that trunks together multiple 10/100 Mb/s Ethernet links to form a single, fat pipe, in order to increase available bandwidth and to survive the failure of individual links.

Etherchannel was originally proposed by Sun and picked up by Cisco. Cisco and Sun (and a host of switch vendors) currently support Etherchannel in their products. Etherchannel is typically used in LAN backbones for switch-switch, router-switch or server-switch/router connectivity.

In what follows, the terms "Etherchannel" and "trunking" are used interchangeably.

What is Etherchannel, and why use it?

As mentioned above, the basic idea in Etherchannel is really simple: take a small number of Ethernet links and trunk them together. Most switches/routers support trunks of 2 to 4 links. The trunk looks like a single network pipe to the upper layers (and is administered as a single network pipe). Load balancing is achieved by distributing outgoing traffic over the physical links that form the trunk; the distribution scheme is described below. Furthermore, a trunk of N links can survive the failure of up to N - 1 links by redistributing traffic over the remaining links.

Basically, there are two parts to Etherchannel. The first deals with how the links are configured to form a trunk. The second deals with how traffic flowing out on the trunk is load-balanced over the physical links.

Trunk configuration

There are two schemes of configuring trunks:

  1. Static configuration of trunks: This is the simpler of the two schemes and is what all vendors currently ship. In this scheme, on a switch/router/server, we statically trunk together the Ethernet links at boot/reset time.

  2. Dynamic configuration of trunks: A much more complex scheme that provides a great deal more flexibility in configuring the trunk. Cisco is in the process of implementing a (proprietary) protocol called PAgP (Port Aggregation Protocol) to do this.

At this stage, we only intend to support the static trunking configuration scheme. That enables us to get most of the functionality we need and avoids the work and complexity of hoisting yet another protocol into FASware.

Load-balancing of traffic on the trunk

To load-balance traffic on the trunk, we distribute outgoing frames over the physical links. This is done by hashing fields of the MAC header: we hash the <src MAC addr, dst MAC addr> pair onto one of the physical links. Most switches implement the hash function by simply XOR'ing the (6 + 6 =) 12 bytes. The only requirement on the hash function (apart from simplicity) is that it map a given <src, dst> pair to the same physical link, so that packets between the same <src, dst> pair of machines do not get delivered out of order. Strictly speaking, since the only protocol we're interested in supporting is IP, ordering ought not to matter. But in practice, there are some (rare) IP implementations that:

  1. get confused if IP fragments arrive out of order (old firmware on diskless Sun clients and old Linux clients are two examples that come to mind), or
  2. reassemble out-of-order IP fragments inefficiently.

This makes preserving the order of IP fragments important in practice.

Another choice of (<src addr, dst addr> -> physical link) function is to pick the least-utilized physical link for any <src addr, dst addr> tuple that has no mapping yet, entering that link in the mapping table so the tuple maps to it from then on. Yet other variations are possible.
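As a sketch of the XOR-based hashing described above (the function name and link count are illustrative, not any particular vendor's implementation):

```c
#include <stdint.h>

/* Hash the <src MAC, dst MAC> pair onto one of the trunk's physical
 * links by XOR'ing all 12 bytes and reducing modulo the link count.
 * The function is deterministic, so a given <src, dst> pair always
 * maps to the same link and frames between that pair stay in order. */
static unsigned pick_link(const uint8_t src_mac[6], const uint8_t dst_mac[6],
                          unsigned num_links)
{
    uint8_t h = 0;
    for (int i = 0; i < 6; i++)
        h ^= src_mac[i] ^ dst_mac[i];
    return h % num_links;
}
```

Note that the distribution is only as good as the spread of MAC addresses: two hosts whose address pair hashes to the same value always share a link.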

Handling physical link failures

When a physical link goes down, we mark that link as down and flush our (<src addr, dst addr> -> physical link) mapping tables. As frames continue to flow, we populate our mapping tables again, skipping the link(s) that have been marked as down. This leads to a small window where IP fragments can get delivered out of order, but it shouldn't matter in practice.
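A minimal sketch of this failure handling, assuming a small direct-mapped table and round-robin assignment on a miss (table size, link count, and all names here are assumptions for illustration):

```c
#include <string.h>

#define N_LINKS  4
#define MAP_SIZE 256                 /* hypothetical mapping-table size */

static int link_is_up[N_LINKS] = { 1, 1, 1, 1 };
static int map_table[MAP_SIZE];      /* bucket -> link + 1; 0 = empty */

/* Called when a link changes state: record the new state and flush the
 * whole (<src, dst> -> link) mapping table.  Entries repopulate as
 * frames flow, skipping links marked down. */
static void link_state_change(int link, int up)
{
    link_is_up[link] = up;
    memset(map_table, 0, sizeof map_table);
}

/* Map a bucket (e.g. the XOR hash of the MAC pair) to a live link,
 * assigning round-robin over the up links on a table miss.
 * Assumes at least one link is still up. */
static int map_bucket(int bucket)
{
    static int next;
    if (map_table[bucket] == 0) {
        while (!link_is_up[next % N_LINKS])
            next++;
        map_table[bucket] = (next++ % N_LINKS) + 1;
    }
    return map_table[bucket] - 1;
}
```

Flushing the whole table (rather than rewriting only entries that pointed at the dead link) is the simple choice the text describes; it trades a brief re-learning window for not having to scan the table on every failure.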

Etherchannel support in FASware

It turns out that Etherchannel support on the filer is actually simpler than what I've described above, because of "fastpath" sends. NFS, CIFS and HTTP use "fastpath" sends. With fastpath, the interface on which a request came in and the MAC address of the caller are saved on input. The reply is sent out on the same link using the saved (src) MAC address, which now becomes the dst MAC address. This avoids a route lookup and an ARP lookup. If the filer is connected to a switch, the switch distributes NFS (or CIFS or HTTP) requests over the links as they flow into the filer. Since the filer sends each reply out on the same physical interface on which the request was received, there's no need for the filer to distribute responses over the physical links of the trunk at all. So, basically, the filer can load-balance over the links of the trunk without having to de-multiplex outgoing traffic itself. Non-fastpath traffic and broadcasts are always sent out on one of the links of the trunk: one of the physical links is marked as the "primary link" of the trunk, and non-fastpath frames, broadcast frames, etc. are sent on this "primary link."

Conceptually, the trunking layer resides right above the Ethernet drivers. Any number of 10/100BaseT links can be trunked together on the filer. When the links are trunked together, they are all assigned the same MAC address. When the trunk is ifconfig'ed up, each of the interfaces is ifconfig'ed up in turn. A route (for this subnet) is added for the "primary" link. When a link goes down, the driver makes a callback to the trunking layer. The trunking layer "marks" the interface as down. If the interface that failed was the "primary" link on the trunk, then we force another physical link to be the "primary" link of the trunk.
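The driver-callback and primary-link failover just described might look like the following sketch (the state layout and function name are assumptions, not FASware's actual code):

```c
#define N_LINKS 4

/* Hypothetical per-trunk state: which links are up, and which one is
 * currently the "primary" (carries non-fastpath and broadcast frames). */
static int link_is_up[N_LINKS] = { 1, 1, 1, 1 };
static int primary = 0;

/* Callback from the Ethernet driver when a physical link fails.
 * Mark the link down; if it was the primary, promote the first
 * remaining live link to be the new primary. */
static void trunk_link_down(int link)
{
    link_is_up[link] = 0;
    if (link == primary) {
        for (int i = 0; i < N_LINKS; i++) {
            if (link_is_up[i]) {
                primary = i;
                break;
            }
        }
    }
}
```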

UI changes

The UI changes are restricted to the addition of one new command and a (rather obvious) change to ifconfig. A new command called vif has been added with the following syntax (see vif in Data ONTAP 5.2 Manual Pages or other man pages):

	vif create <interface 0> <interface 1> ... <interface n>

where the create directive is followed by the list of interfaces forming the trunk. The vif command outputs the name of the trunk. We use the trunk name as input to ifconfig and the other vif directives. For example:

    filer> vif create e7a e7b e7c e7d
    Created virtual interface vif0

Here, trunk vif0 is formed out of e7a through e7d.

In another example, we see that a trunk can be formed using any combination of Ethernet interfaces and that they do not all have to be on the same quad card (which is a restriction when using trunking on Sun machines):

    filer> vif create e0 e7a e6b e8
    Created virtual interface vif1

The other directives to the vif command are destroy, stat and createname.

The syntax for vif destroy is:

    vif destroy <trunk name>

For example, this command:

    filer> vif destroy vif0

destroys trunk vif0. (Note: a trunk can only be destroyed if it has been ifconfig'ed down.)

The syntax for vif stat is:

    vif stat <trunk name> [interval in seconds]

The interval is optional and defaults to 1 second if not specified. For example, the command:

    filer> vif stat vif0

would show packets in and packets out for each of the physical interfaces composing trunk vif0 every second. Here's an example of vif stat output:

    filer> vif stat vif0

    Virtual interface (trunk) vif0

            e5d                 e5c              e5b                 e5a
        In      Out         In     Out       In      Out         In      Out
    8637076 47801540       158     159  7023083 38300325   8477195 47223431
       1617     9588         0       0      634     3708       919     5400
       1009     5928         0       0      925     5407      1246     7380
       1269     7506         0       0      862     5040      1302     7710
       1293     7632         0       0      761     4416       964     5676
        920     5388         0       0      721     4188       981     5784
       1098     6462         0       0      988     5772      1003     5898
       2212    13176         0       0      769     4500      1216     7185
       1315     7776         0       0      743     4320       530     3108

Finally, the vif createname directive allows you to create a trunk non-interactively. This is useful when creating a trunk from the /etc/rc file, for example. The syntax for this is:

    vif createname <vif name> <interface 0> <interface 1> ... <interface n>

where <vif name> is supplied by the user and must be unique (there must be no other vif of the same name). It is followed by the list of interfaces composing the trunk, as in the vif create directive.

The last thing we need to do is use the ifconfig command to configure the interface in the /etc/rc file. For example, this entry:

    ifconfig vif0 192.192.192.100 up

configures up the trunk vif0. This entry:

    ifconfig vif0 down

configures down the trunk vif0.
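Putting the pieces together, a hypothetical /etc/rc fragment might look like the following (the interface names and IP address are examples only):

```shell
# Build trunk vif0 non-interactively from four ports, then bring it up.
vif createname vif0 e7a e7b e7c e7d
ifconfig vif0 192.192.192.100 up
```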

SNMP changes and impact on FilerView

Both FilerView and SNMP will be minimally impacted by Etherchannel.