Hello

Thank you to everyone who replied on and off list.

I believe I have a good understanding of ifgrp configuration and the purpose of the various distribution functions. In this case I'm asking about the specifics of the IP distribution function: how the hash is actually calculated from a given pair of source and destination IP addresses, and how the result is used to select the egress interface. As John suggests, this may be a case where only NetApp can answer the question.

Quoting from TR-4847 which Peter linked to (emphasis mine):

IP: Second-best load distribution method, since the IP addresses of both sender
(LIF) and client are used to deterministically select the particular physical link that a
packet traverses. Although deterministic in the selection of a port, the balancing is
performed using an advanced hash function. This has been found to work under a
wide variety of circumstances, but particular selections of IP addresses might still
lead to unequal load distribution. 

The only information I can find about the 'advanced hash function' is in an old 7-mode KB which states (emphasis mine):

As of Data ONTAP 7.3.2, a multimode or LACP ifgrp/vif uses an implementation of a "SuperFastHash", utilizing the last 16 bits of the source and destination IP addresses (-b ip), the last 16 bits of the source and destination MAC addresses (-b mac), or the last 16 bits of the source and destination IP addresses in combination with the source and destination TCP port (-b port).

The output of the algorithm results in a far more dynamic, more balanced distribution than the algorithm used in versions of Data ONTAP prior to 7.3.2.  The result is still the same, however, in that each TCP stream will associate with only one interface, allowing for only one port's worth of bandwidth per TCP stream.

Data ONTAP releases prior to 7.3.2:

Documentation prior to 7.3.2 states this as the formula:

((source_address XOR destination_address) % number_of_links)
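
As a rough illustration, the pre-7.3.2 formula can be sketched in Python. This assumes the whole 32-bit IPv4 address is treated as an integer, which the KB does not spell out; the function name and the addresses are made up for the example.

```python
import ipaddress


def legacy_link_index(src: str, dst: str, number_of_links: int) -> int:
    """Sketch of ((source_address XOR destination_address) % number_of_links).

    Assumption: each address is taken as a plain integer. The KB does not
    say how the addresses are represented before the XOR.
    """
    if number_of_links <= 0:
        raise ValueError("number_of_links must be positive")
    source_address = int(ipaddress.ip_address(src))
    destination_address = int(ipaddress.ip_address(dst))
    return (source_address ^ destination_address) % number_of_links


# Hypothetical documentation-range addresses, four-port ifgrp.
print(legacy_link_index("192.0.2.10", "198.51.100.13", 4))  # -> 3
```

For a four-port ifgrp the modulo keeps only the lowest two bits of the XOR, so only the last two bits of each address matter under this formula. This describes the documented pre-7.3.2 behaviour only, not what ONTAP 9 does.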

When I was asked to look into a SnapMirror traffic imbalance, I started by calculating the IP hash using the 'standard' load balancing formula you would expect for source and destination IP. However, the results did not match the behaviour we are seeing. After doing more research today, I ran across the old KB quoted above and also this GitHub page. However, it looks like the code was created to replicate empirical data rather than being the actual SuperFastHash algorithm used by NetApp.
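
For reference, here is a minimal Python sketch of the publicly documented SuperFastHash by Paul Hsieh, fed with the last 16 bits of each address as the KB describes. Everything around the hash itself is a guess: the byte order, the way the two 16-bit values are packed into a key, and the final modulo are assumptions, and `guess_link_index` is a hypothetical name. It is not known to match what ONTAP does.

```python
import ipaddress
import struct

MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def _get16(data: bytes, i: int) -> int:
    # Little-endian 16-bit read, as in the public reference code.
    return data[i] | (data[i + 1] << 8)


def _signed8(b: int) -> int:
    # The reference code casts tail bytes to signed char.
    return b - 256 if b > 127 else b


def super_fast_hash(data: bytes) -> int:
    """Port of Paul Hsieh's published SuperFastHash (32-bit result)."""
    length = len(data)
    if length == 0:
        return 0
    h = length
    i = 0
    for _ in range(length >> 2):  # main loop, 4 bytes per round
        h = (h + _get16(data, i)) & MASK32
        tmp = ((_get16(data, i + 2) << 11) ^ h) & MASK32
        h = ((h << 16) ^ tmp) & MASK32
        i += 4
        h = (h + (h >> 11)) & MASK32
    rem = length & 3  # handle the 1 to 3 trailing bytes
    if rem == 3:
        h = (h + _get16(data, i)) & MASK32
        h ^= (h << 16) & MASK32
        h ^= (_signed8(data[i + 2]) << 18) & MASK32
        h = (h + (h >> 11)) & MASK32
    elif rem == 2:
        h = (h + _get16(data, i)) & MASK32
        h ^= (h << 11) & MASK32
        h = (h + (h >> 17)) & MASK32
    elif rem == 1:
        h = (h + _signed8(data[i])) & MASK32
        h ^= (h << 10) & MASK32
        h = (h + (h >> 1)) & MASK32
    # Final avalanche steps.
    h ^= (h << 3) & MASK32
    h = (h + (h >> 5)) & MASK32
    h ^= (h << 4) & MASK32
    h = (h + (h >> 17)) & MASK32
    h ^= (h << 25) & MASK32
    h = (h + (h >> 6)) & MASK32
    return h


def guess_link_index(src: str, dst: str, number_of_links: int) -> int:
    """Hypothetical mapping from an address pair to a port index.

    Assumptions (not confirmed by any NetApp source): the key is the low
    16 bits of source then destination in network byte order, and the
    port index is the hash modulo the number of links.
    """
    s = int(ipaddress.ip_address(src)) & 0xFFFF
    d = int(ipaddress.ip_address(dst)) & 0xFFFF
    key = struct.pack("!HH", s, d)
    return super_fast_hash(key) % number_of_links


# Hypothetical addresses; the result is only as good as the assumptions above.
print(guess_link_index("192.0.2.10", "198.51.100.13", 4))
```

If the output disagrees with the observed egress port, the likely suspects are the key layout and the port indexing rather than the hash core, since only the hash core comes from a published source.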
 
Why do I care?
Well, mostly just academic curiosity about the 'better mouse trap' that NetApp built. The imbalance we are seeing isn't causing any issues. It was just one of those curious things that we investigated to understand why it was happening. Also, it will be much easier for me to tweak the intercluster LIF IPs to get the desired traffic balancing in the short term than to wait for a maintenance window to tear down the data ifgrps and rebuild them with the 'Port' distribution function. (We will rebuild them eventually.)

Thanks again.  I hope someone can shed a bit more light on the 'SuperFastHash' or whatever has replaced it in ONTAP 9.

Best wishes
Stephen

On Tue, 4 Feb 2025 at 20:06, Peter Tas <Peter.Tas@neoria.be> wrote:

Neoria - Public


Hello Stephen,

This document, although already 12 years old, is still a good reference.

https://www.netapp.com/media/19900-tr-4847.pdf

 

Check p. 7 and the following page(s) regarding load balancing.
The “Port” distribution function is generally known to be the preferred one when configuring an ifgrp, but it may also depend on the brand/type of switch stack to which you connected the NICs of the controllers.

 

Good luck & best regards,

 

Peter Tas.

 

 

 



From: John Stoffel <john@stoffel.org>
Date: Tuesday, 4 February 2025 at 18:29
To: Stephen Stocke <scstocke@gmail.com>
Cc: toasters@teaparty.net <toasters@teaparty.net>
Subject: Re: Igfrp IP Distribution Function

>>>>> "Stephen" == Stephen Stocke <scstocke@gmail.com> writes:

> Greetings fellow toaster admins I hope someone can shed some light
> on the IP distribution function for lacp ifgrps.

You need to give more information on your setup, especially what kind
of switches you're using and how they're configured. 

> We have a four port, multi mode lacp ifgrp, a0a, using interfaces
> e2a-e2d. We observe SnapMirror traffic egressing port e2c.

So?  Why do you care?

> The XOR of the last two bits of the source and destination IPs are
> either x0 or x3 so I am expecting traffic to egress either the first
> or fourth port in the ifgrp.

> Assuming e2a is port 0, e2b is port 1, etc., I would expect traffic
> to egress either e2a (0x0) or e2d (0x3).

Share your config (cli output) so we can look at it.


> What am I missing? I can’t find any details about the actual hashing
> function for IP distribution or port member indexing in an ifgrp to
> confirm my assumptions.

> Both clusters are a single A250 HA pair running 9.15.1.

Are you seeing performance problems?  Are you seeing that your traffic
isn't being balanced across all your links? 

I guess I really don't understand the problem you're trying to solve,
unless you're just looking for info on why it works this way, which
might really be a NetApp-only answer.

_______________________________________________
toasters mailing list -- toasters@lists.teaparty.net
To unsubscribe send an email to toasters-leave@lists.teaparty.net