Nay, we would never do that.
IETF shall never — ever, step onto another standardization bodies toes, feet or anything that looks even remotely inviting. Drawing inspiration from Quentin Tarantino, where the story line keeps moving back and forth, and an enormous amount of cerebral energy is spent in keeping track of whats currently happening, i will boldly take you where no man — or person if you will, has ever gone before (now imagine the Starship Enterprise whooshing past).
Service providers were complaining about how the dead link detection for LAGs was waaaaaaaaaay too slow. It would take them 3 seconds at the least, to detect that a link has gone down, before they would stop sending traffic down that offending link (more here). 3 secs of outage is HUGE and clearly unacceptable – especially with all routers, or at least the ones that claim to forward traffic, supporting 50ms of FRR times.
An easy and a viable option before folks was to request IEEE to rework their LACP standard to bring down the protocol timers from 1 second to a few milliseconds. However, what made this nonviable was the urgency with which the service operators needed the extension – a way to detect dead links in a LAG in the O(ms). I am no IEEE expert, and its my humble understanding that things move slower — way too slower, in IEEE than IETF. The work thus naturally gravitated towards IETF.
Within IETF — and you’ll have to squint your eyes really hard — we can use BFD for this purpose. Its after all a very light weight protocol that just does liveliness detection. BFD keeps exchanging packets back and forth over an IP interface and if it doesnt hear from the other end, it brings down all protocols riding on top of that interface with the enthusiasm that most people often find quite alarming (if not disconcerting).
So, BFD did look like a best fit within IETF, to take on this challenge of fixing the issue of removing dead links for the LAGs asap.
Take a giant leap in the time fabric and you’ll find yourself staring at an ongoing effort in the BFD WG where we establish micro BFD sessions over each link (more here) in an attempt to fix the LAG issue.
The simplest and arguably the most elegant way to get BFD to work on LAGs was by tweaking the LACP state machine — wherein a member link of a LAG is NOT brought into the Distributing State till the micro BFD session on that link goes Up. While this looks way too cool and would have resulted in a 1 Page RFC (which i must agree has a certain charm to it), it poses certain challenges. The biggest being that IETF has no control over LACP and any proposal that tweaks LACP’s state machine would have brought out daggers, sabers and other metallic implements that can hurt — really, really hurt.
BFD WG is filled with people who love peace, harmony and detest anything that engenders ill feeling between two biggest Internet related standardization bodies – IEEE and IETF, and it was thus decided that anything that meant stepping onto IEEE’s toes was not cool enough to be considered.
We went back to the drawing boards and really looked at how we could fix the problem.
The basic problem that needs to be fixed is as follows:
You have multiple links in a LAG. There is a link that goes dead. Detect that dead link and remove it from the LAG asap so that no traffic is sent over that link.
We decided to run BFD on each link – called it a micro BFD (uBFD) session. Once the uBFD session is established we keep using that link till the session is alive. As soon as the uBFD session goes down, we need to do something so that that link is NOT used for forwarding. Since we cant remove the link from the LAG (as that functionality is beyond our scope) we decided to do the next best thing. We decided that if the uBFD session goes down, we will remove that link from the “load balancer” that exists on all routers supporting LAGs. More about the proposal here.
A “load balancer” above is a hypothetical construct that controls the member links over which the traffic needs to be load balanced in case of LAGs. Assume there is a LAG comprising of 3 member links. Traffic egressing out of that LAG must hit some HW entity that splits the traffic across these 3 member links. If a member link is removed from that entity the traffic will only be hashed across the remaining two member ports. Manipulating stuff on this HW entity doesnt violate our pious agreement with IEEE as this HW entity is not owned by IEEE.
We can now be more specific. We can say that uBFD should only affect HW entity that manages member links for IP traffic — since BFD is after all an IP protocol and is only verifying IP connectivity. If your HW is capable of separate HW entities for both IPv4 and IPv6, then you could theoretically run multiple uBFD sessions between the same set of end points.
If you have separate HW entities for load splitting for L2 and L3 traffic (nobody does that) then you could theoretically only remove/add member links based on uBFD sessions on the HW entity that manages traffic distribution for IP traffic!
So we now have a solution that fixes the problem cited by the service providers all within the realms of IETF! You run uBFD sessions over each member link. If the uBFD session goes down, you remove the link from the “load balancer” which would effectively mean that no traffic will now be sent over that offending link.
And btw, we arrived at the solution without stepping onto IEEE toes, coz doing that would have hurt. Ouch!
One Response
Great Article. I like the humor mixed with the technical revelations in the post. Very different from the usual content thats found on this website. Looking forward to more posts from the author.