[libteam]: Reimplement Warm-Reboot procedure #3016

pavel-shirshov · 2019-06-14T21:53:01Z

New implementation of Warm-Reboot support in teamd.

Port of #2999

During manual testing of the previous Warm-Reboot (WR) implementation for teamd, we found that teamd restores its state incorrectly if one of the ports was put into the OPER DOWN state during the procedure.
To fix that, I redesigned the procedure completely:

When we prepare the system for the WR procedure, we save the LACP PDU for every LAG member port (as before), plus additional information about the current LAG:

number of LAG members
interface names of the LAG members
operational states of the LAG members
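As an illustration only, the saved LAG information could be modelled like this. This is a minimal Python sketch, not the libteam patch itself (which is C); the `SavedLagState` name, the field names, and the JSON encoding are assumptions made for the example, and the actual on-disk format is not described here.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict


@dataclass
class SavedLagState:
    """Hypothetical snapshot of one LAG, taken before the warm reboot."""
    member_count: int                # number of LAG members
    member_oper_up: Dict[str, bool]  # interface name -> operational state


def snapshot(members: Dict[str, bool]) -> SavedLagState:
    """Build a snapshot from a mapping of member name to oper state."""
    return SavedLagState(member_count=len(members),
                         member_oper_up=dict(members))


def dumps(state: SavedLagState) -> str:
    """Serialize the snapshot (JSON is an assumption, not the real format)."""
    return json.dumps(asdict(state), sort_keys=True)


def loads(text: str) -> SavedLagState:
    """Restore a snapshot that was written by dumps()."""
    raw = json.loads(text)
    return SavedLagState(member_count=raw["member_count"],
                         member_oper_up=raw["member_oper_up"])
```

The point of the sketch is that the member names and their operational states are stored alongside the per-port LACP PDUs, so the startup code can tell which members it should expect to come back.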

When we start the system in WR state, we read the saved LAG information, which allows us to restore the state correctly.

If the LAG state before the reboot was down, we disable WR mode immediately and start in normal mode.
If the LAG state before the reboot was up, we start with the LAG interface enabled, so as not to disrupt the dataplane, and begin restoring the state of the LAG members.
When SONiC adds LAG interfaces, teamd runs the lacp_update_carrier() function, which calculates the operational state of the LAG interface. We keep that state up while we are in WR mode. Every time lacp_update_carrier() is executed, we check the operational state of the LAG member interfaces. If a member is up, we read its LACP PDU from the saved file; if it is not up, we wait.
After lacp_update_carrier() has been executed once, we start a timer. If we are unable to restore the state within 3 seconds, we stop WR mode and continue in normal mode.
As soon as we have restored the state of all LAG member interfaces, we disable WR mode and run teamd as usual.
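The startup steps above can be sketched as a small state machine. This is a simplified Python model under stated assumptions, not the code in the libteam patch: the class name `WarmRebootStart`, its fields, and the `now` parameter are hypothetical, reading a saved LACP PDU is reduced to removing the member from a pending set, and the 3-second timeout is checked on each update call rather than by a separate timer callback.

```python
from typing import Dict, Optional, Set

WR_TIMEOUT_SECONDS = 3.0  # from the description: stop WR mode after 3 seconds


class WarmRebootStart:
    """Simplified model of the WR start logic (names are hypothetical)."""

    def __init__(self, lag_was_up: bool, saved_members: Set[str]) -> None:
        # If the LAG was down before the reboot, WR mode is disabled at once.
        self.wr_mode: bool = lag_was_up
        # Members whose saved LACP PDU has not been read yet.
        self.pending: Set[str] = set(saved_members)
        self.deadline: Optional[float] = None

    def update_carrier(self, oper_up: Dict[str, bool], now: float) -> bool:
        """Model one carrier update; return True while WR mode is active."""
        if not self.wr_mode:
            return False
        if self.deadline is None:
            # The timer starts on the first update.
            self.deadline = now + WR_TIMEOUT_SECONDS
        for name in sorted(self.pending):
            if oper_up.get(name, False):
                # Stand-in for reading this member's saved LACP PDU.
                self.pending.discard(name)
        if not self.pending or now > self.deadline:
            # Everything restored, or timed out: fall back to normal mode.
            self.wr_mode = False
        return self.wr_mode
```

In this model, a LAG with members `Ethernet0` and `Ethernet4` stays in WR mode while `Ethernet4` is still down, leaves WR mode once both members have been seen up, and also leaves it if an update arrives more than 3 seconds after the first one.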

The WR start logic was completely moved to lacp_update_carrier().
I've added a lot of debug messages for WR mode, which will allow us to find issues easily.

I've rearranged the libteam patches in the series so that the WR patch comes last. This will make it easier to change the WR behaviour later.

pavel-shirshov added 4 commits June 7, 2019 18:44

Update sonic-quagga submodule

ee4613c

Merge branch 'master' of https://github.com/Azure/sonic-buildimage

b12e064

Merge branch 'master' of https://github.com/Azure/sonic-buildimage

b562508

[libteam]: Reimplement Warm-Reboot procedure

3fca83e

pavel-shirshov added Bug 🐛 Enhancement ➕ labels Jun 14, 2019

pavel-shirshov requested a review from lguohan June 14, 2019 21:53

lguohan approved these changes Jun 14, 2019


pavel-shirshov merged commit 466334a into sonic-net:master Jun 15, 2019

pavel-shirshov deleted the pavelsh/libteam_master branch June 15, 2019 00:27

pushpraj mentioned this pull request Oct 30, 2023

HLD: DHCPv4 - Specify dhcp relay's Gateway explicitly with Primary address. sonic-net/SONiC#1470

Merged
