Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

frr: Miscellaneous fixes in bgp and zebra #6177

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sudhanshukumar22
Copy link
Contributor

@sudhanshukumar22 sudhanshukumar22 commented Dec 10, 2020

Signed-off-by: sudhanshukumar22 sudhanshu.kumar@broadcom.com

- Why I did it
This contains the patches for the various fixes done in FRR bgp and zebra code. This also contains some patches which are already present in FRR community, but not yet present in sonic community. The fixes have been tested in Broadcom hardware.
Each patch has a commit description which describes the changes done for it.
- How I did it
The fixes have been done during BGP protocol testing.
Change description:

  1. src/sonic-frr/Makefile
    this contains changes to remove temporary branches with name stg_temp and $(FRR_BRANCH). This ensures that if any earlier failed run leads to existence of these branches, they will be deleted so that new run can proceed fine.

  2. src/sonic-frr/patch/0012-BGP-keepalive-timer-wasn-t-reset-in.patch
    BGP keepalive timer wasn't reset in case of duplicate connection. Identified the fix and found the similar FRR PR. This pulls the
    following fixes from FRR community 5429, 5368, and 5446.

  3. src/sonic-frr/patch/0013-Double-free-of-transitive-path-attribut-was-causing-.patch
    Double free of transitive path attribut was causing the crash. This pulls fix 5436 from FRR community.

  4. src/sonic-frr/patch/0014-Link-local-scope-was-not-set-while.patch
    Link local scope was not set while binding socket with local address causing socket errors for bgp ipv6 link local
    neighbors.
    The corresponding pull request in FRR community is bgpd: BGP session not established for ipv6 link local address with vrf config FRRouting/frr#7434

  5. src/sonic-frr/patch/0015-Holdtime-and-keepalive-parameters-w.patch
    Holdtime and keepalive parameters weren't copied from peer-group to peer-group members. Fixed the issue by copying
    holdtime and keepalive parameters from peer-group to its members.
    The corresponding pull request is Bgp peer group issue FRRouting/frr#7435

  6. src/sonic-frr/patch/0016-Aggregate-member-route-was-enqueued.patch
    Aggregate member route was enqueued for
    recalculation while bgp instance was deleted. As part of aggregate member
    route deletion, the aggregate route is reinstalled with self-peer as source,
    but self-peer is already removed. Assert() for null peer pointer is path
    attribute aborts bgp.
    Fix :
    When bgp instance is in the state of deletion, or self-peer is null,
    avoid installation of aggregate-prefix as part of aggregate member
    delete.
    When all of the aggregate members are removed, the aggregate prefix is
    automatically removed from zebra.
    The pull request is https://github.com/FRRouting/frr/pull/7433/files

  7. src/sonic-frr/patch/0018-When-user-is-config-connect-timer-i.patch
    When user is config connect timer, it doesn't reflect
    immediately. It reflect when next time neighbor is tried to reconnect.

Fix: Code is change to update the connect timer immediately if BGP
session is not in establish state. The corresponding pull request in FRR community is
FRRouting/frr#7449

  1. src/sonic-frr/patch/0019-when-vrf-add-is-received-add-Vrf-na.patch
    when vrf add is received, add Vrf-name to the interface
    database. This is needed while binding the VRF interface to the BGP socket. This defect was found when password setting on the BGP socket was not honored after reboot. In this case, there were 2 BGP sockets, one on default VRF and another on user VRF. Password was configured on socket with default VRF (not on BGP socket with user VRF) and system was rebooted. After reboot, the BGP socket on user VRF came up, but was not binded to user-VRF since the BGP socket came up before receiving new VRF create message from Zebra. Hence, there were 2 sockets on default VRF, one with password and another without password. BGP neighborship was established on default VRF using the 2nd socket even though it should not form because of password setting. The corresponding pull request is
    bgpd: BGP neighbor password change doesn't take effect with a a particular config on reboot FRRouting/frr#7432

  2. src/sonic-frr/patch/0032-Remove-the-assert-since-in-the-case.patch
    Remove the assert since in the case where
    l3vni was in the oper down state, it is possoble that the function return len
    as zero. Asserts are already present in the called function, so no need to
    have assert here. So removing the assert at this location.

  3. src/sonic-frr/patch/0033-While-putting-the-route-for-route-s.patch
    While putting the route for route
    selection, it is checking for flag in destination. Which looks like didn't
    got cleaned correctly, resetting the flag so that next time route calculation
    can be triggered in the Zebra.

  4. src/sonic-frr/patch/0034-zebra-vrf-delete-issue.patch
    Unable to access the vtysh and all FRR CLIs are timing OUT after config reload two
    times.
    Due to zebra starting during/prior to configuration being replayed,
    there is huge churn of stale vrf delete followed by new vrf add. This
    can cause timing race condition where vrf delete could be missed and
    further same vrf add would get rejected instead of treating last arrived
    vrf add as update.

Treat vrf add for existing vrf as update.
Implicitly disable this VRF to cleanup routes and other functions as part of vrf disable.
Update vrf_id for the vrf and update vrf_id tree.
Re-enable VRF so that all routes are freshly installed.

Above 3 steps are mandatory since it can happen that with config reload
stale routes which are installed in vrf-1 table might contain routes from
older vrf-0 table which might have got deleted due to missing vrf-0 in new configuration.
FRR community pull request: FRRouting/frr#7508

  1. src/sonic-frr/patch/0036-Updated-the-comparison-functions-appropriately.patch
    any configuration for next-hop-self change will not take
    effect if this flag is not reset at bgpd/bgp_route.c:1936

  2. src/sonic-frr/patch/0037-added-a-new-line-for-showing-established-neighbors-c.patch
    added a new line for showing established neighbors count
    This enhances the show bgp summary command to display the established neighbors count.

- How to verify it
We can verify it using BGP protocol testing.
- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@ben-gale
Copy link
Collaborator

@pavel-shirshov - pls review - looking to get these changes in before the end of the year.

@ben-gale
Copy link
Collaborator

@pavel-shirshov , @lguohan - This one has been inactive for a week - how do we move towards merge pls?

@lguohan
Copy link
Collaborator

lguohan commented Jan 6, 2021

frr has been upgraded to 7.5.

@ben-gale
Copy link
Collaborator

ben-gale commented Jan 6, 2021

frr has been upgraded to 7.5.

Yes, the team is looking at if/how this affects this PR.

@sudhanshukumar22
Copy link
Contributor Author

Hi all,
can you please suggest why Broadcom build shows error. I see from the logs that sonic-broadcom.bin is successfully generated and there is no other errors in the log.

@ben-gale
Copy link
Collaborator

ben-gale commented Jan 11, 2021 via email

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
@sudhanshukumar22
Copy link
Contributor Author

Hi all,
There are no NTP changes in my defect. can anyone suggest why ntp testcase is failing here ?

@ben-gale
Copy link
Collaborator

retest vsimage please

@sudhanshukumar22
Copy link
Contributor Author

I am closing this PR because all patches in this PR are merged to frr-master branch, except FRRouting/frr#7434 and FRRouting/frr#7432. I am facing issues with topotest in these PRs. In FRRouting/frr#7434, the link-local testcase is not passing. In FRRouting/frr#7432, one existing testcase is crashing. I am trying to fix them.

@sudhanshukumar22
Copy link
Contributor Author

Reopening this PR as we need to keep it till the sonic moves frr to latest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants