zebra: fix kernel-route's deletion on vrf #5553

slankdev
Contributor

zebra learns about kernel route deletions through netlink,
but FRR currently fails to remove a kernel route in a VRF (l3mdev)
when an operator deletes the route outside of FRR.
The problem is specific to kernel-route deletion.

This problem is caused around _nexthop_cmp_no_labels(nh1, nh2),
which compares each nexthop's 'vrf_id' member.
The caller of _nexthop_cmp_no_labels does not set the vrf_id
of the nexthop structure. This commit fixes that case.

Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
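
To make the failure mode described above concrete, here is a minimal sketch of a nexthop comparison that takes vrf_id into account. It is not FRR's code: `toy_nexthop`, `toy_nexthop_cmp` and `toy_nexthop_from_delete` are hypothetical names, the structure is reduced to two fields, and the assumption that an unset vrf_id is 0 is made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for zebra's nexthop. */
struct toy_nexthop {
	uint32_t vrf_id; /* VRF the nexthop belongs to */
	int ifindex;     /* outgoing interface index */
};

/*
 * Toy comparison in the spirit of _nexthop_cmp_no_labels():
 * vrf_id takes part in the comparison. Returns 0 on a match.
 */
int toy_nexthop_cmp(const struct toy_nexthop *nh1,
		    const struct toy_nexthop *nh2)
{
	assert(nh1 && nh2);

	if (nh1->vrf_id != nh2->vrf_id)
		return nh1->vrf_id < nh2->vrf_id ? -1 : 1;
	if (nh1->ifindex != nh2->ifindex)
		return nh1->ifindex < nh2->ifindex ? -1 : 1;
	return 0;
}

/*
 * Build the nexthop used to look up a route on a kernel delete
 * notification. With set_vrf == 0 the vrf_id is left at 0, which
 * models the behaviour before this patch (an assumption of the toy).
 */
struct toy_nexthop toy_nexthop_from_delete(int ifindex, uint32_t vrf_id,
					   int set_vrf)
{
	struct toy_nexthop nh = {0};

	nh.ifindex = ifindex;
	if (set_vrf)
		nh.vrf_id = vrf_id;
	return nh;
}
```

In this sketch, a stored nexthop in a non-default VRF never compares equal to one built with `set_vrf == 0`, so a lookup based on the comparison cannot find the route to delete. Building it with `set_vrf == 1` makes the two compare equal.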

@slankdev
Contributor Author

slankdev commented Dec 18, 2019

Before this patch, FRR does not handle kernel-route deletion in a VRF correctly:

ip link add vrf0 type vrf table 10
ip link set vrf0 up
ip link add dum0 type dummy
ip link set dum0  vrf vrf0
ip link set dum0 up

/usr/lib/frr/frrinit.sh start
ip route add 1.1.1.1/32 dev dum0 vrf vrf0

ip route list vrf vrf0
1.1.1.1 dev dum0 scope link

vtysh -c 'sh ip route vrf vrf0'
VRF vrf0:
K>* 1.1.1.1/32 [0/0] is directly connected, dum0, 00:00:06

ip route del 1.1.1.1/32 dev dum0 vrf vrf0
ip route list vrf vrf0
// no output

vtysh -c 'sh ip route vrf vrf0'
VRF vrf0:
K>* 1.1.1.1/32 [0/0] is directly connected, dum0, 00:00:10

zebra logs the following (with 'debug zebra rib detailed' enabled):

2019/12/17 05:20:31 ZEBRA: rib_delete: 4:1.1.1.1/32: via 0.0.0.0 ifindex 3 type 1 doesn't exist in rib

@LabN-CI
Collaborator

LabN-CI commented Dec 18, 2019

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
Result SUCCESS git merge/5553 7f76b26
Date 12/18/2019
Start 04:50:23
Finish 05:16:11
Run-Time 25:48
Total 1815
Pass 1815
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2019-12-18-04:50:23.txt
Log autoscript-2019-12-18-04:51:18.log.bz2
Memory 434 434 360

For details, please contact louberger

@NetDEF-CI
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-10147/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Member

@donaldsharp donaldsharp left a comment

@sworleys / @mjs what is the proper way to look up the vrf from the ifindex here?

@@ -813,6 +813,7 @@ static int netlink_route_change_read_unicast(struct nlmsghdr *h, ns_id_t ns_id,
 nh.bh_type = bh_type;
 }
 nh.ifindex = index;
+nh.vrf_id = vrf_id;
Member

The index (nh.ifindex) should dictate the nh's vrf. This will break in a route-leaking situation.

Member

ahh I see what you mean.

yes this code just needs to be changed to call parse_nexthop_unicast()

Member

That handles the vrf_id by looking up the ifindex and using the interface's if it finds it, but falls back to the one set here if it doesn't.
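
The lookup-with-fallback behaviour described in the comment above can be sketched roughly as follows. This is an illustration only, not the body of parse_nexthop_unicast(): `toy_interface`, `toy_if_lookup` and `toy_nexthop_vrf` are hypothetical names, and the interface table is a plain array rather than zebra's interface database.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface record: which VRF an ifindex belongs to. */
struct toy_interface {
	int ifindex;
	uint32_t vrf_id;
};

/* Linear search; returns NULL when the ifindex is not known. */
const struct toy_interface *toy_if_lookup(const struct toy_interface *ifs,
					  size_t count, int ifindex)
{
	assert(ifs || count == 0);

	for (size_t i = 0; i < count; i++)
		if (ifs[i].ifindex == ifindex)
			return &ifs[i];
	return NULL;
}

/*
 * Pick the nexthop's VRF: prefer the VRF of the interface found by
 * ifindex, and fall back to the route's vrf_id if no interface is found.
 */
uint32_t toy_nexthop_vrf(const struct toy_interface *ifs, size_t count,
			 int ifindex, uint32_t route_vrf_id)
{
	const struct toy_interface *ifp = toy_if_lookup(ifs, count, ifindex);

	return ifp ? ifp->vrf_id : route_vrf_id;
}
```

The point of preferring the interface's VRF is the route-leaking case raised in the review: when a route in one VRF points at an interface that belongs to another, copying the route's vrf_id into the nexthop would give the wrong answer, whereas the lookup returns the interface's own VRF.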

Member

@sworleys sworleys left a comment

The delete branch needs to be changed to call parse_nexthop_unicast() exactly as the create branch does. It does some handling for the vrf_id there that we also need to do in this path and might as well combine them with a common API.

@slankdev
Contributor Author

Sorry for the late reaction to your great suggestion... :( (work has been really busy.)
I have now updated my patch following your direction!

@slankdev slankdev force-pushed the slankdev-fix-kernel-route-deletion-on-vrf branch from 7f76b26 to 7350bf5 Compare December 23, 2019 06:21

@polychaeta polychaeta left a comment

Thanks for your contribution to FRR!

Click for style suggestions

To apply these suggestions:

curl -s https://gist.githubusercontent.com/polychaeta/1282d1b54ecfc2ee82772f7d77f8447d/raw/5395961336f9c7e7e688551ddb86fff60bf23243/cr_5553_1577082100.diff | git apply

diff --git a/zebra/rt_netlink.c b/zebra/rt_netlink.c
index 1a041492c..29a341abb 100644
--- a/zebra/rt_netlink.c
+++ b/zebra/rt_netlink.c
@@ -789,8 +789,8 @@ static int netlink_route_change_read_unicast(struct nlmsghdr *h, ns_id_t ns_id,
 				struct nexthop nh;
 
 				nh = parse_nexthop_unicast(
-					   ns_id, rtm, tb, bh_type, index, prefsrc,
-					   gate, afi, vrf_id);
+					ns_id, rtm, tb, bh_type, index, prefsrc,
+					gate, afi, vrf_id);
 				rib_delete(afi, SAFI_UNICAST, vrf_id, proto, 0,
 					   flags, &p, &src_p, &nh, 0, table,
 					   metric, distance, true);

If you are a new contributor to FRR, please see our contributing guidelines.

@slankdev slankdev force-pushed the slankdev-fix-kernel-route-deletion-on-vrf branch from 7350bf5 to 760f39d Compare December 23, 2019 06:25
@LabN-CI
Collaborator

LabN-CI commented Dec 23, 2019

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
Result SUCCESS git merge/5553 760f39d
Date 12/23/2019
Start 01:50:22
Finish 02:16:06
Run-Time 25:44
Total 1815
Pass 1815
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2019-12-23-01:50:22.txt
Log autoscript-2019-12-23-01:51:18.log.bz2
Memory 427 430 361

For details, please contact louberger

@NetDEF-CI
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-10224/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Debian 10 amd64 build: Successful with additional warnings

Debian Package lintian failed for Debian 10 amd64 build:
(see full package build log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-10224/artifact/DEB10BUILD/ErrorLog/log_lintian.txt)

W: frr source: pkg-js-tools-test-is-missing
W: frr source: pkg-js-tools-test-is-missing
W: frr-rpki-rtrlib: changelog-file-missing-explicit-entry 6.0-2 -> 7.3-dev-20191223-00-g760f39dc0-0 (missing) -> 7.3-dev-20191223-00-g760f39dc0-0~deb10u1
W: frr-snmp: changelog-file-missing-explicit-entry 6.0-2 -> 7.3-dev-20191223-00-g760f39dc0-0 (missing) -> 7.3-dev-20191223-00-g760f39dc0-0~deb10u1
W: frr-doc: changelog-file-missing-explicit-entry 6.0-2 -> 7.3-dev-20191223-00-g760f39dc0-0 (missing) -> 7.3-dev-20191223-00-g760f39dc0-0~deb10u1
W: frr-pythontools: changelog-file-missing-explicit-entry 6.0-2 -> 7.3-dev-20191223-00-g760f39dc0-0 (missing) -> 7.3-dev-20191223-00-g760f39dc0-0~deb10u1
W: frr: changelog-file-missing-explicit-entry 6.0-2 -> 7.3-dev-20191223-00-g760f39dc0-0 (missing) -> 7.3-dev-20191223-00-g760f39dc0-0~deb10u1
W: frr: spelling-error-in-readme-debian explecitly explicitly

@NetDEF-CI
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-10223/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Member

@sworleys sworleys left a comment

Thanks for the change.

@rzalamena
Member

@slankdev thank you for your contribution! This PR fixes a zebra crash when using FPM.

I'd just note that it introduces a few log messages:

2020/01/06 11:45:04 ZEBRA: [EC 4043309093] netlink-dp (NS 0) error: No such file or directory, type=RTM_DELNEXTHOP(105), seq=11, pid=2311070544
2020/01/06 11:45:04 ZEBRA: Extended Error: Carrier for nexthop device is down
2020/01/06 11:45:04 ZEBRA: [EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=13, pid=2311070544
2020/01/06 11:45:04 ZEBRA: [EC 4043309075] Failed to uninstall Nexthop ID (8) from the kernel
2020/01/06 11:45:04 ZEBRA: [EC 4043309074] Failed to install Nexthop ID (15) into the kernel
2020/01/06 11:46:41 ZEBRA: [EC 4043309093] netlink-dp (NS 0) error: No such file or directory, type=RTM_DELNEXTHOP(105), seq=69, pid=2311070544
2020/01/06 11:46:41 ZEBRA: Extended Error: Carrier for nexthop device is down
2020/01/06 11:46:41 ZEBRA: [EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=71, pid=2311070544
2020/01/06 11:46:41 ZEBRA: [EC 4043309093] netlink-dp (NS 0) error: No such file or directory, type=RTM_DELNEXTHOP(105), seq=118, pid=2311070544

To reproduce it just follow the steps I've mentioned in the issue.

I don't think it is necessary to fix it here, but if someone is willing why not.

@sworleys
Member

sworleys commented Jan 6, 2020

Those are due to the route now being deleted and us trying to delete the nexthop groups in the kernel that we are using for them. I think this should probably be a separate issue (assign it to me). I was under the impression that nexthop groups weren't deleted with VRF deletion, but it looks like that might not be the case. I will need to investigate this further.

@donaldsharp donaldsharp merged commit c4db327 into FRRouting:master Jan 7, 2020
@slankdev slankdev deleted the slankdev-fix-kernel-route-deletion-on-vrf branch January 7, 2020 06:54