Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle dual ToR neighbor miss scenario #2137

Merged

Conversation

theasianpianist
Copy link
Contributor

@theasianpianist theasianpianist commented Feb 8, 2022

What I did

  • If unable to resolve a neighbor on a dual ToR system, neighsyncd will write a neighbor table entry for that neighbor with a zero MAC
  • When orchagent receives a neighbor update with a zero MAC:
    • If the neighbor IP is configured for a specific mux cable port in the MUX_CABLE table in CONFIG_DB, handle the neighbor normally (if active for the port, no action is needed. if standby, a tunnel route is created for the neighbor IP)
    • If the neighbor IP is not configured for a specific port, create a tunnel route for the IP to the peer switch.
      • When these neighbor IPs are eventually resolved, remove the tunnel route and handle the neighbor normally.
  • When creating/initializing a mux cable object, set the internal state to standby to match the constructor behavior.

Test changes

  • Various formatting fixes inside test_mux.py
  • Remove references to deprecated @pytest.yield_fixture
  • Add dual ToR neighbor miss test cases:
    • Test cases and expected results are described in mux_neigh_miss_tests.py. These descriptions are used by the generic test runner test_neighbor_miss function to execute the test actions and verify expected results
    • Various setup fixtures and test info fixtures were added
    • Existing test cases were changed to use these setup fixtures for consistency

Test case descriptions:
Expectations:

  • On dual ToR devices, any unresolved IP in the kernel neighbor table should have a corresponding tunnel route created
  • If an IP is resolved, tunnel route creation depends on cable state (this is the existing/normal muxorch code flow)
  • If kernel entry deleted, always remove tunnel route

For the purposes of this test, there is a distinction made between 'server' IPs and 'neighbor' IPs. Server IPs are IP addresses explicitly configured on a specific mux cable interface in the MUX_CABLE table in config DB. Mux cable IPs (or neighbor IPs) are any other IPs within the VLAN subnet.

Standby Mux Cable IP Unresolved Test Cases:

	   [action] -> [expected result]
	1. netlink fail -> tunnel route created
	2. switch to active -> tunnel route remains
	3. IP resolved -> tunnel route teardown, neighbor added
	4. kernel neighbor deleted -> no tunnel route, no neighbor entry
	
	   [action] -> [expected result]
	1. netlink fail -> tunnel route created
	2. IP resolved -> tunnel route remains
	3. switch to active -> tunnel route teardown, neighbor added
	4. kernel neighbor deleted -> no tunnel route, no neighbor entry
	
	   [action] -> [expected result]
	1. netlink fail -> tunnel route created
	2. kernel neighbor deleted -> tunnel deleted

Active Mux Cable IP Unresolved Test Cases:

	   [action] -> [expected result]
	1. netlink fail -> tunnel route created
	2. switch to standby -> tunnel route remains
	3. IP resolved -> tunnel route remains
	4. kernel neighbor deleted -> no tunnel route, no neighbor entry

	   [action] -> [expected result]
	1. netlink fail -> tunnel route created
	2. IP resolved -> tunnel route teardown, neighbor added
	3. switch to standby -> tunnel route created, no neighbor
	4. kernel neighbor deleted -> no tunnel route, no neighbor entry
		
	   [action] -> [expected result]
	1. netlink fail -> tunnel route created
	2. kernel neighbor deleted -> no tunnel route

Neighbor IPs Unresolved Test Cases:

	   [action] -> [expected result]
	1. netlink fail -> tunnel route created
	2. resolved on standby -> tunnel route remains
	3. switch to active -> tunnel route teardown, neighbor added
	4. kernel neighbor deleted -> no tunnel route, no neighbor entry

	   [action] -> [expected result]
	1. netlink fail -> tunnel route created
	2. resolved on active -> tunnel route teardown, neighbor added
	3. switch to standby -> tunnel route created, no neighbor
	4. kernel neighbor deleted -> no tunnel route, no neighbor entry

	   [action] -> [expected result]
	1. netlink fail -> tunnel route created
	2. kernel neighbor deleted -> tunnel route teardown

Why I did it

How I verified it

❯ sudo py.test -v --imgname=docker-sonic-vs:latest "test_mux.py::TestMuxTunnel"
=================================================================================== test session starts ====================================================================================
platform linux -- Python 3.6.9, pytest-7.0.0, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /home/lawlee/sonic-buildimage/src/sonic-swss/tests
plugins: flaky-3.7.0
collected 26 items

test_mux.py::TestMuxTunnel::test_Tunnel PASSED                                                                                                                                       [  3%]
test_mux.py::TestMuxTunnel::test_Peer PASSED                                                                                                                                         [  7%]
test_mux.py::TestMuxTunnel::test_Neighbor PASSED                                                                                                                                     [ 11%]
test_mux.py::TestMuxTunnel::test_Fdb PASSED                                                                                                                                          [ 15%]
test_mux.py::TestMuxTunnel::test_Route PASSED                                                                                                                                        [ 19%]
test_mux.py::TestMuxTunnel::test_acl PASSED                                                                                                                                          [ 23%]
test_mux.py::TestMuxTunnel::test_mux_metrics PASSED                                                                                                                                  [ 26%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[active->ping_serv->standby->resolve_entry->delete_entry-IPv4] PASSED                                                                  [ 30%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[active->ping_serv->standby->resolve_entry->delete_entry-IPv6] PASSED                                                                  [ 34%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[active->ping_serv->resolve_entry->standby->delete_entry-IPv4] PASSED                                                                  [ 38%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[active->ping_serv->resolve_entry->standby->delete_entry-IPv6] PASSED                                                                  [ 42%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[active->ping_serv->delete_entry-IPv4] PASSED                                                                                          [ 46%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[active->ping_serv->delete_entry-IPv6] PASSED                                                                                          [ 50%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[standby->ping_serv->active->resolve_entry->delete_entry-IPv4] PASSED                                                                  [ 53%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[standby->ping_serv->active->resolve_entry->delete_entry-IPv6] PASSED                                                                  [ 57%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[standby->ping_serv->resolve_entry->active->delete_entry-IPv4] PASSED                                                                  [ 61%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[standby->ping_serv->resolve_entry->active->delete_entry-IPv6] PASSED                                                                  [ 65%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[standby->ping_serv->delete_entry-IPv4] PASSED                                                                                         [ 69%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[standby->ping_serv->delete_entry-IPv6] PASSED                                                                                         [ 73%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[standby->ping_neigh->resolve_entry->active->delete_entry-IPv4] PASSED                                                                 [ 76%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[standby->ping_neigh->resolve_entry->active->delete_entry-IPv6] PASSED                                                                 [ 80%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[active->ping_neigh->resolve_entry->standby->delete_entry-IPv4] PASSED                                                                 [ 84%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[active->ping_neigh->resolve_entry->standby->delete_entry-IPv6] PASSED                                                                 [ 88%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[ping_neigh->delete_entry-IPv4] PASSED                                                                                                 [ 92%]
test_mux.py::TestMuxTunnel::test_neighbor_miss[ping_neigh->delete_entry-IPv6] PASSED                                                                                                 [ 96%]
test_mux.py::TestMuxTunnel::test_neighbor_miss_no_peer PASSED                                                                                                                        [100%]

============================================================================== 26 passed in 207.79s (0:03:27) ==============================================================================

Details if related

- If unable to resolve a neighbor on a dual ToR system, write a neighbor table entry for that neighbor with a zero MAC

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- When receiving a neighbor update with a zero MAC:
    - If the neighbor IP is configured for a specific mux cable port in the MUX_CABLE table in CONFIG_DB, handle the neighbor normally (if active for the port, no action is needed. if standby, a tunnel route is created for the neighbor IP)
    - If the neighbor IP is not configured for a specific port, create a tunnel route for the IP to the peer switch.
        - When these neighbor IPs are eventually resolved, remove the tunnel route and handle the neighbor normally.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
neighsyncd/neighsync.cpp Outdated Show resolved Hide resolved
orchagent/muxorch.cpp Outdated Show resolved Hide resolved
orchagent/muxorch.cpp Outdated Show resolved Hide resolved
orchagent/muxorch.cpp Outdated Show resolved Hide resolved
orchagent/neighorch.cpp Outdated Show resolved Hide resolved
theasianpianist and others added 8 commits February 9, 2022 21:52
- Insert newline before open braces
- Use MacAddress boolean conversion

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- Move standalone tunnel neighbor handling to completely separate code path

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- Add dual ToR neighbor miss test cases:
    - Create generic test runner function that executes test
      steps described in mux_neigh_miss_tests.py
    - Add neighbor miss test case with missing PEER_SWITCH config
- Create setup fixtures for dual ToR DB entries
    - CONFIG_DB: VLAN, MUX_CABLE, PEER_SWITCH
    - APPL_DB: TUNNEL_DECAP_TABLE
- Refactor existing tests to use setup fixtures for consistency
- Create ASIC_DB and APPL_DB verification functions
    - Verify APPL_DB NEIGH_TABLE entry presence/absence
    - Verify ASIC_DB ROUTE_ENTRY and NEIGH_ENTRY entry presence/absence
- Create various fixtures to generate test information

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Feb 14, 2022

This pull request introduces 2 alerts and fixes 1 when merging 193a00d into 4bff5c6 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

fixed alerts:

  • 1 for Unused import

@theasianpianist theasianpianist marked this pull request as ready for review February 14, 2022 22:33
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Feb 14, 2022

This pull request fixes 1 alert when merging dc04459 into 4bff5c6 - view on LGTM.com

fixed alerts:

  • 1 for Unused import

Copy link
Collaborator

@prsunny prsunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, lets chat on the test changes.

@@ -6,6 +6,7 @@
#include "routeorch.h"
#include "directory.h"
#include "muxorch.h"
#include "observer.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this. Its already in .h file

@@ -17,6 +18,8 @@ extern RouteOrch *gRouteOrch;
extern FgNhgOrch *gFgNhgOrch;
extern Directory<Orch*> gDirectory;

#define MUX_TUNNEL "MuxTunnel0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove. Its unused and also not relevant to neighorch

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Feb 15, 2022

This pull request fixes 1 alert when merging 499475e into 4bff5c6 - view on LGTM.com

fixed alerts:

  • 1 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Feb 15, 2022

This pull request fixes 1 alert when merging 499475e into 4bff5c6 - view on LGTM.com

fixed alerts:

  • 1 for Unused import

@theasianpianist theasianpianist merged commit 081dd01 into sonic-net:202012 Feb 16, 2022
theasianpianist added a commit to theasianpianist/sonic-swss that referenced this pull request Feb 19, 2022
- When orchagent receives a neighbor update with a zero MAC:
    - If the neighbor IP is configured for a specific mux cable port in the MUX_CABLE table in CONFIG_DB, handle the neighbor normally (if active for the port, no action is needed. if standby, a tunnel route is created for the neighbor IP)
    - If the neighbor IP is not configured for a specific port, create a tunnel route for the IP to the peer switch.
        - When these neighbor IPs are eventually resolved, remove the tunnel route and handle the neighbor normally.
- When creating/initializing a mux cable object, set the internal state to standby to match the constructor behavior.

- Various formatting fixes inside test_mux.py
- Remove references to deprecated `@pytest.yield_fixture`
- Add dual ToR neighbor miss test cases:
    - Test cases and expected results are described in `mux_neigh_miss_tests.py`. These descriptions are used by the generic test runner `test_neighbor_miss` function to execute the test actions and verify expected results
    - Various setup fixtures and test info fixtures were added
    - Existing test cases were changed to use these setup fixtures for consistency

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Co-authored-by: Sumukha Tumkur Vani <stumkurv@microsoft.com>
yxieca pushed a commit that referenced this pull request Aug 20, 2022
* Handle dual ToR neighbor miss scenario (#2137)

- When orchagent receives a neighbor update with a zero MAC:
    - If the neighbor IP is configured for a specific mux cable port in the MUX_CABLE table in CONFIG_DB, handle the neighbor normally (if active for the port, no action is needed. if standby, a tunnel route is created for the neighbor IP)
    - If the neighbor IP is not configured for a specific port, create a tunnel route for the IP to the peer switch.
        - When these neighbor IPs are eventually resolved, remove the tunnel route and handle the neighbor normally.
- When creating/initializing a mux cable object, set the internal state to standby to match the constructor behavior.

- Various formatting fixes inside test_mux.py
- Remove references to deprecated `@pytest.yield_fixture`
- Add dual ToR neighbor miss test cases:
    - Test cases and expected results are described in `mux_neigh_miss_tests.py`. These descriptions are used by the generic test runner `test_neighbor_miss` function to execute the test actions and verify expected results
    - Various setup fixtures and test info fixtures were added
    - Existing test cases were changed to use these setup fixtures for consistency

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Co-authored-by: Sumukha Tumkur Vani <stumkurv@microsoft.com>
yxieca pushed a commit that referenced this pull request Aug 20, 2022
* Handle dual ToR neighbor miss scenario (#2137)

- When orchagent receives a neighbor update with a zero MAC:
    - If the neighbor IP is configured for a specific mux cable port in the MUX_CABLE table in CONFIG_DB, handle the neighbor normally (if active for the port, no action is needed. if standby, a tunnel route is created for the neighbor IP)
    - If the neighbor IP is not configured for a specific port, create a tunnel route for the IP to the peer switch.
        - When these neighbor IPs are eventually resolved, remove the tunnel route and handle the neighbor normally.
- When creating/initializing a mux cable object, set the internal state to standby to match the constructor behavior.

- Various formatting fixes inside test_mux.py
- Remove references to deprecated `@pytest.yield_fixture`
- Add dual ToR neighbor miss test cases:
    - Test cases and expected results are described in `mux_neigh_miss_tests.py`. These descriptions are used by the generic test runner `test_neighbor_miss` function to execute the test actions and verify expected results
    - Various setup fixtures and test info fixtures were added
    - Existing test cases were changed to use these setup fixtures for consistency

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Co-authored-by: Sumukha Tumkur Vani <stumkurv@microsoft.com>
theasianpianist added a commit to sonic-net/sonic-utilities that referenced this pull request Aug 30, 2022
When checking for route entry mismatches, ignore mismatched routes where an APPL_DB neighbor entry with a zero MAC is present.

These routes are introduced by this change in SWSS and are expected: sonic-net/sonic-swss#2137

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
theasianpianist added a commit to sonic-net/sonic-utilities that referenced this pull request Aug 30, 2022
When checking for route entry mismatches, ignore mismatched routes where an APPL_DB neighbor entry with a zero MAC is present.

These routes are introduced by this change in SWSS and are expected: sonic-net/sonic-swss#2137

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
theasianpianist added a commit to theasianpianist/sonic-utilities that referenced this pull request Sep 1, 2022
When checking for route entry mismatches, ignore mismatched routes where an APPL_DB neighbor entry with a zero MAC is present.

These routes are introduced by this change in SWSS and are expected: sonic-net/sonic-swss#2137

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
yxieca pushed a commit to sonic-net/sonic-utilities that referenced this pull request Sep 2, 2022
When checking for route entry mismatches, ignore mismatched routes where an APPL_DB neighbor entry with a zero MAC is present.

These routes are introduced by this change in SWSS and are expected: sonic-net/sonic-swss#2137

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
EdenGri pushed a commit to EdenGri/sonic-utilities that referenced this pull request Oct 12, 2022
When checking for route entry mismatches, ignore mismatched routes where an APPL_DB neighbor entry with a zero MAC is present.

These routes are introduced by this change in SWSS and are expected: sonic-net/sonic-swss#2137

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
preetham-singh pushed a commit to preetham-singh/sonic-utilities that referenced this pull request Nov 21, 2022
When checking for route entry mismatches, ignore mismatched routes where an APPL_DB neighbor entry with a zero MAC is present.

These routes are introduced by this change in SWSS and are expected: sonic-net/sonic-swss#2137

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this pull request Aug 3, 2023
When checking for route entry mismatches, ignore mismatched routes where an APPL_DB neighbor entry with a zero MAC is present.

These routes are introduced by this change in SWSS and are expected: sonic-net/sonic-swss#2137

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants