confd: fails removing veth pair after multiple reconfigurations #658

Closed
axkar opened this issue Sep 26, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@axkar
Collaborator

axkar commented Sep 26, 2024

Current Behavior

The NETCONF request provided creates a bridge br-X with the port ethX (mapped to a specific port on the target). In the example below, the test initializes the environment and attaches the target, then configures the bridge and its associated interface:

import infamy

with infamy.Test() as test:
    with test.step("Initialize"):
        env = infamy.Env()
        target = env.attach("target", "mgmt")

        # Translate the logical port name to the physical port on the target
        _, eth_X = env.ltop.xlate("target", "ethX")
        br_X = "br-X"

    with test.step("Configure bridge br-X and associated interfaces"):
        target.put_config_dict("ietf-interfaces", {
            "interfaces": {
                "interface": [
                    {
                        "name": br_X,
                        "type": "infix-if-type:bridge",
                        "enabled": True
                    },
                    {
                        # Type mismatch: the target expects etherlike here
                        "name": eth_X,
                        "type": "infix-if-type:ethernet",
                        "enabled": True,
                        "infix-interfaces:bridge-port": {
                            "bridge": br_X
                        }
                    }
                ]
            }
        })

The logs on the target show that confd crashes:

Sep 26 16:13:21 test-00-01-00 dagger[3255]: Aborting: /run/net/2/action/init/e3/10-ethtool-aneg.sh failed with exitcode 75
Sep 26 16:13:21 test-00-01-00 dagger[3255]: Abandoned generation 2
Sep 26 16:13:21 test-00-01-00 confd[3255]: Failed to apply interface configuration
Sep 26 16:13:21 test-00-01-00 confd[3255]: Oups, error detected in SR_EV_DONE
Sep 26 16:13:21 test-00-01-00 confd[3255]: failed sr_subscription_process_events(), ret:7
Sep 26 16:13:21 test-00-01-00 finit[1]: Stopping netopeer[3771], sending SIGTERM ...
Sep 26 16:13:21 test-00-01-00 finit[1]: Stopping statd[3716], sending SIGTERM ...
Sep 26 16:13:21 test-00-01-00 finit[1]: Stopping rousette[3801], sending SIGTERM ...
Sep 26 16:13:21 test-00-01-00 finit[1]: Service confd[3255] died, restarting in 2000 msec (1/10)
Sep 26 16:13:21 test-00-01-00 finit[1]: Starting confd[4462]

The issue stems from a type mismatch: the request specifies a valid Ethernet type (ethernet) for the ethX interface, but the target system expects a different type, specifically etherlike. Even so, a type mismatch should not cause the confd process to crash.

The crash indicates a bug or a problem with the error handling mechanism in confd.

Expected Behavior

Ideally, the system should reject the configuration and provide a meaningful error message, rather than crashing entirely.
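
As a rough illustration of the expected behavior, the sketch below checks the requested interface types against what the target expects and returns readable error messages instead of failing later. It is not confd code (confd is not written in Python); the function name, the `EXPECTED_TYPES` table, and the use of `e3` (borrowed from the log path above and assumed to be the port behind ethX) are all hypothetical.

```python
# Hypothetical sketch: the table below stands in for whatever the target
# knows about its ports; it is an assumption, not data read from a device.
EXPECTED_TYPES = {"e3": "infix-if-type:etherlike"}


def validate_interface_types(config, expected_types):
    """Return a list of error strings; an empty list means 'accept'."""
    errors = []
    for iface in config["interfaces"]["interface"]:
        name = iface["name"]
        requested = iface.get("type")
        expected = expected_types.get(name)
        # Interfaces the target has no fixed type for (e.g. a new bridge)
        # are not checked here.
        if expected is not None and requested != expected:
            errors.append(
                f"{name}: requested type {requested!r}, "
                f"but the target expects {expected!r}"
            )
    return errors


config = {
    "interfaces": {
        "interface": [
            {"name": "br-X", "type": "infix-if-type:bridge"},
            {"name": "e3", "type": "infix-if-type:ethernet"},
        ]
    }
}

errors = validate_interface_types(config, EXPECTED_TYPES)
for message in errors:
    print("rejected:", message)
```

With a check like this, the request from the test above would be refused with one message naming the interface and both types, and nothing would be applied.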

Steps To Reproduce

No response

Additional information

No response

@axkar axkar added bug Something isn't working triage Pending investigation & classification (CCB) labels Sep 26, 2024
@axkar axkar self-assigned this Sep 26, 2024
@axkar axkar changed the title Confd Crashing after Sending Invalid Interface Type Confd Crashing after an Interface Type Missmatch Sep 26, 2024
@axkar axkar changed the title Confd Crashing after an Interface Type Missmatch Confd Crashing after an Ethernet Interface Type Missmatch Sep 26, 2024
@troglobit
Contributor

troglobit commented Sep 27, 2024

Nice finding, but not a blocker for v24.09.0. Also, confd doesn't technically crash -- it fails to validate input in SR_EV_CHANGE and lets the invalid configuration propagate to SR_EV_DONE (where you're not really allowed to fail, in sysrepo terms). When we encounter an error in this state, confd (or rather sysrepo-plugind) now calls exit() explicitly to let the system try to recover by restarting everything.

I suggest we change the title to: "confd fails input validation of interface type".
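
The two phases mentioned in this comment can be modeled with a small, self-contained sketch. It does not use the sysrepo API; the event names are plain strings, and `on_event`, `commit`, `check_type`, and the interface name `e3` are hypothetical. The point it tries to show is that rejecting in the change phase keeps the done phase from ever seeing an invalid configuration.

```python
class ValidationError(Exception):
    """Raised in the change phase to reject a configuration."""


def on_event(event, config, validate, apply):
    """Toy callback: validate on 'change', apply on 'done'."""
    if event == "change":
        errors = validate(config)
        if errors:
            raise ValidationError("; ".join(errors))
    elif event == "done":
        # By this point the configuration has been accepted, so this
        # phase only applies it and has no way to reject it.
        apply(config)


def commit(config, validate, apply):
    """Run both phases; 'done' is reached only if 'change' accepts."""
    try:
        on_event("change", config, validate, apply)
    except ValidationError as err:
        return f"rejected: {err}"
    on_event("done", config, validate, apply)
    return "applied"


def check_type(config):
    # Assumed expectation for illustration only.
    expected = {"e3": "infix-if-type:etherlike"}
    return [
        f"{i['name']}: expected {expected[i['name']]}, got {i['type']}"
        for i in config["interface"]
        if i["name"] in expected and i["type"] != expected[i["name"]]
    ]


applied = []
bad = {"interface": [{"name": "e3", "type": "infix-if-type:ethernet"}]}
good = {"interface": [{"name": "e3", "type": "infix-if-type:etherlike"}]}

result_bad = commit(bad, check_type, applied.append)
result_good = commit(good, check_type, applied.append)
print(result_bad)
print(result_good)
```

Only the second configuration reaches the apply step; the first is turned away with a message before anything on the system changes.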

@axkar axkar changed the title Confd Crashing after an Ethernet Interface Type Missmatch Confd Fails Input Validation of Interface Type Sep 27, 2024
@troglobit
Contributor

The core team has continued discussing this issue, and it has now evolved into a blocker for v24.09.

Root cause: reconfiguring the system multiple times after initially adding a VETH pair makes it impossible to remove the pair.

@troglobit troglobit changed the title Confd Fails Input Validation of Interface Type confd: fails removing veth pair after a multple reconfurations at runtime Sep 27, 2024
@troglobit troglobit assigned troglobit and unassigned axkar Sep 27, 2024
@troglobit troglobit added this to the Infix v24.09 milestone Sep 27, 2024
@troglobit troglobit removed the triage Pending investigation & classification (CCB) label Sep 27, 2024
@troglobit troglobit changed the title confd: fails removing veth pair after a multple reconfurations at runtime confd: fails removing veth pair after a multple reconfigurations Sep 27, 2024
troglobit added a commit that referenced this issue Sep 27, 2024
Verify a VETH pair can be removed after a couple of dummy operations to
step the dagger generation past the initial one, where the pair is created.

NOTE: Infamy currently lacks support for removing chunks of configuration,
      e.g., a dut.del_config_dict(), or similar, and delete_xpath() is
      not valid for configurations with dependencies like VETH pairs.

Issue #658

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Projects
Status: Done

2 participants