Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase robustness of operational datastore #563

Merged
merged 6 commits into from
Aug 22, 2024
Merged

Conversation

troglobit
Copy link
Contributor

@troglobit troglobit commented Aug 17, 2024

Description

This PR increases the robustness of the Infix yanger helper. Basically, instead of bailing out with sys.exit(1), causing statd to fail and automatically cause a non-working operational datastore, we allow external commands to fail and then return empty objects/arrays to the respective data parsers.

For example, asking the operational datastore for the status of a container before the container has been started properly should not b0rk the datastore, but rather return None information. Same as if listing all containers.

The test framework has also been made slightly more fault tolerant by adding a retry of container commands.

Fixes #558 and #568

Checklist

Tick relevant boxes, this PR is-a or has-a:

  • Bugfix
    • Regression tests
    • ChangeLog updates (for next release)
  • Feature
    • YANG model change => revision updated?
    • Regression tests added?
    • ChangeLog updates (for next release)
    • Documentation added?
  • Code style update (formatting, renaming)
  • Refactoring (please detail in commit messages)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

Simplify and add paragraph on how to select proto from test-sh.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
@troglobit
Copy link
Contributor Author

Ugly intermediate state of this PR, which still shows the general direction where I'd like to take this:

  • make yanger more robust so that it does not break the operational ds
  • add more logging to yanger in cases where user gets an empty [] back
  • allow test framework to retry more, e.g., container hasn't stopped/started yet

src/statd/python/yanger/yanger.py Outdated Show resolved Hide resolved
src/statd/python/yanger/yanger.py Outdated Show resolved Hide resolved
src/statd/python/yanger/yanger.py Outdated Show resolved Hide resolved
@troglobit troglobit marked this pull request as ready for review August 21, 2024 15:56
Copy link
Contributor

@wkz wkz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yanger is not my forte, but I tried to contribute some general feedback at least 😄

src/statd/python/yanger/yanger.py Outdated Show resolved Hide resolved
test/infamy/container.py Outdated Show resolved Hide resolved
test/infamy/container.py Outdated Show resolved Hide resolved
test/test.mk Show resolved Hide resolved
@mattiaswal mattiaswal self-requested a review August 22, 2024 07:29
Copy link
Contributor

@wkz wkz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The smaller diff to yanger allowed me see some other potential issues. Sorry I missed these initially!

src/statd/python/yanger/yanger.py Outdated Show resolved Hide resolved
src/statd/python/yanger/yanger.py Outdated Show resolved Hide resolved
src/statd/python/yanger/yanger.py Outdated Show resolved Hide resolved
src/statd/python/yanger/yanger.py Outdated Show resolved Hide resolved
test/env Outdated Show resolved Hide resolved
These changes are intended to make yanger a bit more robust.  Instead of
bailing out on command errors, we now return empty string or object back
to the callee in run_cmd() and run_json_cmd().

One example where yanger previously exited hard, and dragged with it the
entire session, is when inspecting container status or when attempting
to restart container instances which have yet to be created/stopped.

Fixes #558

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
When unit tests run in CI we may not have a syslog daemon available, in
such cases we set up a dummy handler.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
If a container has not yet stopped/started we may for proto RESTCONF get
"invalid URI" result back for some container actions.  With this change
we allow the action to be retried up to three times before passing on
the error.

In tests on Qemu (x86_64) this happens very rarely and need at most one
retry before succeeding.  Verified by iterating the same basic test over
night (9000+ iterations).

Fixes #558

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Fixes #568

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Copy link
Contributor

@wkz wkz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squeaky clean! ✨

@troglobit
Copy link
Contributor Author

Squeaky clean! ✨

Thank you! Your last suggestion btw, about allowing errors to bubble up, already uncovered a few missing test files and error handling in tests. Which @mattiaswal is working on right now 😎

@troglobit troglobit merged commit 1c2a491 into main Aug 22, 2024
4 checks passed
@troglobit troglobit deleted the test-reset-fail branch August 22, 2024 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: Unit tests fail almost every time due to race condition Factory reset may fail
3 participants