Stop self monitor output health reporting if output config is not acked by agents #3335

Merged
merged 3 commits into from
Mar 11, 2024

Conversation

@juliaElastic juliaElastic commented Mar 11, 2024

What is the problem this PR solves?

Output health kept reporting a stale, incorrect state after the output config was updated but before any agent had acked the new config.

How does this PR solve the problem?

The self monitor compares the bulker's current output config with the latest output config from .fleet-policies. If the bulker is not using the latest config, the self monitor stops reporting output health until an agent acks the new config.
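The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `OutputConfig` type and the `shouldReportHealth` function are hypothetical names, and the real self monitor may compare configs differently (for example, by hash or by selected fields).

```go
package main

import (
	"fmt"
	"reflect"
)

// OutputConfig is a hypothetical, simplified stand-in for an output's settings.
type OutputConfig map[string]any

// shouldReportHealth returns true only when the config the bulker is currently
// using matches the latest config stored in the policy. When they differ, no
// agent has acked the new config yet, so any health result would describe the
// old config and reporting should be skipped.
func shouldReportHealth(bulkerCfg, latestCfg OutputConfig) bool {
	return reflect.DeepEqual(bulkerCfg, latestCfg)
}

func main() {
	// Hypothetical hosts, for illustration only.
	inUse := OutputConfig{"hosts": []string{"https://invalid-host:9200"}}
	latest := OutputConfig{"hosts": []string{"https://remote-es:9200"}}

	fmt.Println(shouldReportHealth(inUse, latest))  // configs differ: skip reporting
	fmt.Println(shouldReportHealth(latest, latest)) // configs match: report health
}
```

In this sketch a mismatch simply suppresses reporting; it does not mark the output healthy or unhealthy, which matches the behavior described above of staying silent until an agent acks the new config.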

How to test this PR locally

  • Create a remote ES output that points to a second ES instance
  • Enroll an agent that uses the remote ES output
  • Update the remote ES config to point to an invalid host
  • The output health UI should show the connection error
  • Stop the agent
  • Update the remote ES config to point to a valid host again
  • The output health UI should clear
  • Restart the agent
  • The output health UI should show healthy status
output_health.mov

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool

Related issues

Closes #3334

@juliaElastic juliaElastic added the bug Something isn't working label Mar 11, 2024
@juliaElastic juliaElastic self-assigned this Mar 11, 2024
@juliaElastic juliaElastic requested a review from a team as a code owner March 11, 2024 11:56

@nchaulet nchaulet left a comment

code LGTM 🚀

@juliaElastic juliaElastic merged commit 59facd6 into elastic:main Mar 11, 2024
8 checks passed
@juliaElastic juliaElastic added the backport-v8.13.0 Automated backport with mergify label Mar 11, 2024
mergify bot pushed a commit that referenced this pull request Mar 11, 2024
#3335)

* Stop self monitor output health reporting if output config is not acked by agents

* updated changelog

* added test to error scenario

(cherry picked from commit 59facd6)
juliaElastic added a commit that referenced this pull request Mar 11, 2024
#3335) (#3336)

* Stop self monitor output health reporting if output config is not acked by agents

* updated changelog

* added test to error scenario

(cherry picked from commit 59facd6)

Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

[Fleet]: Unhealthy agent output badge is not removed on editing incorrect output when agent is not connected.