
[Fleet]: Unhealthy agent output badge is not removed on editing incorrect output when agent is not connected. #3334

Closed
amolnater-qasource opened this issue Dec 28, 2023 · 6 comments · Fixed by elastic/kibana#177685 or #3335
Labels: bug, impact:medium, QA:Validated, Team:Fleet

Comments

@amolnater-qasource
Collaborator

Kibana Build details:

VERSION: 8.12.0 BC3
BUILD: 69985
COMMIT: 2a8afed8572a4c709aa1c64216748197eeb9b18f
Artifact Link: https://staging.elastic.co/8.12.0-61156bc6/summary-8.12.0.html

Host OS: All

Preconditions:

  1. A Kibana 8.12.0 BC3 cloud environment should be available.
  2. An agent should be installed.

Steps to reproduce:

  1. Navigate to Fleet Settings.
  2. Create an invalid Remote Elasticsearch output.
  3. Under the agent policy settings, select this invalid Remote Elasticsearch output.
  4. Navigate to Fleet Settings and observe that the agent output is Unhealthy.
  5. Remove the connected agent and update the output with a correct configuration.
  6. Observe that the output status still shows as Unhealthy.

Screen Recording:

Agents.-.Fleet.-.Elastic.-.Google.Chrome.2023-12-27.19-07-53.mp4
Settings.-.Fleet.-.Elastic.-.Google.Chrome.2023-12-27.19-09-27.mp4

Expected Result:
The Unhealthy agent output badge should be removed after the incorrect output is corrected while the agent is not connected, and the new status should be reported once the agent reconnects.

Feature:
elastic/kibana#104986

@amolnater-qasource amolnater-qasource added the bug, impact:medium, and Team:Fleet labels Dec 28, 2023
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@amolnater-qasource
Collaborator Author

@manishgupta-qasource Please review.

@manishgupta-qasource
Collaborator

Secondary review of this ticket is done.

@juliaElastic juliaElastic self-assigned this Feb 20, 2024
juliaElastic referenced this issue in elastic/kibana Feb 23, 2024
…pdated time (#177685)

## Summary

Closes https://github.com/elastic/kibana/issues/174008

Added a filter when querying the remote ES output health status so that it only
returns results reported after the last update time of the output (the `updated_at`
field of the output saved object, SO).
This makes health status reporting more accurate: stale statuses no longer
linger on the UI, and only the latest status reported after the last update is shown.
If the output query errors out or the `updated_at` field is not present,
the filter is omitted.
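A minimal TypeScript sketch of the filtering described above, assuming the health status documents carry an `output` id field and an `@timestamp` field, and that `updated_at` is an ISO-8601 string. The field names and the helpers `buildOutputHealthQuery` and `getUpdatedAtOrUndefined` are hypothetical illustrations, not the code merged in #177685; the query object itself is ordinary Elasticsearch query DSL (a `bool` filter with `term` and `range` clauses).

```typescript
// Hypothetical sketch: field names (`output`, `@timestamp`) and helper names
// are assumptions for illustration, not the actual Kibana implementation.
type EsFilter = Record<string, unknown>;

interface HealthQuery {
  query: { bool: { filter: EsFilter[] } };
  sort: Array<Record<string, { order: 'asc' | 'desc' }>>;
  size: number;
}

// Builds a query for the latest health status of one output. When `updatedAt`
// is missing or unparseable, the time filter is omitted (fallback behaviour).
function buildOutputHealthQuery(outputId: string, updatedAt?: string): HealthQuery {
  const filter: EsFilter[] = [{ term: { output: outputId } }];
  if (updatedAt !== undefined && !Number.isNaN(Date.parse(updatedAt))) {
    // Only keep statuses reported strictly after the output was last edited.
    filter.push({ range: { '@timestamp': { gt: updatedAt } } });
  }
  return {
    query: { bool: { filter } },
    sort: [{ '@timestamp': { order: 'desc' } }],
    size: 1,
  };
}

// Reads `updated_at` defensively: if loading the output throws, return
// undefined so the caller builds the query without the time filter.
async function getUpdatedAtOrUndefined(
  readOutput: (id: string) => Promise<{ updated_at?: string }>,
  outputId: string,
): Promise<string | undefined> {
  try {
    return (await readOutput(outputId)).updated_at;
  } catch {
    return undefined;
  }
}
```

Without a usable `updated_at`, the sketch returns the same query minus the `range` clause, so the most recent status is still shown rather than nothing.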


To verify:
- create a remote ES output (it can be the same as the local ES) and use it as the monitoring output of an agent policy
- enroll an agent in this agent policy
- update the output to use an invalid host URL
- wait until the remote ES output shows an error state on the UI
- stop Fleet-server
- update the remote ES output to use a correct host URL
- wait until the remote ES output status is cleared on the UI
- start Fleet-server and wait until the agent checks in again (this can take a few minutes)
- verify that the remote ES output status shows as healthy on the UI
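The verification steps can be replayed as a small in-memory model of the "only statuses newer than `updated_at`" rule. This is an illustrative sketch, not Kibana code: `HealthReport`, the `HEALTHY`/`DEGRADED` state names, `latestStatusAfter`, and the timestamps are all made up for the example.

```typescript
// Illustrative model only; state names and timestamps are hypothetical.
type HealthState = 'HEALTHY' | 'DEGRADED';

interface HealthReport {
  state: HealthState;
  timestamp: number; // epoch millis
}

// Returns the newest report strictly after `updatedAt`, or undefined when
// none exists (the UI would then show no status badge). Without
// `updatedAt`, every report is considered.
function latestStatusAfter(reports: HealthReport[], updatedAt?: number): HealthReport | undefined {
  let latest: HealthReport | undefined;
  for (const r of reports) {
    if (updatedAt !== undefined && r.timestamp <= updatedAt) continue;
    if (latest === undefined || r.timestamp > latest.timestamp) latest = r;
  }
  return latest;
}

// Replay of the verification steps with made-up timestamps.
const reports: HealthReport[] = [{ state: 'DEGRADED', timestamp: 1000 }]; // invalid host URL
const outputUpdatedAt = 2000; // output corrected while Fleet-server is stopped

const whileStopped = latestStatusAfter(reports, outputUpdatedAt); // no newer report: badge cleared

reports.push({ state: 'HEALTHY', timestamp: 3000 }); // Fleet-server restarted, agent checked in
const afterRestart = latestStatusAfter(reports, outputUpdatedAt); // latest report is HEALTHY
```

While Fleet-server is stopped, no report is newer than the edit, so the model returns `undefined`; once a report arrives after the restart, it becomes the latest status.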

Invalid URL:
<img width="581" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/b8a98cb1-4a1b-4d74-b260-b95bf8eaac62">

Fleet-server stopped and output updated to a valid URL:
<img width="1133" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/0e8a047f-48d8-4a3e-90e5-9a2ae1c2f874">

Fleet-server restarted:
<img width="1131" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/0cf642e5-b26f-41d7-ad45-acc2c6c6111f">


### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
kibanamachine referenced this issue in kibanamachine/kibana Feb 23, 2024
…pdated time (elastic#177685)


(cherry picked from commit 2005cef)
@amolnater-qasource amolnater-qasource added the QA:Ready For Testing label Feb 23, 2024
kibanamachine referenced this issue in elastic/kibana Feb 23, 2024
… last updated time (#177685) (#177711)

# Backport

This will backport the following commits from `main` to `8.13`:
- [[Fleet] only show remote ES output health status if later than last
updated time (#177685)](#177685)


Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
fkanout referenced this issue in fkanout/kibana Mar 4, 2024
…pdated time (elastic#177685)

@amolnater-qasource
Collaborator Author

Hi @juliaElastic

We have revalidated this issue on the latest 8.13.0 BC3 Kibana cloud environment and found that it is still reproducible.

Observations:

  • The Unhealthy agent output badge is removed once the incorrect output is corrected while the agent is not connected.
  • However, it reappears after a few seconds.

Build details:
VERSION: 8.13.0 BC3
BUILD: 71857
COMMIT: 82f46148c91eec93ac7382147936028db2eb8883

Screen Recording:

Agents.-.Fleet.-.Elastic.-.Google.Chrome.2024-03-05.17-50-57.mp4

Hence, we are reopening this issue.

Thanks!

@amolnater-qasource amolnater-qasource removed the QA:Ready For Testing label Mar 7, 2024
@juliaElastic
Contributor

I had a look, and the issue is that the remote ES config is only updated on agent checkin/ack. If the agent is stopped, the fleet-server monitor doesn't receive the new (correct) config and keeps running the health check with the old (incorrect) config. I'll take a look at how to fix this.
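A hypothetical TypeScript model of the behaviour described above. The actual monitor runs inside fleet-server, and `HealthMonitor`, `OutputStore`, and `handleCheckin` are invented names; this only contrasts a monitor refreshed on checkin with one refreshed whenever the output changes, and is not the fix that was shipped.

```typescript
// Hypothetical model; names are illustrative, not fleet-server code.
interface RemoteEsConfig {
  hosts: string[];
}

// Holds the config used for remote ES health checks.
class HealthMonitor {
  private config: RemoteEsConfig;

  constructor(initial: RemoteEsConfig) {
    this.config = initial;
  }

  // Host the next health check would target.
  checkTarget(): string {
    return this.config.hosts[0];
  }

  setConfig(next: RemoteEsConfig): void {
    this.config = next;
  }
}

// Source of truth for the output config (what the user edits in Fleet).
class OutputStore {
  private current: RemoteEsConfig;
  private listeners: Array<(next: RemoteEsConfig) => void> = [];

  constructor(initial: RemoteEsConfig) {
    this.current = initial;
  }

  get(): RemoteEsConfig {
    return this.current;
  }

  subscribe(listener: (next: RemoteEsConfig) => void): void {
    this.listeners.push(listener);
  }

  update(next: RemoteEsConfig): void {
    this.current = next;
    for (const listener of this.listeners) listener(next);
  }
}

// Checkin-driven refresh: the monitor only sees changes when an agent checks in or acks.
function handleCheckin(store: OutputStore, monitor: HealthMonitor): void {
  monitor.setConfig(store.get());
}

const store = new OutputStore({ hosts: ['https://bad-host:9200'] });
const checkinDriven = new HealthMonitor(store.get());
const changeDriven = new HealthMonitor(store.get());
// Change-driven refresh: this monitor follows every output update.
store.subscribe((next) => changeDriven.setConfig(next));

// The output is corrected while the agent is stopped, so no checkin happens.
store.update({ hosts: ['https://good-host:9200'] });

const staleTarget = checkinDriven.checkTarget(); // still the bad host
const freshTarget = changeDriven.checkTarget(); // the corrected host
```

In this model the checkin-driven monitor keeps targeting the old host until `handleCheckin` runs, which is consistent with the explanation above for why the Unhealthy badge came back after a few seconds.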

@juliaElastic juliaElastic transferred this issue from elastic/kibana Mar 11, 2024
@amolnater-qasource amolnater-qasource added the QA:Ready For Testing label Mar 12, 2024
@amolnater-qasource
Collaborator Author

Hi Team,

We have revalidated this issue on the latest 8.13.0 BC7 Kibana cloud environment and found that it is now fixed.

Observations:

  • The Unhealthy agent output badge is removed once the incorrect output is corrected while the agent is not connected, and it does not reappear afterwards.

Screen Recording:

Agents.-.Fleet.-.Elastic.-.Google.Chrome.2024-03-26.12-32-13.mp4


Build details:
VERSION: 8.13.0 BC7
BUILD: 72069
COMMIT: 2e3a5cd43e835baa1d596b1aa54735992259ecb9

Hence, we are marking this issue as QA:Validated.
Thanks!!

@amolnater-qasource amolnater-qasource added the QA:Validated label and removed the QA:Ready For Testing label Mar 26, 2024
4 participants