[Alerting] Opening an alerting flyout is very slow #77472

Closed
sorenlouv opened this issue Sep 15, 2020 · 7 comments · Fixed by #80996
Labels
apm:alerting Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@sorenlouv
Member

sorenlouv commented Sep 15, 2020

Background
When opening the alerting flyout, there is a significant delay before the user can interact with the UI. This is quite painful if a user wants to create multiple alerts.

It turns out that the request to api/alerts/_health is blocking, so the flyout component hangs until it has loaded. Other requests (e.g. actions) do not load in parallel, so the user has to wait additional time for those afterwards.
This means that in my example below it takes roughly 4 seconds (2.33s + 1.40s) from when the user opens the flyout until they can interact with the actions.
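
To make the difference concrete, here is a minimal sketch of the two loading patterns described above. It is not the Kibana code: `Health`, `loadSequentially`, and `loadInParallel` are hypothetical names, and the loaders are passed in as plain functions so the timing behavior can be tried in isolation.

```typescript
// Hypothetical types and names for illustration only; not Kibana's API.
type Health = { ok: boolean };
type ActionTypes = string[];

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Blocking pattern: the actions request starts only after the health
// request has finished, so the total wait is the sum of both durations.
async function loadSequentially(
  loadHealth: () => Promise<Health>,
  loadActions: () => Promise<ActionTypes>
): Promise<[Health, ActionTypes]> {
  const health = await loadHealth();
  const actions = await loadActions();
  return [health, actions];
}

// Parallel pattern: both requests start immediately, so the total wait
// is roughly the duration of the slower request.
async function loadInParallel(
  loadHealth: () => Promise<Health>,
  loadActions: () => Promise<ActionTypes>
): Promise<[Health, ActionTypes]> {
  const [health, actions] = await Promise.all([loadHealth(), loadActions()]);
  return [health, actions];
}
```

With loaders that take about as long as the two requests in the example, the sequential version waits for the sum of both, while the parallel version waits only for the slower one. Loading in parallel alone does not make the flyout interactive sooner if rendering still waits on the health result.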

image

Question
It would be very helpful if the health check could become non-blocking, and thus allow the alerting flyout to render while it is loading.
I understand that this will cause some flickering for those users who do not have TLS enabled, etc., but it will benefit the large majority of users who have alerting set up correctly and just want to interact with the alerting component.

@sorenlouv sorenlouv added Team:APM All issues that need APM UI Team support Feature:Alerting :Alerting apm:alerting labels Sep 15, 2020
@elasticmachine
Contributor

Pinging @elastic/apm-ui (Team:apm)

@sorenlouv
Member Author

I switched to a different cluster that turns out to be even slower than the previous one:

image

It takes almost 20 seconds from when the user clicks a button until the alerting flyout is ready.
Again, the health endpoint is particularly slow. While there might be room for optimizing this, I still think the most important thing is to make it non-blocking.

@sorenlouv sorenlouv changed the title [Alerting] Opening the alerting flyout is very slow [Alerting] Opening an alerting flyout is very slow Sep 15, 2020
@ogupte ogupte self-assigned this Sep 15, 2020
@sorenlouv sorenlouv added the Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) label Sep 15, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@ogupte ogupte removed their assignment Sep 15, 2020
@sorenlouv sorenlouv removed Team:APM All issues that need APM UI Team support [zube]: Inbox labels Sep 18, 2020
@mikecote mikecote mentioned this issue Oct 13, 2020
36 tasks
@ymao1
Contributor

ymao1 commented Oct 15, 2020

@sqren @elastic/kibana-alerting-services

Do we know why the health check is returning so slowly? It seems like it might be better to solve that underlying issue rather than switching to a non-blocking health endpoint? I wasn't able to reproduce the slowness locally so I introduced a 5-second delay and switched to a non-blocking health check. Even showing the create alert form for 5 seconds and then showing the TLS message seems like a distracting user experience to me:

Oct-15-2020 15-47-41

I imagine if the health check is taking 20 seconds, a user would have enough time to fill out a couple of fields and maybe even try to save before being told that they need an encryption key. Or, as you mentioned, this would only happen to a small minority of users? What do you think?
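
For anyone who wants to reproduce the slow case locally, the artificial delay mentioned above could be sketched as a small wrapper like the one below. `withDelay` is a hypothetical helper written for this illustration; it is not part of Kibana and is not necessarily how the delay was actually introduced.

```typescript
// Hypothetical helper: wraps any request function so that it waits
// `delayMs` milliseconds before the real request is issued.
function withDelay<T>(
  request: () => Promise<T>,
  delayMs: number
): () => Promise<T> {
  return async () => {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    return request();
  };
}
```

Wrapping the health request with `withDelay(loadHealth, 5000)` (where `loadHealth` stands for whatever function issues the request) simulates a 5-second response without touching the server.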

@sorenlouv
Member Author

sorenlouv commented Oct 15, 2020

> Even showing the create alert form for 5 seconds and then showing the TLS message seems like a distracting user experience to me:

I think this is the wrong scenario to optimise for. Users who don't have TLS enabled won't be able to create alerts. Why would they keep coming back to the alerting flyout day in, day out, multiple times a day? They'll open the flyout once, figure out they need to enable TLS and either enable it or decide against it and never come back.
The decision to wait for the TLS check is hurting the 99% of users who have already set up TLS and just want to create an alert.

@sorenlouv
Member Author

> I imagine if the health check is taking 20 seconds, a user would have enough time to fill out a couple of fields

Yes, this is my point. Instead of blocking the UI, we can let the user start filling out the alert while we perform the check in the background.

> and maybe even try to save before being told that they need an encryption key.

The save button could be disabled, with hover text letting the user know that we are performing a background check.
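
A rough sketch of that idea, assuming the health check is modeled as a small state value that the form reads from. All names here (`HealthCheck`, `saveButtonState`, `runHealthCheckInBackground`) and the tooltip strings are hypothetical and only illustrate the approach; they are not the Kibana implementation.

```typescript
// Hypothetical state model for a background health check.
type HealthCheck =
  | { status: "loading" }
  | { status: "ok" }
  | { status: "failed"; reason: string };

interface SaveButtonState {
  disabled: boolean;
  tooltip?: string;
}

// The form renders immediately; only the save button depends on the check.
function saveButtonState(check: HealthCheck): SaveButtonState {
  switch (check.status) {
    case "loading":
      return { disabled: true, tooltip: "Checking alerting setup..." };
    case "failed":
      return { disabled: true, tooltip: check.reason };
    case "ok":
      return { disabled: false };
  }
}

// Starts the check without blocking the caller. `onChange` is called once
// with "loading" right away and once more when the check settles.
function runHealthCheckInBackground(
  fetchHealth: () => Promise<boolean>,
  onChange: (check: HealthCheck) => void
): Promise<void> {
  onChange({ status: "loading" });
  return fetchHealth().then(
    (healthy) =>
      onChange(
        healthy
          ? { status: "ok" }
          : { status: "failed", reason: "Alerting setup is incomplete" }
      ),
    () => onChange({ status: "failed", reason: "Health check request failed" })
  );
}
```

In a React component, `onChange` would typically be a state setter, so the button re-renders when the check settles while the rest of the form stays interactive the whole time.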

@ymao1
Contributor

ymao1 commented Oct 15, 2020

Gotcha. Sounds like very few users would encounter the weird UX so I'm fine with that.

@ymao1 ymao1 self-assigned this Oct 16, 2020
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
5 participants