When Workers are Enabled Webserver Health Check Takes a Long Time to Respond #2517
Comments
@RobertKeyser I'm asking this to explicitly confirm: is this only a problem when workers/celery is enabled?
That's when I started noticing it, but I can't confirm with any certainty that it's related. I'm still seeing slowness, though.
I deployed an instance of Fides with 0 workers and it's blazing fast. The instances of Fides with workers are still sluggish, showing ~4 second times.
@RobertKeyser is it only the health endpoint or other endpoints as well? Sounds like it's only the health check, which I think makes this lower priority.
Re-opening this issue as unfortunately I can still reproduce the issue on builds that include #3884. Here's an example of the latency experienced on a staging environment where the fides webserver is deployed with a worker.
~ ❯ for i in {1..10}; do curl -o /dev/null -s -w 'Total: %{time_total}s\n' https://fides-nightly.redacted.example.com/health; done
Total: 1.619734s
Total: 7.887261s
Total: 3.163243s
Total: 1.537107s
Total: 1.567872s
Total: 9.011004s
Total: 1.655995s
Total: 1.616508s
Total: 2.597901s
Total: 8.119707s
When workers are eliminated, the issue disappears. The API response time drops to about 500ms on average.
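For comparison with the 1-second Kubernetes probe timeout mentioned in the issue description, the ten staging timings above can be summarized with a small shell helper. This is only a sketch: `summarize` is a hypothetical name that does not come from the Fides repository, and the numbers are copied by hand from the run above.

```shell
# Timings (in seconds) copied from the staging run quoted above.
timings="1.619734 7.887261 3.163243 1.537107 1.567872 9.011004 1.655995 1.616508 2.597901 8.119707"

# summarize LIST LIMIT
# Hypothetical helper: prints the count, mean, and max of the
# whitespace-separated values in LIST, plus how many exceed LIMIT seconds.
summarize() {
  printf '%s\n' "$1" | awk -v limit="$2" '
    {
      for (i = 1; i <= NF; i++) {
        n++
        sum += $i
        if ($i + 0 > max + 0) max = $i
        if ($i + 0 > limit + 0) over++
      }
    }
    END {
      if (n) printf "n=%d mean=%.3f max=%.3f over_limit=%d\n", n, sum / n, max, over
    }'
}

# Compare against a 1-second limit, the default probe timeout.
summarize "$timings" 1
```

Passing a different second argument (for example 5) shows how many of the same requests would still exceed a longer timeout.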
I find the inconsistency here really interesting; it makes it less obvious what the issue might be.
I've tested an alpha image based on #3898 and can confirm that the latency issue is gone.
~ ❯ for i in {1..10}; do curl -o /dev/null -s -w 'Total: %{time_total}s\n' https://fides-sandbox.redacted.example.com/health; done
Total: 0.589513s
Total: 0.481228s
Total: 0.469443s
Total: 0.599608s
Total: 0.485188s
Total: 0.483681s
Total: 0.496315s
Total: 0.484560s
Total: 0.497816s
Total: 0.510221s
I will close this issue and get the above PR merged to main, but only after creating a follow-on issue to investigate the real root cause.
@daveqnet Is it ok to close this out?
@Roger-Ethyca, yes indeed, as long as your tests are passing now that #3898 is merged to main.
Moving to done.
Bug Description
When workers are enabled, the /health route on the webserver takes > 1 second to respond. I tried with 1, 5, and 10 workers and didn't notice any significant difference in the time it took.
Results of 10 trials each:
Steps to Reproduce
Expected behavior
< 1 second http response time
Screenshots
If applicable, add screenshots to help explain your problem.
Environment
Additional context
The default Kubernetes timeout for probes is 1 second, so with these response times Kubernetes will kill the pods unless the liveness probe sets timeoutSeconds to roughly 5 seconds or more.
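To estimate how often a probe with a given timeout would fail against a deployment, the curl loop used in the comments can be adapted to abort each request after that timeout. The sketch below is an approximation, not the kubelet's actual probe logic: `count_probe_timeouts` is a hypothetical helper, and it counts any curl failure (timeout, connection error, or HTTP status of 400 or above because of `-f`) as a failed probe.

```shell
# count_probe_timeouts URL TIMEOUT ATTEMPTS
# Hypothetical helper: sends ATTEMPTS requests to URL and counts how many
# fail or take longer than TIMEOUT seconds.
count_probe_timeouts() {
  url=$1
  timeout=$2
  attempts=$3
  failed=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --max-time aborts the transfer after $timeout seconds;
    # -f makes curl return non-zero on HTTP errors (status >= 400).
    curl -o /dev/null -s -f --max-time "$timeout" "$url" || failed=$((failed + 1))
    i=$((i + 1))
  done
  echo "$failed of $attempts requests failed or exceeded ${timeout}s"
}
```

For example, `count_probe_timeouts https://fides-nightly.redacted.example.com/health 1 10` would mimic ten probes with the default 1-second timeout. A liveness probe only triggers a restart after `failureThreshold` consecutive failures (3 by default), so isolated slow responses are less harmful than sustained ones.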