[MRG] Readiness and Liveness probes re-added #1361
Conversation
The hub pod may lack time to start up properly. This is mainly because the hub pod will inspect its previously known pods and ensure they exist before it starts listening on endpoints like /hub/health. This inspection can currently take up to 30 seconds, assuming the default value of c.KubeSpawner.http_timeout. Without this configuration, the liveness probe could conclude that the hub pod needs restarting after 3 failed attempts made at 10 second intervals, so perhaps after 20 seconds. This commit ensures we at least survive the initial 30 seconds.
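As a rough sketch, a probe configuration along those lines could look like the following (the port, path, and numbers here are illustrative assumptions, not necessarily the chart's actual values):

```yaml
# Hypothetical probe configuration for the hub container (illustrative values).
# initialDelaySeconds must cover the ~30 s the hub may spend verifying its
# previously known pods before /hub/health starts responding.
livenessProbe:
  httpGet:
    path: /hub/health
    port: 8081              # assumed hub port
  initialDelaySeconds: 30   # survive the slow startup described above
  periodSeconds: 10
  failureThreshold: 3       # restart only after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /hub/health
    port: 8081
  periodSeconds: 10
```

With failureThreshold 3 and periodSeconds 10, a hub that is merely slow to start would otherwise be killed roughly 20–30 seconds in, which is exactly the window the initial delay is meant to cover.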
Maybe something for a future PR: The CI test script has several workarounds to ensure the hub is ready, and to work around race conditions:
Presumably all these sections can be removed and replaced with a single loop that checks that the deployment status is ready?
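One minimal way such a loop could be written (assuming kubectl is available in CI; the namespace below is a placeholder) is to let kubectl do the waiting:

```shell
#!/bin/sh
# Hypothetical CI helper: block until the hub and proxy deployments report
# ready, instead of the assorted sleep/retry workarounds in the test script.
set -eu

NAMESPACE="jhub"   # placeholder namespace

for deploy in hub proxy; do
  # rollout status blocks until the rollout completes or the timeout expires,
  # and exits non-zero on failure, which fails the CI step via `set -e`.
  kubectl rollout status deployment/"$deploy" \
    --namespace "$NAMESPACE" --timeout=120s
done
```

This only works once the deployments actually have readinessProbes, since rollout status relies on pod readiness.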
I left one big comment on the liveness probe for the hub. The other probes look good to me.
@manics ah yes, nice, we could wait for the hub and proxy to be ready according to the k8s readinessProbes. Like this!
The only downside of this change, as far as I know, is that we may delay successful responses from the hub by up to 10 seconds, as the hub won't get traffic until it is considered ready, and readiness is polled every 10 seconds. The key benefit I see with this PR is that it prepares us for higher-availability work, such as avoiding a service disruption of the hub pod during Helm chart upgrades: a new pod could start up alongside the old one and only receive traffic once it is ready. Speaking of which... my hub pod shuts down, and then starts up. I wonder if I could instead make it a rolling update, since I have an external DB.
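For reference, the shutdown-then-start behaviour comes from a Recreate-style deployment strategy; with an external DB, a rolling update could be sketched like this (illustrative values, and see the reply about the hub assuming exclusive DB access before actually doing this):

```yaml
# Hypothetical deployment strategy sketch. With Recreate, the old hub pod is
# killed before the new one starts. RollingUpdate plus a readinessProbe would
# let the new pod come up and only receive traffic once ready.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0   # keep the old pod serving until the new one is ready
    maxSurge: 1         # start one extra pod during the upgrade
```

Note this briefly runs two hub pods against the same database, which is exactly the assumption questioned in the next comment.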
I'll merge this now.
I am not sure. Reading all your other comments/issues and your quest for HA hubs: I think that jupyterhub (the program) assumes it is the only one reading and writing to the database. Fixing that assumption is (I think) what needs to happen before we can change the upgrade strategy and other HA-related things. If the upgrade strategy wasn't
Just had another thought: the probes are applied to the container, so could you move the database upgrade to an init container? An added benefit, especially if full HA support is added, is that it'll give more control over the upgrade process.
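A rough sketch of that idea (the container name and image are placeholders; `jupyterhub upgrade-db` is the command that migrates the database schema):

```yaml
# Hypothetical init container running the database upgrade before the hub
# container starts. The probes only apply to the main container, so they never
# see a hub whose schema migration is still in progress.
initContainers:
  - name: hub-db-upgrade        # placeholder name
    image: jupyterhub/k8s-hub   # placeholder image
    command: ["jupyterhub", "upgrade-db"]
```

Init containers run to completion, one at a time, before the main containers start, which is what gives the extra control over the upgrade process mentioned above.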
@manics ooooh yes, an excellent idea! Could you repost that idea as a new issue?
Addresses parts of issue #1357 and finally implements PR #1004, which I merged and then reverted in #1356 because it was problematic to have a livenessProbe if we only allowed waiting 30 seconds.
Having a livenessProbe is certainly not out of the question, but perhaps we need it to allow a much longer startup time. The key point of it would be to recover from a memory leak or similar, and those would occur far later anyhow.