pkg/start/asset: Add support for post-pod-manifests #13
Conversation
So the installer can inject objects that require OpenShift custom resources (e.g. machine(set)s).
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking. If they are not already assigned, you can assign the PR to them by writing
The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
…own-event

In master there's a bit of a window during the bootstrap-teardown dance:

1. cluster-bootstrap sees the requested pods.
2. cluster-bootstrap shuts itself down.
3. openshift.sh pushes the OpenShift-specific manifests.
4. report-progress.sh pushes bootstrap-complete.
5. The installer sees bootstrap-complete and removes the bootstrap resources, including the bootstrap load-balancer targets.
6. Subsequent Kubernetes API traffic hits the production control plane.

That leaves a fairly large window from 3 through 5 where Kubernetes API requests could be routed to the bootstrap machine and dropped, because it no longer has anything listening on 6443.

With this commit, we take advantage of openshift/cluster-bootstrap@d07548e3 (Add --tear-down-event flag to delay tear down, 2019-01-24, openshift/cluster-bootstrap#9) to drop step 2 (waiting for an event we never send). That leaves the bootstrap control plane running until we destroy that machine. We take advantage of openshift/cluster-bootstrap@180599bc (pkg/start/asset: Add support for post-pod-manifests, 2019-01-29, openshift/cluster-bootstrap#13) to replace our previous openshift.sh (with a minor change to the manifest directory). And we take advantage of openshift/cluster-bootstrap@e5095848 (Create bootstrap-success event before tear down, 2019-01-24, openshift/cluster-bootstrap#9) to replace our previous report-progress.sh (with a minor change to the event name).

Also set --strict, because we want to fail fast for these resources. The user is unlikely to scrape them out of the installer state and push them by hand if we fail to push them from the bootstrap node.

With these changes, the new transition is:

1. cluster-bootstrap sees the requested pods.
2. cluster-bootstrap pushes the OpenShift-specific manifests.
3. cluster-bootstrap pushes bootstrap-success.
4. The installer sees bootstrap-success and removes the bootstrap resources, including the bootstrap load-balancer targets.
5. Subsequent Kubernetes API traffic hits the production control plane.

There's still a small window for lost Kubernetes API traffic:

* The Terraform tear-down could remove the bootstrap machine before it removes the bootstrap load-balancer target, leaving the target pointing into empty space.
* Bootstrap teardown does not allow existing client connections to drain after removing the load-balancer target before removing the bootstrap machine.

Both of these could be addressed by:

1. Remove the bootstrap load-balancer targets.
2. Wait 30 seconds (healthy_threshold * interval for our aws_lb_target_group [1]) for the load balancer to notice the production control-plane targets are live. This assumes the post-pod manifests are all pushed in zero seconds, so it's overly conservative, but waiting an extra 30 seconds isn't a large cost.
3. Remove the remaining bootstrap resources, including the bootstrap machine.

But even without that delay, this commit reduces the window compared to what we have in master. I'll land the delay in follow-up work.

[1]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
There are no retries in
Ah, I'd expected these were CRDs, which would be live before the pods we're waiting for come up. But if not, I'll add optional retry logic to
/hold until the discussion around openshift/installer#1147 (comment) is resolved.
I am not sure we want to move cluster boot-up orchestration (above the control plane itself) here. Machine objects need CRDs, don't they? Those CRDs are only available once the CVO has bootstrapped far enough. Instead of making cluster-bootstrap aware of CVO logic (please, let's not do that; it's not its purpose, but the CVO's), we should have a way to pass those manifests to the CVO somehow.
That was @abhinavdahiya's point above. I still think that retries (in addition to our current pod waits) will allow us to handle this issue without needing to bake in a CVO-watcher.
I don't think we want the CVO involved in manifests that aren't part of the update payload.
To make sure I understand your point: you want cluster-bootstrap to keep trying to create those custom resources until the corresponding CRD is installed by the CVO and the create call succeeds? I think I could live with that.
If the installer waits for the bootstrap-success event, why can't it create the necessary manifests itself? I don't see a reason to mix non-control-plane bootstrapping logic in here.
For post-pod manifests that depend on CRDs pushed by operators, we may get through waitUntilPodsRunning before the CRDs have been submitted and registered. This commit adds an optional retry delay: we still error immediately, without retries, for pre-pod manifests, but the new post-pod manifests get the more relaxed retry approach. There's no cap on the number of retries; for stuck clusters, the installer will eventually give up on waiting and bail out, so we don't need a timeout in cluster-bootstrap itself.
@wking: The following test failed, say /retest to rerun them all:

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
I've pushed 180599b -> 1af1035 adding retries to the post-pod manifests. I'd personally prefer to have those retries for pre-pod manifests as well (there could be network hiccups, etc.), so let me know if you want me to drop the
It can. But this PR is an 11-line addition here to replace a 131-line removal in openshift/installer#1147. Collecting both pre- and post-pod manifest pushing here means we only need one pod-waiting implementation (vs. one in each repo), only one list of pods to wait for, and only one manifest-pushing implementation. This PR also allows those post-pod manifests to get pushed at the right time as a result of their placement in a Go function, vs. doing an event push/watch dance to facilitate a transition to a separate command living on the same bootstrap machine. And the new functionality is opt-in: if callers don't want to use it, they just keep their stuff out of
This is easy to solve by making manifest creation shared via library-go. I am not a fan of diluting the clear responsibilities of components in order to save 100 lines of code. What exactly are we talking about here? What are the resources that should be created between bootstrap-success and bootstrap-complete?
Putting manifest-pushing in library-go does not address triggering. We'd have to also add pod-waiting and copy our same list of pods over, add event listening and wait for
Currently master/worker Machine(Set)s, cloud secrets, the kubeadmin password, and whatever
This concerns me most. Creating manifests is one thing (with some dumb logic). But waiting for pods sounds like non-trivial orchestration logic. None of "master/worker Machine(Set)s, cloud secrets, the kubeadmin password" is a pod.
I think you mean "requires a pod"; these mostly require custom-resource registration (e.g. via CRDs). But those registrations are certainly correlated with pods, because the CVO is pushing out both the CRDs and the Deployments for the waited-on pods. So while we still need retried pushes for these post-pod manifests (and, as I pointed out here, I think we probably want retries for pre-pod manifests too), this PR's Go triggering is saving us from retries (or event waiting, or whatever) for the several minutes
This I get, especially for CRDs. Pods, though, seem to be irrelevant. They are technical details of how the operators behind the CRDs are created. We shouldn't care about them in the post-pod (better called "post-tear-down") phase.
I don't care about retries in the post-pod phase. What I'm trying to avoid is polling retries for these day-2-ish manifests before the waited-on pods are up. |
You have a
Right, but as I pointed out earlier, both the current
What is the reason that this happens on the bootstrap machine at all and not via the installer directly? |
@wking: PR needs rebase.
Obsoleted by #14, with which I can just dump everything into one bucket of manifests. |
So the installer can inject objects that require OpenShift custom resources (e.g. machine(set)s). We currently do this in a separate script that happens before bootstrap-complete, but it's a good fit for cluster-bootstrap, which already has object-injection logic for the pre-pod assets.

CC @sttts