
pkg/start/asset: Add support for post-pod-manifests #13

Closed
wants to merge 2 commits

Conversation

@wking (Member) commented Jan 29, 2019

So the installer can inject objects that require OpenShift custom resources (e.g. machine(set)s). We currently do this in a separate script that happens before bootstrap-complete, but it's a good fit for cluster-bootstrap, which already has object-injection logic for the pre-pod assets.

CC @sttts

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: sttts

If they are not already assigned, you can assign the PR to them by writing /assign @sttts in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jan 29, 2019
wking added a commit to wking/openshift-installer that referenced this pull request Jan 29, 2019
…own-event

In master there's a bit of a window during the bootstrap-teardown
dance:

1. cluster-bootstrap sees the requested pods.
2. cluster-bootstrap shuts itself down.
3. openshift.sh pushes the OpenShift-specific manifests.
4. report-progress.sh pushes bootstrap-complete.
5. The installer sees bootstrap-complete and removes the bootstrap
   resources, including the bootstrap load-balancer targets.
6. Subsequent Kubernetes API traffic hits the production control
   plane.

That leaves a fairly large window from 3 through 5 where Kubernetes
API requests could be routed to the bootstrap machine and dropped
because it no longer has anything listening on 6443.

With this commit, we take advantage of
openshift/cluster-bootstrap@d07548e3 (Add --tear-down-event flag to
delay tear down, 2019-01-24, openshift/cluster-bootstrap#9) to drop
step 2 (waiting for an event we never send).  That leaves the
bootstrap control-plane running until we destroy that machine.  We
take advantage of openshift/cluster-bootstrap@180599bc
(pkg/start/asset: Add support for post-pod-manifests, 2019-01-29,
openshift/cluster-bootstrap#13) to replace our previous openshift.sh
(with a minor change to the manifest directory).  And we take
advantage of openshift/cluster-bootstrap@e5095848 (Create
bootstrap-success event before tear down, 2019-01-24,
openshift/cluster-bootstrap#9) to replace our previous
report-progress.sh (with a minor change to the event name).

Also set --strict, because we want to fail-fast for these resources.
The user is unlikely to scrape them out of the installer state and
push them by hand if we fail to push them from the bootstrap node.

With these changes, the new transition is:

1. cluster-bootstrap sees the requested pods.
2. cluster-bootstrap pushes the OpenShift-specific manifests.
3. cluster-bootstrap pushes bootstrap-success.
4. The installer sees bootstrap-success and removes the bootstrap
   resources, including the bootstrap load-balancer targets.
5. Subsequent Kubernetes API traffic hits the production control
   plane.

There's still a small window for lost Kubernetes API traffic:

* The Terraform tear-down could remove the bootstrap machine before it
  removes the bootstrap load-balancer target, leaving the target
  pointing into empty space.
* Bootstrap teardown does not give existing client connections time
  to drain between removing the load-balancer target and removing
  the bootstrap machine.

Both of these could be addressed by:

1. Remove the bootstrap load-balancer targets.
2. Wait 30 seconds (healthy_threshold * interval for our
   aws_lb_target_group [1]) for the load-balancer to notice the
   production control-plane targets are live.  This assumes the
   post-pod manifests are all pushed in zero seconds, so it's overly
   conservative, but waiting an extra 30 seconds isn't a large cost.
3. Remove the remaining bootstrap resources, including the bootstrap
   machine.

But even without that delay, this commit reduces the window compared
to what we have in master.  I'll land the delay in follow-up work.

[1]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
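
A minimal Go sketch of the teardown ordering described in the commit message above. removeLoadBalancerTargets and removeBootstrapResources are hypothetical stand-ins for the Terraform-driven steps, and the 30-second wait is the healthy_threshold * interval figure from [1]:

```go
package main

import (
	"log"
	"time"
)

// Hypothetical stubs standing in for the real Terraform-driven steps.
func removeLoadBalancerTargets() error {
	log.Print("removing bootstrap load-balancer targets")
	return nil
}

func removeBootstrapResources() error {
	log.Print("removing remaining bootstrap resources, including the machine")
	return nil
}

// teardownBootstrap sketches the proposed ordering: detach the bootstrap
// load-balancer targets, give the load balancer time to mark the production
// control-plane targets healthy, then destroy what remains.
func teardownBootstrap() error {
	if err := removeLoadBalancerTargets(); err != nil {
		return err
	}
	// 30s = healthy_threshold * interval for the aws_lb_target_group [1].
	// Overly conservative, but an extra 30 seconds is a small cost.
	time.Sleep(30 * time.Second)
	return removeBootstrapResources()
}

func main() {
	if err := teardownBootstrap(); err != nil {
		log.Fatal(err)
	}
}
```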
@abhinavdahiya (Contributor)

There are no retries in createAssets.
The assets being pushed to the cluster by https://github.com/openshift/installer/blob/v0.11.0/data/data/bootstrap/files/usr/local/bin/openshift.sh will keep failing until the API is made available by the operator.

createAssets is not a replacement for https://github.com/openshift/installer/blob/v0.11.0/data/data/bootstrap/files/usr/local/bin/openshift.sh

@wking (Member, Author) commented Jan 30, 2019

...will keep failing until the API is made available by the operator.

Ah, I'd expected these to be CRDs, which would be live before the pods we're waiting on come up. But if not, I'll add optional retry logic to createAssets.

@sttts (Contributor) commented Jan 30, 2019

/hold

until the discussion around openshift/installer#1147 (comment) is resolved.

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 30, 2019
@sttts (Contributor) commented Jan 30, 2019

I am not sure we want to move cluster bootup orchestration (above the control plane itself) here. Machine objects need CRDs, don't they? Those CRDs are only available when the CVO has bootstrapped enough.

Instead of making cluster-bootstrap aware of CVO logic (please, let's not do that; it's not its purpose, but the CVO's), we should have a way to pass those manifests to the CVO somehow.

@wking (Member, Author) commented Jan 30, 2019

I am not sure we want to move cluster bootup orchestration (above the control plane itself) here. Machine objects need CRDs, don't they?

That was @abhinavdahiya's point above. I still think that retries (in addition to our current pod waits) will allow us to handle this issue without needing to bake in a CVO-watcher.

Instead of making cluster-bootstrap CVO logic aware (please, let's not do that, it's not its purpose, but CVO's), we should have a way to pass those manifests to CVO somehow.

I don't think we want the CVO involved in manifests that aren't part of the update-payload.

@sttts (Contributor) commented Jan 31, 2019

That was @abhinavdahiya's point above. I still think that retries (in addition to our current pod waits) will allow us to handle this issue without needing to bake in a CVO-watcher.

To make sure I understand your point: you want cluster-bootstrap to keep trying to create those custom resources until the corresponding CRD is installed by the CVO and the create call succeeds? I think I could live with that.

@sttts (Contributor) commented Jan 31, 2019

If the installer waits for the bootstrap-success event, why can't it create the necessary manifests itself? I don't see a reason to mix non-control-plane bootstrapping logic in here.

For post-pod manifests that depend on CRDs pushed by operators, we may
get through waitUntilPodsRunning before the CRDs have been submitted
and registered.  This commit adds an optional retry delay, so we still
error immediately, without retries, for pre-pod manifests (this doesn't
seem very robust?), but the new post-pod manifests get the more relaxed
retry approach.

There's no cap on the number of retries.  For stuck clusters, the
installer will eventually give up on waiting and bail out, so we don't
need a timeout in cluster-bootstrap itself.
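
A minimal Go sketch of the retry behavior this commit describes. The createWithRetry name and signature are illustrative assumptions, not the actual cluster-bootstrap API: a zero retryDelay keeps the fail-fast pre-pod behavior, while a non-zero delay retries without cap, since the installer's own timeout bounds a stuck cluster.

```go
package main

import (
	"fmt"
	"time"
)

// createWithRetry sketches the optional-retry-delay idea: with retryDelay
// == 0 (pre-pod manifests) the first error is returned immediately; with a
// non-zero delay (post-pod manifests) creation is retried indefinitely.
// The create func stands in for pushing one manifest to the API server.
func createWithRetry(create func() error, retryDelay time.Duration) error {
	for {
		err := create()
		if err == nil || retryDelay == 0 {
			return err
		}
		fmt.Printf("create failed (%v), retrying in %s\n", err, retryDelay)
		time.Sleep(retryDelay)
	}
}

func main() {
	// Usage example: the push succeeds once the (simulated) CRD registers.
	attempts := 0
	err := createWithRetry(func() error {
		attempts++
		if attempts < 3 {
			return fmt.Errorf("CRD not yet registered")
		}
		return nil
	}, 100*time.Millisecond)
	fmt.Printf("result: %v after %d attempts\n", err, attempts)
}
```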
openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 1, 2019
@openshift-ci-robot

@wking: The following test failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
ci/prow/images | 1af1035 | link | /test images

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking (Member, Author) commented Feb 1, 2019

I've pushed 180599b -> 1af1035 adding retries to the post-pod manifests. I'd personally prefer to have those retries for pre-pod manifests as well (there could be network hiccups, etc.), so let me know if you want me to drop the retryDelay plumbing and just hard-code the retry loop in (*creator).create.

If the installer waits for the bootstrap-success event, why can't it create the necessary manifests itself? I don't see a reason to mix non-control-plane bootstrapping logic in here.

It can. But this PR is an 11-line addition here to replace a 131-line removal in openshift/installer#1147. Collecting both pre- and post-pod manifest pushing here means we need only one pod-waiting implementation (vs. one in each repo), only one list of pods to wait for, and only one manifest-pushing implementation. This PR also lets those post-pod manifests get pushed at the right time simply by their placement in a Go function, instead of doing an event push/watch dance to hand off to a separate command on the same bootstrap machine where cluster-bootstrap is already running.

And the new functionality is opt-in. If callers don't want to use it, they just keep their stuff out of post-pod-manifests/.
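
A rough Go sketch of the combined flow argued for in the comment above: pre-pod pushing, pod waiting, and post-pod pushing serialized by ordinary function calls. The createAssets and waitUntilPodsRunning names come from the discussion, but these signatures, the directory wiring, and the pod list are illustrative assumptions, not cluster-bootstrap's real API.

```go
package main

import (
	"log"
	"time"
)

// Hypothetical stand-ins for cluster-bootstrap's manifest pushing and pod
// waiting; the real signatures differ.
func createAssets(dir string, retryDelay time.Duration) error {
	log.Printf("pushing manifests from %s (retry delay %s)", dir, retryDelay)
	return nil
}

func waitUntilPodsRunning(pods []string) error {
	log.Printf("waiting for pods: %v", pods)
	return nil
}

func run() error {
	// Pre-pod manifests: fail fast, no retries.
	if err := createAssets("manifests/", 0); err != nil {
		return err
	}
	// One pod-waiting implementation and one list of pods to wait for
	// (the names here are illustrative).
	if err := waitUntilPodsRunning([]string{"kube-apiserver", "cluster-version-operator"}); err != nil {
		return err
	}
	// Post-pod manifests: retried, because their CRDs may not yet be
	// registered when the waited-on pods first come up.
	return createAssets("post-pod-manifests/", 5*time.Second)
}

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}
```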

@sttts (Contributor) commented Feb 4, 2019

But this PR is an 11-line addition here to replace a 131-line removal in openshift/installer#1147.

This is easy to solve by making manifest creation shared via library-go. I am not a fan of diluting clear responsibilities of components in order to save 100 lines of code.

What exactly are we talking about here? What are the resources that should be created between bootstrap-success and bootstrap-complete?

@wking (Member, Author) commented Feb 4, 2019

But this PR is an 11-line addition here to replace a 131-line removal in openshift/installer#1147.

This is easy to solve by making manifest creation shared via library-go. I am not a fan of diluting clear responsibilities of components in order to save 100 lines of code.

Putting manifest-pushing in library-go does not address triggering. We'd have to also add pod-waiting and copy our same list of pods over, add event listening and wait for bootstrap-success, or just repeatedly attempt the pushes the whole time cluster-bootstrap was running. None of those sound as simple/efficient as this PR.

What are we talking here about exactly? What are the resources that should be created between bootstrap-success and bootstrap-complete?

Currently master/worker Machine(Set)s, cloud secrets, the kubeadmin password, and whatever 99_binding-discovery.yaml is. Soon maybe some MachineConfigs too (openshift/installer#1150).

@sttts (Contributor) commented Feb 4, 2019

We'd have to also add pod-waiting

This concerns me most. Creating manifests is one thing (with some dumb logic). But waiting for pods sounds like non-trivial orchestration logic. None of "master/worker Machine(Set)s, cloud secrets, the kubeadmin password" is a pod.

@wking (Member, Author) commented Feb 4, 2019

We'd have to also add pod-waiting

This concerns me most. Creating manifests is one thing (with some dumb logic). But waiting for pods sounds like non-trivial orchestration logic. None of "master/worker Machine(Set)s, cloud secrets, the kubeadmin password" is a pod.

I think you mean "requires a pod"; these mostly require custom-resource registration (e.g. via CRDs). But those registrations are certainly correlated with pods, because the CVO is pushing out both the CRDs and the Deployments for the waited-on pods. So while we still need retried pushes for these post-pod manifests (and, as I pointed out here, I think we probably want retries for pre-pod manifests too), this PR's Go triggering saves us from retries (or event waiting, or whatever) for the several minutes cluster-bootstrap runs before the waited-on pods come up.

@sttts (Contributor) commented Feb 4, 2019

So while we still need retried pushes for these post-pod manifests

This I get, especially for CRDs. Pods, though, seem to be irrelevant. They are a technical detail of how the operators behind the CRDs are created. We shouldn't care about them in the post-pod phase (which would be better called the "post-tear-down" phase).

@wking (Member, Author) commented Feb 4, 2019

We shouldn't care in the post-pod (which should be better called "post-tear-down") phase.

I don't care about retries in the post-pod phase. What I'm trying to avoid is polling retries for these day-2-ish manifests before the waited-on pods are up.

@sttts (Contributor) commented Feb 4, 2019

I don't care about retries in the post-pod phase. What I'm trying to avoid is polling retries for these day-2-ish manifests before the waited-on pods are up.

You have a bootstrap-success event ;-)

@wking (Member, Author) commented Feb 4, 2019

You have a bootstrap-success event ;-)

Right, but as I pointed out earlier, both the current cluster-bootstrap logic and this post-pod manifest pushing live on the bootstrap machine. Go function calls seem like a much simpler and more efficient way to serialize them than using a Kubernetes event.

@sttts (Contributor) commented Feb 4, 2019

Right, but as I pointed out earlier, both the current cluster-bootstrap logic and this post-pod manifest pushing live on the bootstrap machine.

What is the reason that this happens on the bootstrap machine at all and not via the installer directly?

openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2019
@openshift-ci-robot

@wking: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking (Member, Author) commented Mar 7, 2019

Obsoleted by #14, with which I can just dump everything into one bucket of manifests.

wking closed this Mar 7, 2019
wking deleted the post-pod-objects branch March 7, 2019