
pkg/destroy/bootstrap: Separate load-balancer target teardown #1148

Closed

Conversation

wking (Member) commented Jan 29, 2019

Builds on #1147; review that first.

Close a small window for lost Kubernetes API traffic:

  • The Terraform tear-down could remove the bootstrap machine before it removes the bootstrap load-balancer target, leaving the target pointing into empty space.
  • Bootstrap teardown does not allow existing client connections to drain after removing the load balancer target before removing the bootstrap machine.

With this commit, we:

  1. Wait 30 seconds for the production control plane to come up.
  2. Remove the bootstrap load-balancer targets.
  3. Wait 10 seconds for requests to the bootstrap machine to drain out.
  4. Remove the remaining bootstrap resources, including the bootstrap machine.

The 30-second calculation is provider specific. On AWS, it is the 30 seconds AWS needs to notice our production control-plane targets are live (healthy_threshold * interval for our aws_lb_target_group). This assumes the post-pod manifests are all pushed in zero seconds, so it's overly conservative, but waiting an extra 30 seconds isn't a large cost.
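
To make the ordering concrete, here is a minimal Go sketch of the sequence above. It is illustrative only: destroyBootstrapLBTargets and destroyRemainingBootstrap are hypothetical stand-ins for whatever drives the provider-specific Terraform destroys, not functions from this PR.

```go
package main

import (
	"time"

	"github.com/sirupsen/logrus"
)

// Placeholder hooks: in the real installer these would invoke the
// provider-specific Terraform destroys; the names are illustrative only.
func destroyBootstrapLBTargets() error { return nil }
func destroyRemainingBootstrap() error { return nil }

// teardownBootstrap sketches the four-step ordering described above.
func teardownBootstrap() error {
	// 1. Give the load balancer time to mark the production control-plane
	//    targets healthy (healthy_threshold * interval = 30 seconds on AWS).
	logrus.Info("Waiting 30 seconds for the production control-plane to enter the load balancer")
	time.Sleep(30 * time.Second)

	// 2. Remove only the bootstrap load-balancer targets.
	if err := destroyBootstrapLBTargets(); err != nil {
		return err
	}

	// 3. Let in-flight requests to the bootstrap machine drain out.
	time.Sleep(10 * time.Second)

	// 4. Remove the remaining bootstrap resources, including the machine.
	return destroyRemainingBootstrap()
}

func main() {
	if err := teardownBootstrap(); err != nil {
		logrus.Fatal(err)
	}
}
```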

The 30-second delay also seems sufficient for libvirt's round-robin DNS, since clients should be able to fall back to later IPs on their own. We may be able to set shorter TTLs on libvirt DNS entries to firm that up, but clean transitions are less important for dev-only libvirt clusters anyway.

I'm not sure how OpenStack handles this teardown; naively grepping through data/data/openstack didn't turn up anything that looked much like a bootstrap load-balancer target resource.

CC @sttts, who pointed out this hole.

CC @flaper87, @tomassedovic, @hardys, @russellb in case they have ideas on if/how this should be handled for OpenStack.

…own-event

In master there's a bit of a window during the bootstrap-teardown
dance:

1. cluster-bootstrap sees the requested pods.
2. cluster-bootstrap shuts itself down.
3. openshift.sh pushes the OpenShift-specific manifests.
4. report-progress.sh pushes bootstrap-complete.
5. The installer sees bootstrap-complete and removes the bootstrap
   resources, including the bootstrap load-balancer targets.
6. Subsequent Kubernetes API traffic hits the production control
   plane.

That leaves a fairly large window from 3 through 5 where Kubernetes
API requests could be routed to the bootstrap machine and dropped
because it no longer has anything listening on 6443.

With this commit, we take advantage of
openshift/cluster-bootstrap@d07548e3 (Add --tear-down-event flag to
delay tear down, 2019-01-24, openshift/cluster-bootstrap#9) to drop
step 2 (waiting for an event we never send).  That leaves the
bootstrap control-plane running until we destroy that machine.  We
take advantage of openshift/cluster-bootstrap@180599bc
(pkg/start/asset: Add support for post-pod-manifests, 2019-01-29,
openshift/cluster-bootstrap#13) to replace our previous openshift.sh
(with a minor change to the manifest directory).  And we take
advantage of openshift/cluster-bootstrap@e5095848 (Create
bootstrap-success event before tear down, 2019-01-24,
openshift/cluster-bootstrap#9) to replace our previous
report-progress.sh (with a minor change to the event name).

Also set --strict, because we want to fail-fast for these resources.
The user is unlikely to scrape them out of the installer state and
push them by hand if we fail to push them from the bootstrap node.

With these changes, the new transition is:

1. cluster-bootstrap sees the requested pods.
2. cluster-bootstrap pushes the OpenShift-specific manifests.
3. cluster-bootstrap pushes bootstrap-success.
4. The installer sees bootstrap-success and removes the bootstrap
   resources, including the bootstrap load-balancer targets.
5. Subsequent Kubernetes API traffic hits the production control
   plane.

There's still a small window for lost Kubernetes API traffic:

* The Terraform tear-down could remove the bootstrap machine before it
  removes the bootstrap load-balancer target, leaving the target
  pointing into empty space.
* Bootstrap teardown does not allow existing client connections to
  drain after removing the load balancer target before removing the
  bootstrap machine.

Both of these could be addressed by:

1. Remove the bootstrap load-balancer targets.
2. Wait 30 seconds (healthy_threshold * interval for our
   aws_lb_target_group [1]) for the load-balancer to notice the
   production control-plane targets are live.  This assumes the
   post-pod manifests are all pushed in zero seconds, so it's overly
   conservative, but waiting an extra 30 seconds isn't a large cost.
3. Remove the remaining bootstrap resources, including the bootstrap
   machine.

But even without that delay, this commit reduces the window compared
to what we have in master.  I'll land the delay in follow-up work.

[1]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 29, 2019
@wking wking added the kind/bug Categorizes issue or PR as related to a bug. label Jan 29, 2019
@wking wking changed the title from "pkg/destroy/bootstrap: Separate load-valance target teardown" to "pkg/destroy/bootstrap: Separate load-balance target teardown" Jan 29, 2019
@wking wking changed the title from "pkg/destroy/bootstrap: Separate load-balance target teardown" to "pkg/destroy/bootstrap: Separate load-balancer target teardown" Jan 29, 2019
}

time.Sleep(10 * time.Second)
Contributor

What is this?

wking (Member, Author)

> What is this?

This is

  1. Wait 10 seconds for requests to the bootstrap machine to drain out.

from my topic post.

wking (Member, Author)

From the AWS network load-balancer docs:

Connection draining ensures that in-flight requests complete before existing connections are closed. The initial state of a deregistering target is draining. By default, the state of a deregistering target changes to unused after 300 seconds. To change the amount of time that Elastic Load Balancing waits before changing the state to unused, update the deregistration delay value. We recommend that you specify a value of at least 120 seconds to ensure that requests are completed.

And from here:

Deregistering a target removes it from your target group, but does not affect the target otherwise. The load balancer stops routing requests to a target as soon as it is deregistered. The target enters the draining state until in-flight requests have completed.

The Terraform attachment-deletion logic is here, and while it fires a deregister request, it does not wait around for draining to complete. I don't see any issues in the provider repository about waiting for the unused state, but we could push something like that if we wanted more finesse here than a 10-second cross-platform sleep. I'm also fine just saying "we know who our consumers are at this point, and none of them will keep an open request going for more than 10 seconds", which is how I have it now. Thoughts?
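
For reference, here is a minimal sketch (not part of this PR) of what waiting for the unused state could look like with the AWS SDK's elbv2.WaitUntilTargetDeregistered. The target-group ARN and instance ID below are placeholder values; in practice they would come from the installer's Terraform state.

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elbv2"
)

func main() {
	sess := session.Must(session.NewSession())
	client := elbv2.New(sess)

	// Placeholder identifiers; a real implementation would look these up
	// from Terraform state rather than hard-coding them.
	targetGroupARN := "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/example/0123456789abcdef"
	bootstrapInstanceID := "i-0123456789abcdef0"

	targets := []*elbv2.TargetDescription{{Id: aws.String(bootstrapInstanceID)}}

	// Deregister the bootstrap target (Terraform already does the
	// equivalent of this when the attachment resource is destroyed)...
	if _, err := client.DeregisterTargets(&elbv2.DeregisterTargetsInput{
		TargetGroupArn: aws.String(targetGroupARN),
		Targets:        targets,
	}); err != nil {
		log.Fatal(err)
	}

	// ...then block until the target has left the draining state, instead
	// of sleeping for a fixed 10 seconds.
	if err := client.WaitUntilTargetDeregistered(&elbv2.DescribeTargetHealthInput{
		TargetGroupArn: aws.String(targetGroupARN),
		Targets:        targets,
	}); err != nil {
		log.Fatal(err)
	}
}
```

The waiter polls DescribeTargetHealth until the target reports unused, which would remove the need to guess a drain timeout, at the cost of a provider-specific wait in the destroy path.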

wking (Member, Author)

I've pushed 5ee7650 -> a349ed1 working the above into the commit message.

@wking wking force-pushed the bootstrap-load-balancer-teardown branch from 5ee7650 to a349ed1 Compare January 30, 2019 05:13
Close a small window for lost Kubernetes API traffic:

* The Terraform tear-down could remove the bootstrap machine before it
  removes the bootstrap load-balancer target, leaving the target
  pointing into empty space.
* Bootstrap teardown does not allow existing client connections to
  drain after removing the load balancer target before removing the
  bootstrap machine.

With this commit, we:

1. Wait 30 seconds for the production control plane to come up.
2. Remove the bootstrap load-balancer targets.
3. Wait 10 seconds for requests to the bootstrap machine to drain out.
4. Remove the remaining bootstrap resources, including the bootstrap
   machine.

The 30-second calculation is provider specific. On AWS, it is the
30 seconds AWS needs to notice our production control-plane targets
are live (healthy_threshold * interval for our aws_lb_target_group
[1]). This assumes the post-pod manifests are all pushed in zero
seconds, so it's overly conservative, but waiting an extra 30 seconds
isn't a large cost.

The 30-second delay doesn't really matter for libvirt, because clients
will have been banging away at the production control plane the whole
time, with those requests failing until the control plane came up to
listen.  But an extra 30-second delay is not a big deal either.

The 10-second delay for draining works around a Terraform plugin
limitation on AWS.  From the AWS network load-balancer docs [2]:

> Connection draining ensures that in-flight requests complete before
> existing connections are closed.  The initial state of a
> deregistering target is draining.  By default, the state of a
> deregistering target changes to unused after 300 seconds.  To change
> the amount of time that Elastic Load Balancing waits before changing
> the state to unused, update the deregistration delay value.  We
> recommend that you specify a value of at least 120 seconds to ensure
> that requests are completed.

And from [3]:

> Deregistering a target removes it from your target group, but does
> not affect the target otherwise.  The load balancer stops routing
> requests to a target as soon as it is deregistered.  The target
> enters the draining state until in-flight requests have completed.

The Terraform attachment-deletion logic is in [4], and while it fires
a deregister request, it does not wait around for draining to
complete.  I don't see any issues in the provider repository about
waiting for the unused state, but we could push something like that
[5] if we wanted more finesse here than a 10-second cross-platform
sleep.  For the moment, I'm just saying "we know who our consumers are
at this point, and none of them will keep an open request going for
more than 10 seconds".

The 10-second drain delay also seems sufficient for libvirt's
round-robin DNS, since clients should be able to fall back to
alternative IPs on their own.  We may be able to set shorter TTLs on
libvirt DNS entries to firm that up, but clean transitions are less
important for dev-only libvirt clusters anyway.  And, as with the
30-second delay for the production control plane to come up, clients
have been banging away on all of these IPs throughout the whole
bootstrap process.

I'm not sure how OpenStack handles this teardown; naively grepping
through data/data/openstack didn't turn up anything that looked much
like a bootstrap load-balancer target resource.

[1]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
[2]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
[3]: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#registered-targets
[4]: pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-aws/aws/resource_aws_lb_target_group_attachment.go#L80-L106
[5]: https://docs.aws.amazon.com/sdk-for-go/api/service/elbv2/#ELBV2.WaitUntilTargetDeregistered
@wking wking force-pushed the bootstrap-load-balancer-teardown branch from a349ed1 to 1c772ac Compare January 30, 2019 05:21
}

logrus.Info("Waiting 30 seconds for the production control-plane to enter the load balancer")
Contributor

Here you have to send the tear-down event. Otherwise no draining will happen.

wking (Member, Author)

> Here you have to send the tear-down event. Otherwise no draining will happen.

This isn't for draining. This is waiting for the production control plane to enter the load balancer. The 10-second sleep in the destroy logic is for draining.

sttts (Contributor) commented Jan 30, 2019

> 3. Wait 10 seconds for requests to the bootstrap machine to drain out.

How should this happen? The Kubernetes service endpoint is still in place. In-cluster clients talk to the bootstrap API server. I commented inline where the tear down event is to be sent. Then this strategy would work.

wking (Member, Author) commented Jan 30, 2019

> The Kubernetes service endpoint is still in place. In-cluster clients talk to the bootstrap API server.

This is what we're discussing here and later, right? If so, let's keep discussion there.

openshift-ci-robot (Contributor)

@wking: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2019
wking added a commit to wking/openshift-installer that referenced this pull request Feb 22, 2019
This is a quick hack to get encrypted masters.  Ideally we'd want to
deregister these on bootstrap-teardown, but handling that nicely will
be easier after some cleanups from [1].  As it stands, we'll need to
deregister this as part of the general cluster teardown (hence the
WIP).

[1]: openshift#1148
wking added a commit to wking/openshift-installer that referenced this pull request Feb 25, 2019
This is a quick hack to get encrypted masters.  Ideally we'd want to
deregister these on bootstrap-teardown, but handling that nicely will
be easier after some cleanups from [1].  As it stands, we'll
deregister this as part of the general cluster teardown.

[1]: openshift#1148
wking added a commit to wking/openshift-installer that referenced this pull request Feb 26, 2019
This is a quick hack to get encrypted masters.  Ideally we'd want to
deregister these on bootstrap-teardown, but handling that nicely will
be easier after some cleanups from [1].  As it stands, we'll
deregister this as part of the general cluster teardown.

Because we don't set kms_key_id [2] we get "the default AWS KMS Key"
(according to [2]).  AWS docs are not particularly clear about whether
users can configure the default key for their account/region to
override that default, although it is clear that it defaults to an
AWS-managed CMK [3] and that the alias for AMI encryption is
alias/aws/ebs [4].  If there is no way to override alias/aws/ebs,
we'll probably eventually need to expose kms_key_id to users.

[1]: openshift#1148
[2]: https://www.terraform.io/docs/providers/aws/r/ami_copy.html#kms_key_id
[3]: https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#aws-managed-cmk
[4]: https://aws.amazon.com/blogs/security/how-to-create-a-custom-ami-with-encrypted-amazon-ebs-snapshots-and-share-it-with-other-accounts-and-regions/
openshift-ci-robot (Contributor)

@wking: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/gofmt 1c772ac link /test gofmt
ci/prow/e2e-aws-upgrade 1c772ac link /test e2e-aws-upgrade
ci/prow/verify-vendor 1c772ac link /test verify-vendor
ci/prow/e2e-aws 1c772ac link /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

abhinavdahiya (Contributor)

I think we can live without separating destroy bootstrap in 2 phases. LB health checks should make this not required.

/close

openshift-ci-robot (Contributor)

@abhinavdahiya: Closed this PR.

In response to this:

I think we can live without separating destroy bootstrap in 2 phases. LB health checks should make this not required.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
