Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data/aws: Encrypt the AMI used by the bootstrap and master machines #1296

Merged
merged 1 commit into from
Feb 27, 2019

Conversation

wking
Copy link
Member

@wking wking commented Feb 22, 2019

This is a quick hack to get encrypted masters. Ideally we'd want to deregister these on bootstrap-teardown, but handling that nicely will be easier after some cleanups from #1148. As it stands, we'll need to deregister this as part of the general cluster teardown (hence the WIP). CC @ericavonb.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 22, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 22, 2019
@abhinavdahiya
Copy link
Contributor

/hold
we need to wait for 4.1 working branch.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 22, 2019
@ericavonb
Copy link

@abhinavdahiya why do we need to wait for a 4.1 branch? this is for 4.0 (exception approved)

@abhinavdahiya
Copy link
Contributor

We need to do this in Go so that we are using these AMIs for our machine objects.

@abhinavdahiya
Copy link
Contributor

@abhinavdahiya why do we need to wait for a 4.1 branch? this is for 4.0 (exception approved)

feel free to cancel the hold, I was not aware we were merging 4.0 exception approved feature.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2019
@wking
Copy link
Member Author

wking commented Feb 25, 2019

We need to do this in Go so that we are using these AMIs for our machine objects.

It's not clear to me how this is supposed to work with machine objects, which is why I don't think we need to do this in Go (yet). I expect we'll end up with a machine-API operator that can perform these copy-and-encrypt operations on its own, in which case the configuration would still use the source AMI (which is what we're currently putting there now) with an additional property to request "please use an encrypted copy of this AMI" (which doesn't exist yet). I don't think it's a problem to perform this copy-and-encrypt in Terraform while we work out how in-cluster encrypted AMIs are going to work. Thoughts?

@abhinavdahiya
Copy link
Contributor

We need to do this in Go so that we are using these AMIs for our machine objects.

It's not clear to me how this is supposed to work with machine objects, which is why I don't think we need to do this in Go (yet). I expect we'll end up with a machine-API operator that can perform these copy-and-encrypt operations on its own, in which case the configuration would still use the source AMI (which is what we're currently putting there now) with an additional property to request "please use an encrypted copy of this AMI" (which doesn't exist yet). I don't think it's a problem to perform this copy-and-encrypt in Terraform while we work out how in-cluster encrypted AMIs are going to work. Thoughts?

Why can't the installer create encrypted AMI and set them on machine objects?

@wking
Copy link
Member Author

wking commented Feb 25, 2019

Why can't the installer create encrypted AMI and set them on machine objects?

We can, but then we'd need to write metadata.json earlier so callers could clean up the resources (in this case, deregistering the copied AMI) if they later decided not to create the cluster or crashed for some other reason before creating the cluster. But maybe there's no way to maintain our current separation between creating all the manifests and creating external resources? If we decide that we can break that separation, it would mean we could transition from filters to zone IDs for the private hosted zone and similar as well, which would be something of a consolation prize :p.

@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 25, 2019
@wking wking changed the title WIP: data/aws: Encrypt the AMI used by the bootstrap and master machines data/aws: Encrypt the AMI used by the bootstrap and master machines Feb 25, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 25, 2019
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 25, 2019
@wking
Copy link
Member Author

wking commented Feb 25, 2019

I've pushed f08e266 with a rebase onto master and destroy cluster AMI deregistration, which is enough to remove the WIP. It would still be nice to deregister during bootstrap teardown, but I think we can circle back to that in follow-up work.

@abhinavdahiya
Copy link
Contributor

Can we update docs on AWS regarding the AMI creation and check for AMI limits per account.. ?

@abhinavdahiya
Copy link
Contributor

Also the openshift-dev account will need to prune these when terminating clusters after 48hrs.

@wking
Copy link
Member Author

wking commented Feb 25, 2019

Can we update docs on AWS regarding the AMI creation and check for AMI limits per account.. ?

I don't see a limit on the number of AMIs in an account (although maybe I'm just not looking in the right places?). There is "Destination Regions are limited to 50 concurrent AMI copies", although I'm not sure what that means. Is it a limit on concurrent copy operations or is it a cap on the number of AMIs which were originally created as copies of other AMIs? Is it per-account or across all accounts in a given region?

@abhinavdahiya
Copy link
Contributor

Also the openshift-dev account will need to prune these when terminating clusters after 48hrs.

Can we update docs on AWS regarding the AMI creation and check for AMI limits per account.. ?

I don't see a limit on the number of AMIs in an account (although maybe I'm just not looking in the right places?). There is "[Destination Regions are limited to 50 concurrent AMI copies]", although I'm not sure what that means. Is it a limit on concurrent copy operations or is it a cap on the number of AMIs which were originally created as copies of other AMIs? Is it per-account or across all accounts in a given region?

Can we update docs on AWS regarding the AMI creation

At least this.

@wking
Copy link
Member Author

wking commented Feb 25, 2019

Also the openshift-dev account will need to prune these when terminating clusters after 48hrs.

This would be addressed by DPP-1066, although I'm sure we can address this via more minor updates to their reaper in the meantime. I'll get something up there within 48 hours of this landing, although I don't expect this to be time-critical, since I can't find a clear AMI limit that leakers would exhaust.

@wking
Copy link
Member Author

wking commented Feb 26, 2019

Can we update docs on AWS regarding the AMI creation

At least this.

I don't see a good place for this. Maybe here? With an additional sentence like "This creates an encrypted AMI for the bootstrap and control-plane machines. The encrypted AMI is deregistered by destroy cluster." I can add that if you like, or I can add whatever wording you want to whatever doc location you want ;). But obviously create cluster creates resources which are removed by destroy cluster (and possibly also the bootstrap teardown), so personally I don't feel a need to call this out in our docs.

@wking
Copy link
Member Author

wking commented Feb 26, 2019

Yeah, add a small section that provides info...

Added with f08e266 -> 4fcef5e, although as I explain in the commit message, it's not clear to me whether users can configure the default EBS encryption key for the target region if they want to use a custom CMK instead of the Amazon-managed default. We may need to expose kms_key_id via the install config in follow-up work as we refine encryption support.

@abhinavdahiya
Copy link
Contributor

/hold cancel

/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 26, 2019
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

This is a quick hack to get encrypted masters.  Ideally we'd want to
deregister these on bootstrap-teardown, but handling that nicely will
be easier after some cleanups from [1].  As it stands, we'll
deregister this as part of the general cluster teardown.

Because we don't set kms_key_id [2] we get "the default AWS KMS Key"
(according to [2]).  AWS docs are not particularly clear about whether
users can configure the default key for their account/region to
override that default, although it is clear that it defaults to an
AWS-managed CMK [3] and that the alias for AMI encryption is
alias/aws/ebs [4].  If there is no way to override alias/aws/ebs,
we'll probably eventially need to expose kms_key_id to users.

[1]: openshift#1148
[2]: https://www.terraform.io/docs/providers/aws/r/ami_copy.html#kms_key_id
[3]: https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#aws-managed-cmk
[4]: https://aws.amazon.com/blogs/security/how-to-create-a-custom-ami-with-encrypted-amazon-ebs-snapshots-and-share-it-with-other-accounts-and-regions/
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 26, 2019
@wking
Copy link
Member Author

wking commented Feb 26, 2019

All green :)

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 26, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 99425cf into openshift:master Feb 27, 2019
@wking wking deleted the encrypted-master-amis branch February 27, 2019 03:58
wking added a commit to wking/openshift-installer that referenced this pull request Aug 5, 2019
Partially reverts 0c370dd (data/aws: Encrypt the AMI used by the
bootstrap and master machines, 2019-02-22, openshift#1296).  This isn't a clean
revert; for example, I left the ability to destroy images which are
tagged as owned by the cluster.  And we're still copy-and-encrypting
for the bootstrap machine until the AWS Terraform provider supports
requesting encrypted root volumes [1].  But with this commit, we no
longer use the copied AMI for control-plane nodes, which gets us one
step closer to having the cluster API provision them without installer
intervention.
wking added a commit to wking/openshift-installer that referenced this pull request Aug 5, 2019
Partially reverts 0c370dd (data/aws: Encrypt the AMI used by the
bootstrap and master machines, 2019-02-22, openshift#1296).  This isn't a clean
revert; for example, I left the ability to destroy images which are
tagged as owned by the cluster.  And we're still copy-and-encrypting
for the bootstrap machine until the AWS Terraform provider supports
requesting encrypted root volumes [1].  But with this commit, we no
longer use the copied AMI for control-plane nodes, which gets us one
step closer to having the cluster API provision them without installer
intervention.

[1]: hashicorp/terraform-provider-aws#8624
wking added a commit to wking/openshift-installer that referenced this pull request Aug 5, 2019
Partially reverts 0c370dd (data/aws: Encrypt the AMI used by the
bootstrap and master machines, 2019-02-22, openshift#1296).  This isn't a clean
revert; for example, I left the ability to destroy images which are
tagged as owned by the cluster.  And we're still copy-and-encrypting
for the bootstrap machine and control-plane machines until the AWS
Terraform provider supports requesting encrypted root volumes [1].
But with this commit, we're now documenting the encryption in a way
that covers both the previous AMI-based encryption used for
bootstrap/control-plane and the new root-volume-based encryption used
for the compute machines, because they come down to encrypted root
volumes regardless of their approach.

[1]: hashicorp/terraform-provider-aws#8624
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants