
Mark KEP-2568 as implementable. #2654

Merged
merged 1 commit into kubernetes:master on May 6, 2021

Conversation

vinayakankugoyal
Contributor

No description provided.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 28, 2021
@k8s-ci-robot
Contributor

Hi @vinayakankugoyal. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 28, 2021
@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 28, 2021
Member

@neolit123 neolit123 left a comment


/cc @randomvariable @fabriziopandini
do we have more comments on this KEP before we mark it as "implementable"?

/cc @PushkarJ
any more comments from you as a SIG Security representative?

i will review this myself again, tomorrow.

@k8s-ci-robot
Contributor

@neolit123: GitHub didn't allow me to request PR reviews from the following users: pushkarj.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @randomvariable @fabriziopandini
do we have more comments on this KEP before we mark it as "implementable".

/cc @PushkarJ
any more comments from you as a SIG Security representative?

i will review this myself again, tomorrow.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 28, 2021
@neolit123
Member

/milestone v1.22

@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Apr 28, 2021
@neolit123
Member

looks like the PRR file is mandatory as part of CI now?

time="2021-04-28T22:13:28Z" level=info msg="PRR file: /home/prow/go/src/github.com/kubernetes/enhancements/keps/prod-readiness/sig-cluster-lifecycle/2568.yaml"
--- FAIL: TestValidation (0.46s)
panic: assignment to entry in nil map [recovered]
	panic: assignment to entry in nil map

so this is something that has to be resolved with the PRR owners - e.g. whether we are going to skip it for some KEPs.

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 29, 2021
@neolit123
Member

neolit123 commented Apr 29, 2021

at this point github makes this difficult to review, so i will just go through the file and copy / comment.

  1. Set the seccomp profile to runtime/default, this can be done by setting the seccompProfile field in the securityContext.

going back to this. do you see a risk of breakage for existing users due to this change, depending on their seccomp settings?
cc @PushkarJ
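
For context on what this step entails, here is a minimal sketch of applying the runtime/default profile at the pod level, written with the k8s.io/api/core/v1 Go types; the helper name, package name, and pod-level placement are illustrative, not something the KEP prescribes.

```go
package manifests // illustrative package name

import corev1 "k8s.io/api/core/v1"

// withRuntimeDefaultSeccomp applies the runtime/default seccomp profile to the
// pod-level securityContext so every container inherits it unless overridden.
func withRuntimeDefaultSeccomp(pod *corev1.Pod) {
	if pod.Spec.SecurityContext == nil {
		pod.Spec.SecurityContext = &corev1.PodSecurityContext{}
	}
	pod.Spec.SecurityContext.SeccompProfile = &corev1.SeccompProfile{
		Type: corev1.SeccompProfileTypeRuntimeDefault,
	}
}
```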

As a security conscious user I would like to run the kubernetes control-plane as non-root.

i would extend the sentence with "to avoid vulnerabilities."

Create system users and let users override the defaults: Use adduser --system or equivalent to create UIDs in the SYS_UID_MIN - SYS_UID_MAX range and groupadd --system or equivalent to create GIDs in the SYS_GID_MIN - SYS_GID_MAX range. Additionally if users want to specify their own UIDs or GIDs we will support that through kubeadm patching.

given this is the best option that we picked (3), how are we going to handle running kubeadm init/join/upgrade in case the set of kubeadm-* users/groups already exists? i think the best option is to re-use them, but error if they are in an unexpected range or have unexpected permissions?

with respect to cleanup, do we expect kubeadm reset to remove these users/groups? i'd say yes?
kubeadm reset tries to remove everything created by the kubeadm binary on the node.

cleanup on "reset" would work for "init/join", but "upgrade" does not have a command that can reverse its state; it's idempotent.
so if we upgrade a node from non-root to root but the process fails, the user will be left with a number of users/groups on the node. subsequent upgrades must not fail because of that and should just error if the existing names have unexpected IDs or permissions.

i think we might want to have a section in the doc that covers what we do when a set of users/groups already exists on the node - how we re-use them and how we clean them up. you could call this "Reusing user/groups and cleanup" under "Design details". but first you can tell me what you think about this.
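
To make the reuse question concrete, a rough sketch of idempotent creation is below. It is not from the KEP: the function and package names are invented, useradd --system stands in for "adduser --system or equivalent", and the ID range would come from SYS_UID_MIN/SYS_UID_MAX in /etc/login.defs. An existing user is reused if it is in range, and an error is raised otherwise.

```go
package nonroot // illustrative package name

import (
	"fmt"
	"os/exec"
	"os/user"
	"strconv"
)

// ensureSystemUser reuses an existing user with the given name if its UID falls
// inside [minID, maxID], creates it as a system user if it is missing, and
// errors if an existing user is outside the expected range.
func ensureSystemUser(name string, minID, maxID int) (int, error) {
	if u, err := user.Lookup(name); err == nil {
		uid, err := strconv.Atoi(u.Uid)
		if err != nil {
			return 0, err
		}
		if uid < minID || uid > maxID {
			return 0, fmt.Errorf("user %q has UID %d outside expected range %d-%d", name, uid, minID, maxID)
		}
		return uid, nil // reuse the existing user
	}
	out, err := exec.Command("useradd", "--system", "--no-create-home", "--shell", "/usr/sbin/nologin", name).CombinedOutput()
	if err != nil {
		return 0, fmt.Errorf("creating user %q: %v: %s", name, err, out)
	}
	u, err := user.Lookup(name)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(u.Uid)
}
```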

The author(s) believe that starting out with a safe default of option 3 and allowing the user to set the UID and GID through the kubeadm patching mechanism is more user-friendly both for users who just want to quickly bootstrap and for users who care about which UIDs and GIDs the control-plane runs as. Further, this feature will be opt-in and will be hidden behind a feature-gate.

i would add "hidden behind a feature-gate, until it graduates to GA".

it will be feature-gated only until GA. once the FG is dropped it becomes locked to true.
for beta it will be defaulted to true, at that point we might want to consider exposing the API for users to configure the UID/GID via the main kubeadm API. but it depends on the feedback too, we could leave it to the GA graduation or leave this to patches entirely.

Each of the components will run with a unique UID and GID. For each of the components we will create a unique user. For the shared files/resources we will create groups. The naming convention of these groups is tabulated below.

we should indicate that kubeadm would take exclusive ownership of these user/group names and throw errors if it encounters existing names that are not in the expected ID range or that have unexpected permissions.
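
For illustration, running a component as its dedicated user/group then comes down to setting runAsUser/runAsGroup on the container's securityContext in the generated static pod manifest. A minimal sketch with the k8s.io/api/core/v1 types follows; the function name, package name, and the extra RunAsNonRoot assertion are illustrative additions, not text from the KEP.

```go
package manifests // illustrative package name

import corev1 "k8s.io/api/core/v1"

// runAsComponentUser makes a control-plane container run as the dedicated
// UID/GID created for that component, and asserts that it is non-root.
func runAsComponentUser(c *corev1.Container, uid, gid int64) {
	nonRoot := true
	if c.SecurityContext == nil {
		c.SecurityContext = &corev1.SecurityContext{}
	}
	c.SecurityContext.RunAsUser = &uid
	c.SecurityContext.RunAsGroup = &gid
	c.SecurityContext.RunAsNonRoot = &nonRoot
}
```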

These tests will be added using the kinder tooling.

we normally add e2e tests during alpha, not sure i've mentioned that on the other PRs.
we can rephrase:
"These tests will be added using the kinder tooling during the Alpha stage."

e2e tests created and running periodically, but may not be completely green.

we can rephrase this to:
"e2e tests are running periodically and are green with respect to testing this functionality."

@randomvariable
Member

given this is the best option that we picked (3), how are we going to handle running kubeadm init/join/upgrade in case the set of kubeadm-* users/groups already exists? i think the best option is to re-use them, but error if they are in an unexpected range or have unexpected permissions?

Yes, definitely we should reuse them.

with respect to cleanup, do we expect kubeadm reset to remove these users/groups? i'd say yes? kubeadm reset tries to remove everything created by the kubeadm binary on the node.

Agnostic on this one. I don't think it's a deal breaker if the users/groups hang around, which probably answers the upgrade question.

@vinayakankugoyal
Contributor Author

vinayakankugoyal commented Apr 29, 2021

at this point github makes this difficult to review, so i will just go through the file and copy / comment.

  1. Set the seccomp profile to runtime/default, this can be done by setting the seccompProfile field in the securityContext.

going back to this. do you see a risk of breakage for existing users due to this change, depending on their seccomp settings?
cc @PushkarJ

No there isn't. The control-plane has had that setting in kube-up for a long time and we have not seen any issues. I'd be extremely surprised if the control-plane components started failing because of a seccompProfile of runtime/default.

@fabriziopandini
Member

fabriziopandini commented Apr 29, 2021

I'm supportive of getting this to implementable in 1.22 as soon as pending comments are addressed.
If this can help, I also don't think it is strictly required to support upgrading existing clusters to non-root in the first iteration.

@neolit123
Member

with respect to cleanup, do we expect kubeadm reset to remove these users/groups? i'd say yes? kubeadm reset tries to remove everything created by the kubeadm binary on the node.

Agnostic on this one. I don't think it's a deal breaker if the users/groups hang around, which probably answers the upgrade question.

as pointed out, reset does clean up all artifacts that kubeadm init/join left on the machine. so we might want to add that for consistency.

@vinayakankugoyal
Contributor Author

As a security conscious user I would like to run the kubernetes control-plane as non-root.

i would extend the sentence with "to avoid vulnerabilities."

Updated, thanks!

The author(s) believe that starting out with a safe default of option 3 and allowing the user to set the UID and GID through the kubeadm patching mechanism is more user-friendly both for users who just want to quickly bootstrap and for users who care about which UIDs and GIDs the control-plane runs as. Further, this feature will be opt-in and will be hidden behind a feature-gate.

i would add "hidden behind a feature-gate, until it graduates to GA".

it will be feature-gated only until GA. once the FG is dropped it becomes locked to true.
for beta it will be defaulted to true, at that point we might want to consider exposing the API for users to configure the UID/GID via the main kubeadm API. but it depends on the feedback too, we could leave it to the GA graduation or leave this to patches entirely.

Updated, thanks!

Each of the components will run with a unique UID and GID. For each of the components we will create a unique user. For the shared files/resources we will create groups. The naming convention of these groups is tabulated below.

we should indicate that kubeadm would take exclusive ownership of these user/group names and throw errors if it encounters existing names that are not in the expected ID range or that have unexpected permissions.

Updated, thanks!

These tests will be added using the kinder tooling.

we normally add e2e tests during alpha, not sure i've mentioned that on the other PRs.
we can rephrase:
"These tests will be added using the kinder tooling during the Alpha stage."

Updated, thanks!

e2e tests created and running periodically, but may not be completely green.

we can rephrase this to:
"e2e tests are running periodically and are green with respect to testing this functionality."

Updated, thanks!

@vinayakankugoyal
Contributor Author

Create system users and let users override the defaults: Use adduser --system or equivalent to create UIDs in the SYS_UID_MIN - SYS_UID_MAX range and groupadd --system or equivalent to create GIDs in the SYS_GID_MIN - SYS_GID_MAX range. Additionally if users want to specify their own UIDs or GIDs we will support that through kubeadm patching.

given this is the best option that we picked (3), how are we going to handle running kubeadm init/join/upgrade in case the set of kubeadm-* users/groups already exists? i think the best option is to re-use them, but error if they are in an unexpected range or have unexpected permissions?

with respect to cleanup, do we expect kubeadm reset to remove these users/groups? i'd say yes?
kubeadm reset tries to remove everything created by the kubeadm binary on the node.

cleanup on "reset" would work for "init/join", but "upgrade" does not have a command that can reverse its state; it's idempotent.
so if we upgrade a node from non-root to root but the process fails, the user will be left with a number of users/groups on the node. subsequent upgrades must not fail because of that and should just error if the existing names have unexpected IDs or permissions.

i think we might want to have a section in the doc that covers what we do when a set of users/groups already exists on the node - how we re-use them and how we clean them up. you could call this "Reusing user/groups and cleanup" under "Design details". but first you can tell me what you think about this.

I added 2 sections: one for reuse and one for cleanup. Let me know how those look. Thanks!
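
A best-effort cleanup on kubeadm reset could look roughly like the sketch below. Again this is illustrative rather than quoted from the KEP: the function and package names are invented and userdel stands in for whatever removal mechanism the KEP settles on. Entries that are already gone are simply skipped, so repeated resets stay harmless.

```go
package nonroot // illustrative package name

import (
	"fmt"
	"os/exec"
	"os/user"
)

// cleanupSystemUsers removes the users that were created for the control
// plane, tolerating names that no longer exist so reset stays best-effort.
func cleanupSystemUsers(names []string) {
	for _, name := range names {
		if _, err := user.Lookup(name); err != nil {
			continue // nothing to remove
		}
		if out, err := exec.Command("userdel", name).CombinedOutput(); err != nil {
			fmt.Printf("warning: could not remove user %q: %v: %s\n", name, err, out)
		}
	}
}
```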

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 29, 2021
@neolit123
Member

thank you for the updates. found one minor typo, the rest seems good to me.

once this gets approved and LGTM-ed, we will let it sit for a few more days and apply lazy consensus on it - i.e. if nobody else has comments we will merge it as is.

i'm proposing the lazy consensus to apply after EOD Tuesday next week.

/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 29, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, vinayakankugoyal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2021
@vinayakankugoyal
Contributor Author

i'm proposing the lazy consensus to apply after EOD Tuesday next week.

Yeah that seems reasonable to me. What time zone will we follow?

@neolit123
Member

please squash the commits to 1; you can drop my authorship too.

Yeah that seems reasonable to me. What time zone will we follow?

we usually follow PST, as that is what k8s does.

i've notified the SIG mailing list in case more people want to have a look:
https://groups.google.com/g/kubernetes-sig-cluster-lifecycle/c/4Ex7kPaL7tc

@vinayakankugoyal
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2021
@vinayakankugoyal
Contributor Author

We are now past the deadline. Anyone with the ability to LGTM please do so, so that we can merge. Thanks in advance!

@vinayakankugoyal
Contributor Author

@fabriziopandini - since you are the other approver of the KEP do you want to LGTM?

Member

@ehashman ehashman left a comment


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 6, 2021
@k8s-ci-robot k8s-ci-robot merged commit 812b69a into kubernetes:master May 6, 2021
@fabriziopandini
Member

Sorry, I'm late to this one...

@vinayakankugoyal
Contributor Author

Sorry, I'm late to this one...

No worries! Thank you for taking a look.

Labels
approved, cncf-cla: yes, kind/kep, lgtm, ok-to-test, sig/cluster-lifecycle, size/M