Dynamic No New Privileges (NNP) via bpf #38
On newer systems the use of privilege escalating binaries (suid, sgid,
file capabilities) can be avoided. This model is illustrated in
systemd's run0 tool. So it is possible to turn on PR_SET_NO_NEW_PRIVS
(NNP) for systemd itself and thus for every process on the system.
However, that breaks
sandboxed workloads. Sandboxed workloads such as containers may run
a single process without a full-fledged daemon that could supervise
privileged operations. In such cases executing privilege escalating
binaries must be allowed.
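
For reference, a minimal sketch of what turning on NNP looks like from a process's point of view. It uses the existing prctl(2) operations PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS; the helper names are made up for illustration and are not taken from systemd.

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: set the no_new_privs bit on the calling thread.
 * Once set, the bit cannot be cleared, and it is inherited across
 * fork(), clone() and execve(), which is why setting it in PID 1
 * affects every process started afterwards.
 */
int enable_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if NNP is set, 0 if not, -1 on error. */
int no_new_privs_is_set(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

After enable_no_new_privs() succeeds, executing a suid, sgid or file-capability binary still works, but the kernel no longer grants the extra privileges.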
Ideally, sandboxes that require execution of privilege escalating
binaries should use a user namespace with a non-identity idmapping.
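
To make "non-identity idmapping" concrete, here is a rough userspace sketch that inspects text in the format of /proc/[pid]/uid_map (three numbers per line: ID inside the namespace, ID outside, length, as described in user_namespaces(7)). The function name is hypothetical, and treating an empty map as "not identity" is an assumption of this sketch. Note that the second column is reported relative to the reader's namespace, so the result depends on where the file is read from.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true if the map contains at least one
 * extent and every extent maps IDs onto themselves.
 *
 *   "0 0 4294967295"  -> identity (what the initial namespace shows)
 *   "0 100000 65536"  -> non-identity (typical unprivileged container)
 */
bool idmap_is_identity(const char *map_text)
{
    unsigned long inside, outside, count;
    int consumed = 0;
    int extents = 0;

    /* The leading space in the format skips blanks and newlines. */
    while (sscanf(map_text, " %lu %lu %lu%n",
                  &inside, &outside, &count, &consumed) == 3) {
        if (inside != outside)
            return false;
        extents++;
        map_text += consumed;
    }
    return extents > 0;
}
```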
Instead of revamping the fairly inflexible NNP implementation, execution
of privilege escalating binaries should be supervised by a bpf LSM.
When a privilege escalating binary is executed in the initial user
namespace the bpf LSM program will cause the kernel to skip elevating
privileges and instead execute the binary with the caller's privileges.
This is equivalent to the NNP behavior.
If a privilege escalating binary is executed in a non-initial user
namespace the bpf LSM program will allow the kernel to escalate the
caller's privileges to a higher privilege level.
This will allow unprivileged containers to execute privilege escalating
binaries but completely isolate regular services from doing so.
This can of course be made configurable on a per-service basis if needed.
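
The policy described above can be modelled as a small decision function. This is plain C for illustration only, not a bpf LSM program: the security hook it would attach to does not exist yet, so all names here (the enum, the function and its parameters) are hypothetical, and the per-service override is just one assumed way the configurability could look.

```c
#include <stdbool.h>

/* Hypothetical verdicts the policy could hand back at exec time. */
enum exec_verdict {
    EXEC_KEEP_CALLER_PRIVS, /* run with the caller's privileges, like NNP */
    EXEC_ALLOW_ESCALATION   /* let the kernel apply suid/sgid/file caps */
};

/*
 * Model of the proposed policy:
 *  - binaries that do not escalate are unaffected;
 *  - in a non-initial user namespace escalation is allowed;
 *  - in the initial user namespace escalation is skipped, unless the
 *    (assumed) per-service override permits it.
 */
enum exec_verdict nnp_policy_decide(bool in_init_userns,
                                    bool binary_escalates,
                                    bool service_allows_escalation)
{
    if (!binary_escalates)
        return EXEC_KEEP_CALLER_PRIVS;
    if (!in_init_userns)
        return EXEC_ALLOW_ESCALATION;
    if (service_allows_escalation)
        return EXEC_ALLOW_ESCALATION;
    return EXEC_KEEP_CALLER_PRIVS;
}
```

A regular service in the initial user namespace running a suid binary gets EXEC_KEEP_CALLER_PRIVS, while the same exec inside an unprivileged container gets EXEC_ALLOW_ESCALATION.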
This will require hooking up a new security hook into the kernel's exec
codepath.
Use-Case: Wean all of userspace off of privilege escalating
binaries.
Maybe @thejh has some thoughts here as well as @poettering and @DaanDeMeyer and @cyphar.