Dynamic No New Privileges (NNP) via bpf #38

brauner · 2025-07-15T08:19:30Z

Dynamic No New Privileges (NNP) via bpf

On newer systems the use of privilege escalating binaries (suid, sgid,
file capabilities) can be avoided. This model is illustrated in
systemd's run0 tool.

It is therefore possible to turn on PR_SET_NO_NEW_PRIVS (NNP) for
systemd itself and, because NNP is inherited by child processes, for
every process on the system. However, that breaks sandboxed workloads.
Sandboxed workloads such as containers may run a single process without
a full-fledged daemon that could supervise privileged operations. In
such cases executing privilege escalating binaries must be allowed.
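For reference, the existing per-task NNP switch is set with prctl(2). The sketch below only shows how a process turns the flag on and reads it back. The helper name `enable_no_new_privs` is made up for illustration and is not part of this change.

```c
#include <sys/prctl.h>

/*
 * Set no_new_privs on the calling thread and read the flag back.
 * Returns 1 if the flag is set, -1 if the prctl() call failed.
 * Once set, the flag cannot be cleared again; it is inherited across
 * fork()/clone() and preserved across execve().
 */
static int enable_no_new_privs(void)
{
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;

	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Because the flag is one-way and inherited, setting it in PID 1 leaves no way for a descendant to opt out, which is the inflexibility the rest of this description works around.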

Ideally, sandboxes that require execution of privilege escalating
binaries should use a user namespace with a non-identity idmapping.
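As a rough illustration of what "non-identity idmapping" means, the following sketch classifies the text of a `/proc/<pid>/uid_map` (or `gid_map`) file. The helper name `idmap_is_identity` is hypothetical. The check looks only at the mapping itself, so it is an approximation and not a test for "is this the initial user namespace".

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if map_text (the contents of a uid_map or gid_map file)
 * consists of the single extent "0 0 4294967295", i.e. every id maps
 * to itself. Any other layout, including multiple extents, counts as
 * a non-identity idmapping.
 */
static bool idmap_is_identity(const char *map_text)
{
	unsigned long long inside, outside, count;
	int consumed = -1;

	if (sscanf(map_text, "%llu %llu %llu %n",
		   &inside, &outside, &count, &consumed) != 3)
		return false;

	/* Anything left after the first extent means more extents follow. */
	if (consumed < 0 || map_text[consumed] != '\0')
		return false;

	return inside == 0 && outside == 0 && count == 4294967295ULL;
}
```

A typical unprivileged container shows something like `0 100000 65536` here, which this helper reports as non-identity.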

Instead of revamping the fairly inflexible NNP implementation, execution
of privilege escalating binaries should be supervised by a bpf LSM.

When a privilege escalating binary is executed in the initial user
namespace the bpf LSM program will cause the kernel to skip elevating
privileges and instead execute the binary with the caller's privileges.
This is equivalent to the NNP behavior.

If a privilege escalating binary is executed in a non-initial user
namespace the bpf LSM program will allow the kernel to escalate the
caller's privileges to a higher privilege level.

This will allow unprivileged containers to execute privilege escalating
binaries but completely isolate regular services from doing so.
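The policy described above can be modelled in plain userspace C. This is only a sketch of the decision logic: the kernel hook does not exist yet, so the enum, the function name `dynamic_nnp_decide` and its parameters are all assumptions made for illustration, not the proposed interface.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the exec-time check. */
enum exec_decision {
	EXEC_UNCHANGED,         /* binary does not escalate, nothing to decide */
	EXEC_KEEP_CALLER_PRIVS, /* run with the caller's privileges (NNP-like) */
	EXEC_ALLOW_ESCALATION,  /* let the kernel elevate privileges as usual */
};

/*
 * binary_escalates:      the file is suid, sgid or carries file capabilities.
 * caller_in_init_userns: the caller runs in the initial user namespace.
 */
static enum exec_decision dynamic_nnp_decide(bool binary_escalates,
					     bool caller_in_init_userns)
{
	if (!binary_escalates)
		return EXEC_UNCHANGED;

	if (caller_in_init_userns)
		return EXEC_KEEP_CALLER_PRIVS;

	return EXEC_ALLOW_ESCALATION;
}
```

In this model a regular service in the initial user namespace that executes a suid binary gets `EXEC_KEEP_CALLER_PRIVS`, while the same exec inside an unprivileged container's user namespace gets `EXEC_ALLOW_ESCALATION`.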

This can of course be made configurable on a per-service basis if needed.

This will require adding a new security hook to the kernel's exec
codepath.

Use-Case: Wean all of userspace off of privilege escalating
binaries.

Maybe @thejh has some thoughts here as well as @poettering and @DaanDeMeyer and @cyphar.

Signed-off-by: Christian Brauner <brauner@kernel.org>

thejh · 2025-07-15T13:03:17Z

I assume this BPF hook would not affect any task_no_new_privs() checks except for influencing the exec path? I guess you'd basically have to split existing task_no_new_privs() checks into "has NNP for purposes of limiting privilege gain on exec via setuid/setgid/fscaps/selinux transition/apparmor transition/smack transition" (which the BPF hook could force to "yes" but can't force to "no") and "has NNP for purposes of allowing process mutations that will persist across execve()" (which the BPF hook couldn't influence)?

In that case, I think adding such an API would be safe.
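The split suggested above could be modelled roughly as follows. This is a userspace sketch, not kernel code: `struct nnp_view` and both helper names are hypothetical, and the only real kernel function referred to is `task_no_new_privs()`.

```c
#include <stdbool.h>

/* Hypothetical inputs to the two kinds of NNP checks. */
struct nnp_view {
	bool task_nnp;        /* what task_no_new_privs() reports today */
	bool hook_forces_nnp; /* BPF hook asks for NNP behavior on this exec */
};

/*
 * NNP for the purpose of limiting privilege gain on exec (setuid,
 * setgid, fscaps, LSM transitions). The hook can force this to "yes"
 * but can never turn a real NNP flag into "no".
 */
static bool nnp_limits_exec_privilege_gain(const struct nnp_view *v)
{
	return v->task_nnp || v->hook_forces_nnp;
}

/*
 * NNP for the purpose of allowing process mutations that persist
 * across execve(). Only the real task flag counts; the hook has no
 * influence here.
 */
static bool nnp_allows_persistent_mutations(const struct nnp_view *v)
{
	return v->task_nnp;
}
```

Keeping the second predicate independent of the hook means a task cannot gain the ability to make such persistent changes (installing a seccomp filter without CAP_SYS_ADMIN is one existing NNP-gated case) merely because the hook blocked escalation for one exec.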

(FYI, from what I remember, Chrome currently wants to be able to either execute setuid binaries or create user namespaces for its sandbox; if both are blocked, I think it might not launch.)


brauner force-pushed the bpf branch from 71940b3 to f134189 on July 15, 2025 08:20
