Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add uname parsing fallback path to KernelVersion() #823

Closed
wants to merge 1 commit into from

Conversation

danobi
Copy link
Contributor

@danobi danobi commented Oct 21, 2022

It turns out if filecaps are assigned to a binary, it affects the processes "dumpability" (see PR_SET_DUMPABLE section in man prctl). When a process isn't globally dumpable, /proc/self/auxv is owned by root:root.

Put together, what this means is that if an executable uses this library and is assigned filecaps but run as a normal user, this library errors out when reading from /proc/self/auxv.

You can confirm this existing behavior by running this on master:

$ cd internal
$ go test -c
$ sudo setcap cap_net_admin,cap_sys_admin+p internal.test
$ ./internal.test
--- FAIL: TestCurrentKernelVersion (0.00s)
    version_test.go:53: opening auxv: open /proc/self/auxv: permission denied
FAIL

This commit makes KernelVersion() more reliable by adding a best-effort uname parsing codepath.

More details about the filecaps interaction is available here: https://dxuuu.xyz/filecaps.html .

It turns out if filecaps are assigned to a binary, it affects the
processes "dumpability" (see `PR_SET_DUMPABLE` section in `man prctl`).
When a process isn't globally dumpable, `/proc/self/auxv` is owned by
`root:root`.

Put together, what this means is that if an executable uses this library
and is assigned filecaps but run as a normal user, this library errors
out when reading from `/proc/self/auxv`.

You can confirm this existing behavior by running this on master:

    $ cd internal
    $ go test -c
    $ sudo setcap cap_net_admin,cap_sys_admin+p internal.test
    $ ./internal.test
    --- FAIL: TestCurrentKernelVersion (0.00s)
        version_test.go:53: opening auxv: open /proc/self/auxv: permission denied
    FAIL

This commit makes `KernelVersion()` more reliable by adding a
best-effort uname parsing codepath.

More details about the filecaps interaction is available here:
https://dxuuu.xyz/filecaps.html .
@lmb
Copy link
Collaborator

lmb commented Oct 21, 2022

Ah, that is unfortunate. We used to have uname parsing, but removed it in favour of auxv in #500. The old code first tried to read /proc/version_signature followed by uname to deal with Debian shenanigans. I researched additional tricks to get auxv, but there is no foolproof way. https://docs.rs/auxv/latest/auxv/#reading-the-auxiliary-vector has a great overview. We're better off with parsing uname.

I'm a bit unhappy that we'd have to maintain three different code paths for the same piece of information. Does it make sense to keep auxv parsing if we need uname anyways? Alternatively, could we do auxv + uname but skip /proc/version_signature?

If we decide to resurrect the old code we have to remove the dependency on regex, since that package is quite big.

Thanks for the nice write up as well :)

cc @brycekahle @ti-mo

@danobi
Copy link
Contributor Author

danobi commented Oct 21, 2022

I'm a bit unhappy that we'd have to maintain three different code paths for the same piece of information. Does it make sense to keep auxv parsing if we need uname anyways?

I think it's probably worthwhile. It looks like auxv works well in almost all circumstances. And if it doesn't work, rather than give up, could do a best effort uname parse.

That being said, I'm doubtful any distro would mess up the first maj.min.patch occurrence in the version string. So uname only would probably work. But in the interest of safety, I would personally probably not delete auxv stuff. B/c any regression in this code could occur at a very painful time -- in production with no way to fix without a new deployment. It can be quite hard to know in advance when custom/unknown kernels are in the mix. And in the current $DAYJOB that is a very expensive regression to make.

Thanks for the nice write up as well :)

👍

@brycekahle
Copy link
Contributor

@danobi does the problem persist, if you grant/elevate yourself to use the permitted capabilities?

@danobi
Copy link
Contributor Author

danobi commented Oct 21, 2022

@brycekahle yes: https://pastes.dxuuu.xyz/lgvdhu

$ gcc main.c -lcap
$ ./a.out
/proc/self/auxv inode=4019970
ok
$ sudo setcap cap_net_admin,cap_sys_admin+p internal.test
$ ./a.out
/proc/self/auxv inode=4023331
open: Permission denied

@brycekahle
Copy link
Contributor

@danobi Interesting. Is this the only thing failing when using filecaps?

@danobi
Copy link
Contributor Author

danobi commented Oct 21, 2022

@brycekahle yeah, at least for how my company is using bpf.

@ti-mo
Copy link
Collaborator

ti-mo commented Oct 24, 2022

I think I'm missing the bigger picture here. It's a cool war story, but is there a reason the process binary can't be made dumpable? The way I see it, it would be just another permission required, and the best course of action might be making the error clearer (opening auxv: open /proc/self/auxv: permission denied (process missing cap_<required_capability>). What is the required capability here? Unless I'm missing something, I don't think it was called out in the blog post explicitly.

Completely understand from a principle of least privilege, but I'm also interested in seeing some concrete arguments (apart from the surprising end result of auxv being root-owned) for the practical concerns of granting this privilege. Of course, giving an attacker less info is better, but consider the fact that this process already has CAP_BPF and CAP_NET_ADMIN in most cases, so this defense is of marginal added value at best.

This commit makes KernelVersion() more reliable

On the face of it, yes. But as I'll outline below, it also risks silently feeding users wrong version numbers if done incorrectly, as we're planning to use this value for extern kconfig LINUX_VERSION_CODE. This is also exactly the value that needs to be supplied to bpf_prog_load when loading kprobe-type progs on all kernels before 5.x. (thankfully removed in torvalds/linux@6c4fc209fcf9), so detection needs to be perfectly accurate.

As a reminder, there's also the patch version overflow problem that was introduced in the 4.19 series. The value in auxv was left clamped to 255 (or was it 254?) and is always guaranteed to match LINUX_VERSION_CODE. It's the most reliable method to obtain the macro's value from userspace, since distributions are not known to have messed with this value so far, which cannot be said for uname.


If we decide to go forward with this, can we just revive my old code instead? It's battle-tested and contained a good set of test strings as well. We can drop regex, of course.

That being said, I'm doubtful any distro would mess up the first maj.min.patch occurrence in the version string.

Let me introduce you to our good friend Debian:

λ ignite ~ → uname -r
5.10.0-16-amd64
λ ignite ~ → uname -v
#1 SMP Debian 5.10.127-1 (2022-06-30)
λ ignite ~ → uname -a
Linux ignite 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux

In bullseye, it looks like there's now a new file, /proc/version, that replaces the old /proc/version_signature:

λ ignite ~ → cat /proc/version
Linux version 5.10.0-16-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.127-1 (2022-06-30)

So yet another location to scan, although it seems like bullseye at least has the full version in uname -v (5.10.127), which wasn't the case in earlier releases. We could just stick to /proc/version_signature for compatibility with older kernels and go with uname for newer ones.

@ti-mo
Copy link
Collaborator

ti-mo commented Oct 24, 2022

As an aside, internal.KernelVersion() is only called when loading a Kprobe-type program. Unfortunately, it doesn't seem like we have the same condition in features.createProgLoadAttr(), but that's a simple fix. With that in place, you'd only have an auxv dependency when loading or feature-testing kprobes. In the future, extern kconfig LINUX_VERSION_CODE would be added to that list, but that can also be avoided if necessary.

Explicitly setting ProgramSpec.KernelVersion is also a way of skipping the function call.

@lmb
Copy link
Collaborator

lmb commented Oct 24, 2022

[...] is there a reason the process binary can't be made dumpable? [...] Completely understand from a principle of least privilege, but I'm also interested in seeing some concrete arguments (apart from the surprising end result of auxv being root-owned) for the practical concerns of granting this privilege.

I think you only need to call prtctl(PR_SET_DUMPABLE), no capabilities involved. The awkwardness here is that this allows PTRACE_ATTACH, which in turn essentially allows you to hijack the CAP_BPF target process. Rough outline:

  1. setcap p+bpf binary (as root)
  2. Execute ./binary. This process has the same uid as you, but CAP_BPF
  3. PTRACE_ATTACH to ./binary as unpriv user (only possible if PR_SET_DUMPABLE==1)
  4. Profit! You have control over privileged process

See https://www.kernel.org/doc/html/latest/admin-guide/LSM/Yama.html#ptrace-scope and https://man7.org/linux/man-pages/man2/ptrace.2.html

The other option would be to grant something like CAP_DAC_OVERRIDE, possibly CAP_DAC_READ_SEARCH too, since now the ownership of /proc/self/auxv is ignored. The problem is that the process can now read /etc/passwd and so on. It's a pretty big hammer.

If we decide to go forward with this, can we just revive my old code instead? It's battle-tested and contained a good set of test strings as well. We can drop regex, of course.

👍

@danobi
Copy link
Contributor Author

danobi commented Oct 24, 2022

@ti-mo thanks for the detailed explanation. Like @lmb suggests, I too feel a bit uneasy about requiring more capabilities than necessary to parse kernel version. But since capabilities are probably somewhat of a rare use case and it sounds annoying to maintain a bunch of mostly unused code, it might be fine to just ask for user to provide CAP_DAC_READ_SEARCH.

If we go that route, I think we should probably check for -EACCES and print a hint in the error message. Not everyone might be inclined to read the kernel source for as long as I did.

@lmb
Copy link
Collaborator

lmb commented Oct 25, 2022

@danobi would deploying with CAP_DAC_READ_SEARCH (or some other cap) be acceptable to you?

Another option could be to call into https://man7.org/linux/man-pages/man3/getauxval.3.html via CGo iff that is enabled. Again, a partial solution, but maybe better than requiring the additional cap.

@danobi
Copy link
Contributor Author

danobi commented Oct 25, 2022

@lmb Yeah, that ought to be fine for my use case.

lmb added a commit to lmb/ebpf that referenced this pull request Oct 25, 2022
…ilities

As reported [0] setting capabilities on an executable file will prevent
reading /proc/self/auxv since the file is owned by root. The work arounds to
this have various trade offs, so the best we can do is tell the user
why we failed and hope they check out our discussion or create an issue.

0: cilium#823
lmb added a commit to lmb/ebpf that referenced this pull request Oct 25, 2022
As reported [0] setting capabilities on an executable file will
prevent reading /proc/self/auxv since the file is owned by root. The
work arounds to this have various trade offs, so the best we can do
is tell the user why we failed and hope they check out our discussion
or create an issue.

0: cilium#823
lmb added a commit to lmb/ebpf that referenced this pull request Oct 25, 2022
As reported by Daniel Xu, setting capabilities on an executable file
will prevent reading /proc/self/auxv since the file is owned by root.
The work arounds to this have various trade offs, so the best we can
do is tell the user why we failed and hope they check out our
discussion or create an issue.

See cilium#823
@lmb
Copy link
Collaborator

lmb commented Oct 25, 2022

I made a PR with a hopefully better error message, let's see where that gets us.

@lmb lmb closed this Oct 25, 2022
ti-mo pushed a commit that referenced this pull request Oct 26, 2022
As reported by Daniel Xu, setting capabilities on an executable file
will prevent reading /proc/self/auxv since the file is owned by root.
The work arounds to this have various trade offs, so the best we can
do is tell the user why we failed and hope they check out our
discussion or create an issue.

See #823
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants