os: os.checkPidfd() crashes with SIGSYS #69065

Open
cions opened this issue Aug 25, 2024 · 23 comments · May be fixed by #69245
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Android

Comments

@cions

cions commented Aug 25, 2024

Go version

go version go1.23.0 android/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/data/data/com.termux/files/home/.cache/go-build'
GOENV='/data/data/com.termux/files/home/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='android'
GOINSECURE=''
GOMODCACHE='/data/data/com.termux/files/home/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='android'
GOPATH='/data/data/com.termux/files/home/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/data/data/com.termux/files/home/goroot'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/data/data/com.termux/files/home/goroot/pkg/tool/android_arm64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/data/data/com.termux/files/home/.config/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/data/data/com.termux/files/usr/tmp/go-build4148197892=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Just ran go env (the output above is from a patched version).

What did you see happen?

$ uname -r
4.9.186-perf+
$ go version
go version go1.23.0 android/arm64
$ go env
SIGSYS: bad system call
PC=0x5f3e3b7700 m=0 sigcode=1

goroutine 1 gp=0x40000021c0 m=0 mp=0x5f3ef8db20 [syscall]:
syscall.Syscall(0x1b2, 0x3733, 0x0, 0x0)
        syscall/syscall_linux.go:73 +0x20 fp=0x400009a410 sp=0x400009a3b0 pc=0x5f3e3d02b0
internal/syscall/unix.PidFDOpen(0xac?, 0x0?)
        internal/syscall/unix/pidfd_linux.go:18 +0x2c fp=0x400009a440 sp=0x400009a410 pc=0x5f3e42593c
os.checkPidfd()
        os/pidfd_linux.go:139 +0x48 fp=0x400009a4f0 sp=0x400009a440 pc=0x5f3e437c38
os.init.OnceValue[...].func2()
        sync/oncefunc.go:57 +0x74 fp=0x400009a550 sp=0x400009a4f0 pc=0x5f3e430054
sync.(*Once).doSlow(0x20?, 0x43?)
        sync/once.go:76 +0xf8 fp=0x400009a5b0 sp=0x400009a550 pc=0x5f3e3bfbf8
sync.(*Once).Do(0x5?, 0x0?)
        sync/once.go:67 +0x24 fp=0x400009a5d0 sp=0x400009a5b0 pc=0x5f3e3bfad4
os.init.OnceValue[...].func3()
        sync/oncefunc.go:62 +0x3c fp=0x400009a610 sp=0x400009a5d0 pc=0x5f3e42ff9c
os.pidfdWorks(...)
        os/pidfd_linux.go:124
os.ensurePidfd(0x0)
        os/pidfd_linux.go:23 +0x2c fp=0x400009a650 sp=0x400009a610 pc=0x5f3e43741c
os.startProcess({0x400020b9c0, 0x3f}, {0x4000214000, 0x6, 0x6}, 0x400009a870)
        os/exec_posix.go:41 +0xb8 fp=0x400009a740 sp=0x400009a650 pc=0x5f3e4322f8
os.StartProcess({0x400020b9c0, 0x3f}, {0x4000214000, 0x6, 0x6}, 0x400009a870)
        os/exec.go:319 +0x50 fp=0x400009a780 sp=0x400009a740 pc=0x5f3e431fa0
os/exec.(*Cmd).Start(0x4000216000)
        os/exec/exec.go:709 +0x4ac fp=0x400009a910 sp=0x400009a780 pc=0x5f3e46c6dc
os/exec.(*Cmd).Run(0x4000216000)
        os/exec/exec.go:607 +0x20 fp=0x400009a930 sp=0x400009a910 pc=0x5f3e46c1f0
os/exec.(*Cmd).CombinedOutput(0x4000216000)
        os/exec/exec.go:1021 +0x84 fp=0x400009a960 sp=0x400009a930 pc=0x5f3e46d9e4
cmd/go/internal/work.(*Builder).gccToolID(0x400019c000, {0x40000e81a3, 0x1b}, {0x5f3e9a43d0, 0x1})
        cmd/go/internal/work/buildid.go:235 +0x340 fp=0x400009ab90 sp=0x400009a960 pc=0x5f3e7f6b10
cmd/go/internal/work.(*Builder).gccCompilerID(0x400019c000, {0x40000e81a3, 0x1b})
        cmd/go/internal/work/exec.go:2609 +0x3a8 fp=0x400009add0 sp=0x400009ab90 pc=0x5f3e80c418
cmd/go/internal/work.(*Builder).gccSupportsFlag(0x400019c000, {0x40000a0950, 0x40000e81a3?, 0x2?}, {0x5f3e89674c, 0x16})
        cmd/go/internal/work/exec.go:2483 +0x418 fp=0x400009afd0 sp=0x400009add0 pc=0x5f3e80b6f8
cmd/go/internal/work.(*Builder).compilerCmd(0x400019c000, {0x40000a0950, 0x1, 0x1}, {0x5f3e9a1d58?, 0x1?}, {0x0, 0x0})
        cmd/go/internal/work/exec.go:2362 +0x460 fp=0x400009b070 sp=0x400009afd0 pc=0x5f3e80ac00
cmd/go/internal/work.(*Builder).GccCmd(0x400019c000, {0x5f3e9a1d58, 0x1}, {0x0, 0x0})
        cmd/go/internal/work/exec.go:2305 +0x100 fp=0x400009b0e0 sp=0x400009b070 pc=0x5f3e80a5c0
cmd/go/internal/envcmd.ExtraEnvVarsCostly()
        cmd/go/internal/envcmd/env.go:223 +0xe0 fp=0x400009b830 sp=0x400009b0e0 pc=0x5f3e82ec60
cmd/go/internal/envcmd.runEnv({0x5f3ebf02f0?, 0x5f3efb4300?}, 0x40000de1e0?, {0x40000a4030, 0x0, 0x0})
        cmd/go/internal/envcmd/env.go:335 +0x574 fp=0x400009b9b0 sp=0x400009b830 pc=0x5f3e82f7b4
main.invoke(0x5f3ef7d940, {0x40000a4030, 0x1, 0x1})
        cmd/go/main.go:299 +0x674 fp=0x400009bcc0 sp=0x400009b9b0 pc=0x5f3e87f254
main.main()
        cmd/go/main.go:213 +0xdb4 fp=0x400009bf40 sp=0x400009bcc0 pc=0x5f3e87e884
runtime.main()
        runtime/proc.go:272 +0x288 fp=0x400009bfd0 sp=0x400009bf40 pc=0x5f3e373558
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009bfd0 sp=0x400009bfd0 pc=0x5f3e3b2e24

goroutine 17 gp=0x400008c380 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xc8 fp=0x4000054790 sp=0x4000054770 pc=0x5f3e3aac08
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0x40000547d0 sp=0x4000054790 pc=0x5f3e3738b8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000547d0 sp=0x40000547d0 pc=0x5f3e3b2e24
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x24

goroutine 18 gp=0x400008c540 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xc8 fp=0x4000054f60 sp=0x4000054f40 pc=0x5f3e3aac08
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0x40000a2000)
        runtime/mgcsweep.go:277 +0xa0 fp=0x4000054fb0 sp=0x4000054f60 pc=0x5f3e35e050
runtime.gcenable.gowrap1()
        runtime/mgc.go:203 +0x28 fp=0x4000054fd0 sp=0x4000054fb0 pc=0x5f3e352018
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000054fd0 sp=0x4000054fd0 pc=0x5f3e3b2e24
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:203 +0x6c

goroutine 19 gp=0x400008c700 m=nil [GC scavenge wait]:
runtime.gopark(0x40000a2000?, 0x5f3e9a1c68?, 0x1?, 0x0?, 0x400008c700?)
        runtime/proc.go:424 +0xc8 fp=0x4000055760 sp=0x4000055740 pc=0x5f3e3aac08
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x5f3ef8bce0)
        runtime/mgcscavenge.go:425 +0x5c fp=0x4000055790 sp=0x4000055760 pc=0x5f3e35ba7c
runtime.bgscavenge(0x40000a2000)
        runtime/mgcscavenge.go:653 +0x44 fp=0x40000557b0 sp=0x4000055790 pc=0x5f3e35bfa4
runtime.gcenable.gowrap2()
        runtime/mgc.go:204 +0x28 fp=0x40000557d0 sp=0x40000557b0 pc=0x5f3e351fb8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000557d0 sp=0x40000557d0 pc=0x5f3e3b2e24
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0xac

goroutine 20 gp=0x400008c8c0 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xc8 fp=0x4000055d80 sp=0x4000055d60 pc=0x5f3e3aac08
runtime.runfinq()
        runtime/mfinal.go:193 +0x108 fp=0x4000055fd0 sp=0x4000055d80 pc=0x5f3e351118
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000055fd0 sp=0x4000055fd0 pc=0x5f3e3b2e24
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x80

goroutine 33 gp=0x400013e000 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xc8 fp=0x40001446f0 sp=0x40001446d0 pc=0x5f3e3aac08
runtime.chanrecv(0x40001260e0, 0x0, 0x1)
        runtime/chan.go:639 +0x414 fp=0x4000144770 sp=0x40001446f0 pc=0x5f3e341664
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:489 +0x14 fp=0x40001447a0 sp=0x4000144770 pc=0x5f3e341214
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1732
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1735 +0x3c fp=0x40001447d0 sp=0x40001447a0 pc=0x5f3e3550fc
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40001447d0 sp=0x40001447d0 pc=0x5f3e3b2e24
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1730 +0xa0

r0      0x3733
r1      0x0
r2      0x0
r3      0x0
r4      0x0
r5      0x0
r6      0x0
r7      0x6
r8      0x1b2
r9      0x6
r10     0x0
r11     0x0
r12     0x400021a6a8
r13     0x14
r14     0x40000240c0
r15     0x4
r16     0x40000983a0
r17     0x400009a4c0
r18     0x77afbd2000
r19     0x8d6dd8b5d3e2f3f4
r20     0x400009a570
r21     0x7fcdfc11b8
r22     0x4000004000
r23     0x3cb9d69707
r24     0x8e41dc0be9fcd734
r25     0x0
r26     0x5f3ebe5578
r27     0x0
r28     0x40000021c0
r29     0x400009a348
lr      0x5f3e3d026c
sp      0x400009a350
pc      0x5f3e3b7700
fault   0x0

What did you expect to see?

According to https://go.dev/wiki/MinimumRequirements, Golang supports kernel version 2.6.32 or later, but os.checkPidfd() unconditionally calls pidfd_open(2), which was introduced in 5.3.

os.checkPidfd() should check availability without calling potentially unavailable system calls. Alternatively, allow disabling the use of pidfd via GODEBUG.

Related: #62654
CC @kolyshkin

@mauri870
Member

mauri870 commented Aug 26, 2024

I find it weird that this crashes the process with SIGSYS. Given how pidfd_open is wrapped on Linux, the caller should simply get the errno (ENOSYS) back:

pidfd, _, errno := syscall.Syscall(pidfdOpenTrap, uintptr(pid), uintptr(flags), 0)
if errno != 0 {
	return ^uintptr(0), errno
}

The stacktrace shows it is crashing in runtime_entersyscall. Perhaps there is a seccomp(2) filter in place causing the process to receive a SIGSYS?
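To illustrate the expectation above: on an old kernel with no seccomp filter, the call returns ENOSYS and the caller can fall back. The following is a minimal Go sketch, not the os package's code. The names pidfdOpener, pidfdUsable, stubENOSYS and stubOK are hypothetical and introduced only for this example, and the opener is injected so that no raw system call is made.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// pidfdOpener is a hypothetical stand-in for a pidfd_open wrapper.
// It is injected so the fallback logic runs without a raw syscall.
type pidfdOpener func(pid int) (fd int, err error)

// pidfdUsable reports whether pidfd can be used, treating ENOSYS as
// "not supported, fall back" and any other error as a real failure.
func pidfdUsable(open pidfdOpener, pid int) (bool, error) {
	// A real caller would keep the descriptor and close it later.
	_, err := open(pid)
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, syscall.ENOSYS):
		return false, nil
	default:
		return false, err
	}
}

// stubENOSYS imitates a kernel that does not implement the call.
func stubENOSYS(pid int) (int, error) { return -1, syscall.ENOSYS }

// stubOK imitates a kernel where the call succeeds.
func stubOK(pid int) (int, error) { return 3, nil }

func main() {
	fmt.Println(pidfdUsable(stubENOSYS, 1))
	fmt.Println(pidfdUsable(stubOK, 1))
}
```

If a seccomp filter traps the call, as suspected here, the process receives SIGSYS rather than an error return, so a check of this kind would presumably never get to run.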

@mauri870 mauri870 added the OS-Android and NeedsInvestigation labels Aug 26, 2024
@ianlancetaylor
Contributor

Yes, it would be extremely unfortunate if every unrecognized system call triggered a SIGSYS signal. That would make it impossible to write programs that run on both older and newer kernel versions. We need to understand what is causing that SIGSYS signal. I don't think it is the kernel.

That said, I see that this is android. We may need to skip the pidfd calls on Android. CC @golang/android

@mauri870
Member

mauri870 commented Aug 26, 2024

Additionally, it would be good to see strace output to aid with debugging, e.g. strace -f go env.

@cions
Author

cions commented Aug 27, 2024

I realized that the real culprit is seccomp.

$ grep Seccomp: /proc/self/status
Seccomp:        2
$ strace -fqq --signal=SIGSYS --trace=none go env
[pid 12717] --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0x5a46509700, si_syscall=__NR_pidfd_open, si_arch=AUDIT_ARCH_AARCH64} ---

So we should check if seccomp is enabled?
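For reference, the Seccomp field shown by the grep above can also be read from Go. This is a minimal sketch; seccompMode is a hypothetical helper name, not something in the standard library. The mode values are 0 (disabled), 1 (strict) and 2 (filter). The mode alone says nothing about which system calls a filter allows.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// seccompMode extracts the value of the "Seccomp:" line from the
// contents of /proc/<pid>/status. It returns false if the field is
// missing or not a number.
func seccompMode(status string) (int, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		value, found := strings.CutPrefix(sc.Text(), "Seccomp:")
		if !found {
			continue
		}
		mode, err := strconv.Atoi(strings.TrimSpace(value))
		if err != nil {
			return 0, false
		}
		return mode, true
	}
	return 0, false
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	if mode, ok := seccompMode(string(data)); ok {
		fmt.Println("seccomp mode:", mode)
	} else {
		fmt.Println("no Seccomp field found")
	}
}
```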

@ianlancetaylor
Contributor

Checking if seccomp is enabled won't really help us, because we won't know the policy.

I think we should just skip the pidfd calls if GOOS == "android".

@mauri870
Member

I think the safest approach is to disable pidfd on Android.

@mauri870
Member

We can probably just make android use pidfd_other.go https://github.com/golang/go/blob/master/src/os/pidfd_other.go#L5

@cions
Author

cions commented Aug 27, 2024

@ianlancetaylor
Contributor

It seems to me that the kernel version isn't going to tell us anything about the seccomp policy.

@gopherbot
Contributor

Change https://go.dev/cl/608518 mentions this issue: os: don't use pidfd functions on android

@cions
Author

cions commented Aug 27, 2024

Not to check the seccomp policy, but to check if the kernel version supports pidfd.
Since Android with a newer kernel would not have this problem, I don't think disabling pidfd for GOOS=android is a good idea.

@ianlancetaylor
Contributor

Is there reason to believe that the seccomp policy matches the kernel version?

In normal Linux use we don't have to check the kernel version, because the system call will fail with an ENOSYS error. In this case the above discussion suggests that it is the Android seccomp policy that is sending the SIGSYS signal. But the kernel might have been updated without updating the seccomp policy. Is there a way that we can find out?

@cions
Author

cions commented Aug 27, 2024

termux/termux-packages#21265
Users reported that Go works fine on newer Android versions.

@ianlancetaylor
Contributor

Thanks. I still don't know how to tell whether pidfd_open is supported or not. I'm certainly happy to accept a patch that has been tested on multiple versions of Android.

@mauri870
Member

Any chance this was a bug with android? I found this aosp-mirror/platform_bionic@3de1915 but it points to a private issue.

@cions
Author

cions commented Aug 28, 2024

https://android-review.googlesource.com/c/platform/bionic/+/1208625
https://cs.android.com/android/_/android/platform/bionic/+/refs/tags/android-11.0.0_r1:libc/SECCOMP_WHITELIST_COMMON.TXT;l=76
pidfd_open has been on the seccomp allow list since Android 11.

And
https://source.android.com/docs/core/architecture/kernel/generic-kernel-image#inhibits-platform-upgrades

Android 10 supports 3.18, 4.4, 4.9, 4.14, and 4.19 kernels

So, checking if the kernel version is 5.3 or newer before calling pidfd_open should work on all Android devices.

@cions
Author

cions commented Aug 28, 2024

Oops, pidfd_send_signal was not allowed in Android 11, but that was fixed in 12.

Since Android 11 supports only 4.19 and 5.4 kernels (https://source.android.com/docs/core/architecture/kernel/android-common), the check should be against 5.5 rather than 5.3.
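The kernel-version check proposed here could look roughly like the sketch below. The helper names leadingInt, parseKernelRelease and kernelAtLeast are hypothetical, and the second release string in main is made up for illustration. Only "4.9.186-perf+" comes from the report.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// leadingInt parses the run of decimal digits at the start of s.
func leadingInt(s string) (int, bool) {
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	if end == 0 {
		return 0, false
	}
	n, err := strconv.Atoi(s[:end])
	return n, err == nil
}

// parseKernelRelease extracts major and minor from a `uname -r`
// string such as "4.9.186-perf+".
func parseKernelRelease(release string) (major, minor int, ok bool) {
	majorStr, rest, found := strings.Cut(release, ".")
	if !found {
		return 0, 0, false
	}
	if major, ok = leadingInt(majorStr); !ok {
		return 0, 0, false
	}
	if minor, ok = leadingInt(rest); !ok {
		return 0, 0, false
	}
	return major, minor, true
}

// kernelAtLeast reports whether release is at least wantMajor.wantMinor.
// Unparseable strings are treated as too old, which keeps the safe path.
func kernelAtLeast(release string, wantMajor, wantMinor int) bool {
	major, minor, ok := parseKernelRelease(release)
	if !ok {
		return false
	}
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	fmt.Println(kernelAtLeast("4.9.186-perf+", 5, 5))     // kernel from the report
	fmt.Println(kernelAtLeast("5.10.43-android12", 5, 5)) // illustrative value
}
```

Treating an unparseable release string as "too old" is a deliberate choice in this sketch: skipping pidfd unnecessarily is cheaper than being killed by SIGSYS.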

@kolyshkin
Contributor

So, it's not the Android kernel but its seccomp policy which results in a process being killed (instead of something like returning ENOSYS). Apparently this was fixed in Android 12, so we can add a kludge that does a one-time runtime check for Android >= 12 and disables pidfd entirely if this requirement is not met, just to avoid being killed.

Alas I can't find any code that checks Android version in this repository.

@kolyshkin
Contributor

This also means that there's no CI in place to test Android 11, or this would have been caught earlier.

@cions
Author

cions commented Aug 28, 2024

You're right, we should check Android version >= 12 rather than kernel version.

Here is a way to get the Android version in C (sorry, I'm not familiar with cgo):
https://gist.github.com/cions/07fa5f11e38945fa96916888b7e88d0c
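On the Go side, a one-time Android version check could be sketched as follows. This is not what the gist or any CL does. It assumes the version is available as an API level string (Android 12 is API level 31), for example from a property such as ro.build.version.sdk. The names sdkAllowsPidfd, readSDKProperty and pidfdAllowed are hypothetical, and the property reader is a stub that returns a fixed value.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// android12APILevel is the API level of Android 12.
const android12APILevel = 31

// sdkAllowsPidfd reports whether an API level string denotes
// Android 12 or newer. Unknown values keep the safe non-pidfd path.
func sdkAllowsPidfd(sdk string) bool {
	level, err := strconv.Atoi(strings.TrimSpace(sdk))
	if err != nil {
		return false
	}
	return level >= android12APILevel
}

// readSDKProperty is a hypothetical hook. Real code would have to
// query the system property (for example through cgo); this stub
// pretends to run on Android 11.
var readSDKProperty = func() string { return "30" }

// pidfdAllowed caches the result so the check runs only once.
var pidfdAllowed = sync.OnceValue(func() bool {
	return sdkAllowsPidfd(readSDKProperty())
})

func main() {
	fmt.Println("pidfd allowed:", pidfdAllowed())
}
```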

@ianlancetaylor
Contributor

Thanks. If we have to we can add a call to __system_property_get in runtime/cgo/gcc_android.c.

But honestly I think it would be simpler to just skip the pidfd calls on Android. Any patch to use them should be written by an Android developer who is able to test on various Android releases.

cions added a commit to cions/go that referenced this issue Sep 4, 2024
In Android version 11 and earlier, pidfd-related system calls are not
allowed by the seccomp policy, which causes crashes due to SIGSYS signals.

Fixes golang#69065
@cions cions linked a pull request Sep 4, 2024 that will close this issue
@gopherbot
Contributor

Change https://go.dev/cl/610515 mentions this issue: os: don't use pidfd on Android < 12

cions added a commit to cions/go that referenced this issue Sep 5, 2024
In Android version 11 and earlier, pidfd-related system calls are not
allowed by the seccomp policy, which causes crashes due to SIGSYS signals.

Fixes golang#69065
bradfitz added a commit to tailscale/go that referenced this issue Sep 12, 2024
Updates tailscale/tailscale#13452
Updates golang#69065

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>