Mono crash in System.Net.Requests tests on linux-arm64 #74667

Closed
steveisok opened this issue Aug 26, 2022 · 10 comments · Fixed by #74994
Labels
area-System.Net; disabled-test: The test is disabled in source code against the issue; os-linux: Linux OS (any supported distro); runtime-mono: specific to the Mono runtime
Milestone

Comments

@steveisok
Member

steveisok commented Aug 26, 2022

It appears that there is trouble reaping child processes, which leads to a crash. Could we be timing out while waiting for a request to finish?

Example log

Process terminated.
Error while reaping child. errno = 10

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000007f92b672a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5589c036fc) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88	../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
  Id   Target Id         Frame 
* 1    Thread 0x7f92ba5ff0 (LWP 25) "dotnet" 0x0000007f92b672a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5589c036fc) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  2    Thread 0x7f91bff1c0 (LWP 26) "SGen worker" 0x0000007f92b672a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f926008c8 <work_cond+40>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  3    Thread 0x7f8fe0f1c0 (LWP 27) "dotnet" 0x0000007f92841ef8 in __GI___poll (fds=0x7f88003af0, nfds=547913473925, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:41
  4    Thread 0x7f8fc0e1c0 (LWP 28) "Finalizer" 0x0000007f92b69a40 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x7f925f19c0 <finalizer_sem>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  5    Thread 0x7f8ccd51c0 (LWP 29) "dotnet" 0x0000007f92b6bd5c in __waitpid (pid=<optimized out>, stat_loc=0x7f8ccd22b0, options=<optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:30

Many other threads are waiting in a similar state, for example:

Thread 14 (Thread 0x7f76bf91c0 (LWP 40)):
#0  0x0000007f92b675bc in futex_reltimed_wait_cancelable (private=<optimized out>, reltime=0x7f00000000, expected=0, futex_word=0x7f5c00547c) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
#1  __pthread_cond_wait_common (abstime=0x7f76bf3fc8, mutex=0x7f5c005420, cond=0x7f5c005450) at pthread_cond_wait.c:533
#2  __pthread_cond_timedwait (cond=0x7f5c005450, mutex=0x7f5c005420, abstime=0x7f76bf3fc8) at pthread_cond_wait.c:667
#3  0x0000007f8f768490 in SystemNative_LowLevelMonitor_TimedWait (monitor=0x7f5c005420, timeoutMilliseconds=30000) at /__w/1/s/src/native/libs/System.Native/pal_threading.c:195
#4  0x0000007f8c1f958c in ?? ()
#5  0x0000007f8cbc52c8 in ?? ()
#6  0x0000007f8fe7b1a0 in ?? ()
#7  0xf90047bda907f3bb in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
{ "ErrorMessage":"Error while reaping child. errno = 10" } 

Report

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 0
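
For reference, errno 10 on Linux is `ECHILD` ("No child processes"), which `waitpid` reports when the caller has no unwaited-for child matching the request. The thread does not establish how the runtime got into that state. The Python sketch below only illustrates one way to provoke `ECHILD` on a POSIX system, by reaping the same child twice; `reap_twice` is a hypothetical helper and is not taken from the runtime sources.

```python
import errno
import os
import subprocess
import sys


def reap_twice():
    """Reap a child once, then try again; return the errno of the second attempt.

    Hypothetical helper for illustration only (POSIX systems).
    """
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # first reap succeeds; the child is no longer waitable
    try:
        os.waitpid(child.pid, 0)  # second reap: nothing left to wait for
    except ChildProcessError as exc:
        return exc.errno
    return None


second_errno = reap_twice()
print(second_errno, errno.errorcode.get(second_errno))
```

On Linux the second attempt fails with errno 10, the same value that appears in the log, so a second party reaping the child first is one scenario worth ruling out.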
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 26, 2022
@ghost

ghost commented Aug 26, 2022

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Author: steveisok
Assignees: -
Labels:

area-System.Threading

Milestone: -

@steveisok steveisok added runtime-mono specific to the Mono runtime os-linux Linux OS (any supported distro) and removed untriaged New issue has not been triaged by the area owner labels Aug 26, 2022
steveisok pushed a commit to steveisok/runtime that referenced this issue Aug 26, 2022
The suite crashes on mono linux-arm64 and is being tracked by dotnet#74667
@ulisesh ulisesh added the Known Build Error Use this to report build issues in the .NET Helix tab label Aug 29, 2022
steveisok added a commit that referenced this issue Aug 30, 2022
The suite crashes on mono linux-arm64 and is being tracked by #74667
@jkotas jkotas added disabled-test The test is disabled in source code against the issue and removed Known Build Error Use this to report build issues in the .NET Helix tab labels Aug 30, 2022
@ghost

ghost commented Aug 30, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Author: steveisok
Assignees: -
Labels:

area-System.Net, disabled-test, os-linux, runtime-mono

Milestone: -

@wfurt
Member

wfurt commented Aug 30, 2022

This is not Mono-specific (even if Mono failures seem prevalent); see #74795.

@steveisok
Member Author

> This is not Mono-specific (even if Mono failures seem prevalent); see #74795.

Thanks! I wondered but couldn't find any evidence.

@wfurt
Member

wfurt commented Aug 30, 2022

Is there some way to run the tests with Mono and dump the managed objects, @steveisok? I would like to see whether this is happening during a particular test.

@steveisok
Member Author

> Is there some way to run the tests with Mono and dump the managed objects, @steveisok? I would like to see whether this is happening during a particular test.

I'm not sure. @akoeplinger would you happen to know?

@akoeplinger
Member

If you have the crash in lldb/gdb then https://www.mono-project.com/docs/debug+profile/debug/#debugging-with-gdb might help, maybe @BrzVlad knows a better way.

The only other quick suggestion I have to pinpoint the exact test is to write a custom xunit v2 assembly runner which logs when a test starts by subscribing to runner.OnTestStarting and setting the parallel parameter to false in the runner.Start() method: https://github.com/xunit/samples.xunit/blob/main/TestRunner/Program.cs

@wfurt
Member

wfurt commented Aug 31, 2022

We could disable parallelization as a workaround for now. Any thoughts on that, @karelz? It seems better than disabling the tests.

@karelz
Member

karelz commented Sep 1, 2022

I think this is showing up in CI quite a bit now (I am behind on Test Monitor duty). If so, it is blocking CI, so any action that stops it from being a problem is good.
We can go after the root cause in parallel.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 2, 2022
akoeplinger added a commit that referenced this issue Sep 4, 2022
…4994)

The test was using `RemoteExecutor.Invoke(Action<string, string, string, string> method, ...)` since there is no overload that takes `Func<string, string, string, string, Task>` (only one with three strings) and that means it's becoming an async void.

The delegate that gets invoked will return the moment the method awaits something not yet completed, so now there's a race condition, where `RemoteExecutor.Invoke` thinks all work is done, but there's still likely work running and it'll start doing all its cleanup stuff like killing child processes.

Fix by removing one string parameter so it picks the correct overload. I'll also open an arcade PR to add an overload with four string arguments.

Fixes #74667
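
The race described in the commit message can be sketched outside C#. The following Python/asyncio analogy is an illustration under stated assumptions, not the test or the `RemoteExecutor` code: `invoke_action`, `invoke_func`, and `fire_and_forget` are hypothetical names standing in for a runner overload that takes a synchronous callback, one that takes an awaitable-returning callback, and an async-void-style method.

```python
import asyncio


def run_race_demo():
    events = []
    pending = []  # keep task references so they are not garbage collected

    async def test_body():
        await asyncio.sleep(0.01)  # awaits something that is not yet completed
        events.append("test body finished")

    def fire_and_forget():
        # Analogue of async void: the work is started, but the caller
        # receives nothing it could await.
        pending.append(asyncio.get_running_loop().create_task(test_body()))

    def invoke_action(callback):
        # Hypothetical runner that only accepts a synchronous callback.
        callback()  # returns before test_body has finished
        events.append("cleanup started")

    async def invoke_func(callback):
        # Hypothetical runner that accepts an awaitable-returning callback.
        await callback()  # waits for the whole test body
        events.append("cleanup started")

    async def scenario():
        invoke_action(fire_and_forget)
        await asyncio.gather(*pending)  # let the stray work finish to observe the order
        racy = list(events)
        events.clear()
        await invoke_func(test_body)
        return racy, list(events)

    return asyncio.run(scenario())


racy_order, fixed_order = run_race_demo()
print(racy_order)
print(fixed_order)
```

In the first ordering cleanup begins while the test body is still running, which mirrors the window in which child processes could be killed too early; in the second, cleanup only starts after the body has completed.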
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 4, 2022
@akoeplinger
Member

Turns out this is still happening. I will re-disable the tests; let's continue investigating in #74795, since it's the same issue.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 5, 2022
@karelz karelz added this to the 8.0.0 milestone Mar 22, 2023
7 participants