
Use TestExclusionList in bringup_runtest.sh #106201

Merged: 2 commits, Sep 20, 2024


am11
Member

@am11 am11 commented Aug 9, 2024

See #97791 (comment).

TestExclusionList.txt is already written to the directory pointed to by $CORE_ROOT, so let's use it in addition to the existing mechanisms (which I don't think are required, but I haven't cleaned them up in case someone is using them).

Tested on a riscv64 machine: it does skip the tests listed in the txt file.
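For illustration, the lookup could be sketched as a shell function. This is a minimal sketch, not the code in bringup_runtest.sh: the function name `is_excluded` is hypothetical, and it assumes each line of TestExclusionList.txt has the form `<relative path>.dll,<issue URL>` (the shape of the entries quoted later in this thread) and that a test script's path matches its list entry once both extensions are stripped.

```shell
#!/bin/sh
# Hypothetical helper (not the real bringup_runtest.sh code).
# Succeeds if TEST_PATH (relative, e.g. GC/Foo/Foo/Foo.sh) is listed in the exclusion file.
# Assumes each line is "<relative path>.dll,<issue URL>"; extensions are stripped before comparing.
is_excluded() {
    test_path=$1
    list=${2:-"$CORE_ROOT/TestExclusionList.txt"}
    [ -f "$list" ] || return 1
    key=${test_path%.*}
    # Keep the first CSV field, drop its extension, and look for an exact whole-line match.
    cut -d, -f1 "$list" | sed 's/\.[^./]*$//' | grep -Fqx -- "$key"
}
```

A runner loop would then skip a test with something like `if is_excluded "$rel_path"; then continue; fi`.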

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 9, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 9, 2024
@am11
Member Author

am11 commented Aug 9, 2024

cc @gbalykov

This directory doesn't have a src/ subpath under it:

testNativeBinDir=$testNativeBinDir/src
if [ ! -d "$testNativeBinDir" ]; then
    exit_with_error "$errorSource" "Directory specified by --testNativeBinDir does not exist: $testNativeBinDir"
fi

Is it some kind of a leftover? I removed that block to run tests locally, and I can push the change here if someone can confirm.

@am11 am11 added area-Infrastructure-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 9, 2024
@gbalykov
Member

is it some kind of a leftover?

@am11 sorry for the late response; I think so. Actually, in the case of a riscv64 cross build, the artifacts/obj/Linux.riscv64.Release/tests dir doesn't exist, so maybe it should be artifacts/tests/coreclr/linux.riscv64.Release/bin/.

@am11
Member Author

am11 commented Aug 20, 2024

I think so.

OK, I've deleted it. Thanks for confirming.

Actually, in the case of a riscv64 cross build, the artifacts/obj/Linux.riscv64.Release/tests dir doesn't exist, so maybe it should be artifacts/tests/coreclr/linux.riscv64.Release/bin/.

Sorry, I didn't follow. Could you explain / point me to it? With the current state of this PR (after the second commit), this script should just run with the test exclusions in effect. I tested with these artifacts: https://github.com/am11/CrossRepoCITesting/releases/tag/linux-riscv64_10439230608

$ cat  artifacts_part_aa   artifacts_part_aab   artifacts_part_ac > artifacts.tar.gz
# now untar artifacts.tar.gz and run tests

It was created by this workflow: https://github.com/am11/CrossRepoCITesting/actions/runs/10439230608/workflow (I basically used the steps you once shared).
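The reassembly and extraction step can be wrapped in a small helper. A minimal sketch, assuming the parts were produced by `split` with the `artifacts_part_` prefix and are the only files matching `artifacts_part_*` in the directory; `reassemble_artifacts` is a hypothetical name. A glob keeps the parts in order because pathname expansion is sorted and `split` suffixes (`aa`, `ab`, `ac`) sort alphabetically.

```shell
#!/bin/sh
# Hypothetical helper: reassemble split archive parts in DIR and extract them there.
reassemble_artifacts() {
    dir=$1
    (
        cd "$dir" || exit 1
        # Glob expansion is sorted, so split's suffixes (aa, ab, ac, ...) concatenate in order.
        cat artifacts_part_* > artifacts.tar.gz || exit 1
        # Check gzip integrity before extracting, to catch a missing or truncated part.
        gzip -t artifacts.tar.gz || exit 1
        tar -xzf artifacts.tar.gz
    )
}
```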

@gbalykov
Member

Sorry, I didn't follow. Could you explain / point me to it?

I mean that the example value of testNativeBinDir in the help text doesn't seem to exist anymore (at least there's no similar dir for a riscv cross build). So can you share which dir you pass to --testNativeBinDir=?

echo ' --testNativeBinDir="runtime/artifacts/obj/Linux.x64.Debug/tests"'

@am11
Member Author

am11 commented Aug 21, 2024

I have a run-tests.sh next to runtime/ (so it is not part of the git repo):

#!/bin/sh

root="$( cd -P "$( dirname "$0" )" && pwd )/runtime"

"$root"/src/tests/Common/scripts/bringup_runtest.sh \
  --testRootDir="$root"/artifacts/tests/coreclr/linux.riscv64.Release \
  --testNativeBinDir="$root"/artifacts/tests/coreclr/linux.riscv64.Release/bin \
  --coreClrBinDir="$root"/artifacts/bin/coreclr/linux.riscv64.Release
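Since the help-text paths have gone stale, a wrapper like the run-tests.sh above could check its directories before calling the bringup script. A minimal sketch; `check_test_dirs` is a hypothetical name, and the layout is taken from the three paths in run-tests.sh, with the configuration (e.g. `linux.riscv64.Release`) passed in.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the three directories passed to bringup_runtest.sh.
# ROOT is the runtime checkout; CONFIG is e.g. linux.riscv64.Release.
check_test_dirs() {
    root=$1
    config=$2
    status=0
    for dir in \
        "$root/artifacts/tests/coreclr/$config" \
        "$root/artifacts/tests/coreclr/$config/bin" \
        "$root/artifacts/bin/coreclr/$config"
    do
        # Report every missing directory instead of stopping at the first one.
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

A wrapper could call `check_test_dirs "$root" linux.riscv64.Release || exit 1` before invoking the bringup script.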

I agree that the paths in the help text are outdated.

@am11
Member Author

am11 commented Aug 24, 2024

@gbalykov, the generated file from the latest main artifacts https://github.com/am11/CrossRepoCITesting/releases/tag/linux-riscv64_10541101133:

TestExclusionList.txt

It lists 76 tests to skip. Does it look correct (based on src/tests/issues.targets)? I will share the details later on the status issue, but seems like despite this working, some tests are failing and some are hanging indefinitely. I will keep adding them to src/tests/issues.targets, after which TestExclusionList.txt is automatically updated as part of the test build.

@gbalykov
Member

Does it look correct (based on src/tests/issues.targets)?

I've briefly looked through it and it seems correct overall, though it seems to have some extra, unneeded lines, e.g.

GC/LargeMemory/API/gc/getgeneration/getgeneration.dll,https://github.com/dotnet/runtime/issues/5933
GC/LargeMemory/API/gc/getgeneration/largeobject.dll,https://github.com/dotnet/runtime/issues/5933

Both are part of GC/LargeMemory/API/gc/getgeneration, and largeobject.dll is an auxiliary library rather than a test.
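Entries like these can be spotted mechanically by listing test directories that appear more than once. A minimal sketch, assuming the `<relative path>,<issue URL>` line format shown above; `dirs_with_multiple_entries` is a hypothetical name, and a hit only marks a directory to review, since a directory could legitimately hold more than one test.

```shell
#!/bin/sh
# Hypothetical helper: print directories with more than one entry in an exclusion list,
# which may indicate auxiliary libraries listed next to the actual test.
# Assumes lines of the form "<relative path>,<issue URL>".
dirs_with_multiple_entries() {
    cut -d, -f1 "$1" |
        sed 's|/[^/]*$||' |   # drop the file name, keep its directory
        sort | uniq -d        # print each directory that occurs more than once
}
```

Counting distinct entries the same way, with `cut -d, -f1 TestExclusionList.txt | sort -u | wc -l`, is a quick cross-check of the total.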

but seems like despite this working, some tests are failing and some are hanging indefinitely

We've been running CLR tests on main for a long time now in default, jitstress, and gcstress modes on VisionFive2, and the results have been pretty stable. However, not all failures have been added to src/tests/issues.targets, even though all of them should be skipped for one reason or another. Also, we run tests built with BuildAllTestsAsStandalone=true.

Please share more details when you are ready.

cc @dotnet/samsung

@am11
Member Author

am11 commented Aug 25, 2024

Running tests again. Here is some basic info:

It's Bianbu OS, a Debian derivative, so I used the Ubuntu 24.10 repo to install lldb-18-dev.

/boot/config-$(uname -r): https://0x0.st/Xyno.txt (you can compare with that of Vision5)

other random info:

$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 61962
max locked memory           (kbytes, -l) 2033068
max memory size             (kbytes, -m) unlimited
open files                          (-n) 14096
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 16384
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 61962
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

$ sysctl vm.overcommit_memory
vm.overcommit_memory = 1

$ sysctl vm.swappiness
vm.swappiness = 60
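The host details above can be gathered in one step for a bug report. A minimal sketch for Linux; `collect_test_host_info` is a hypothetical name, and it reads the two `vm.*` settings from /proc/sys/vm rather than calling `sysctl`, which reports the same values.

```shell
#!/bin/sh
# Hypothetical helper: print the host details relevant to test failures (Linux only).
collect_test_host_info() {
    echo "kernel: $(uname -r) ($(uname -m))"
    # Same values that `sysctl vm.<key>` prints.
    for key in overcommit_memory swappiness; do
        echo "vm.$key = $(cat "/proc/sys/vm/$key")"
    done
    echo "ulimit -s (stack, KiB): $(ulimit -s)"
    echo "ulimit -c (core file size): $(ulimit -c)"
    grep -E '^(MemTotal|MemAvailable|SwapTotal):' /proc/meminfo
}
```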

I get failing tests like this:

               BEGIN EXECUTION
               /home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/coreoverlay/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true keepalivedirectedgraph.dll ''
               Test should pass with ExitCode 100
               Building Graph with 100 vertices...
               Building Vertices...
               Building Edges...
               Making all vertices reachable...
               Deleting all vertices...
               ./keepalivedirectedgraph.sh: line 448: 169376 Segmentation fault      (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"
               Expected: 100
               Actual: 139
               END EXECUTION - FAILED
         - GC/Regressions/v2.0-beta2/462651/462651/462651.sh
         - GC/Regressions/v2.0-beta1/289745/289745/289745.sh
         - GC/Regressions/v2.0-beta2/426480/426480/426480.sh
         - GC/Regressions/v2.0-beta2/445488/445488/445488.sh
         - GC/Regressions/v2.0-beta2/452950/452950/452950.sh
         - GC/Scenarios/BaseFinal/basefinal/basefinal.sh
         - GC/Regressions/v2.0-beta2/471729/471729/471729.sh
         - GC/Regressions/v3.0/25252/25252/25252.sh
         - GC/Scenarios/BinTree/thdtree/thdtree.sh
         - GC/Scenarios/Boxing/arrcpy/arrcpy.sh
         - GC/Scenarios/Boxing/gcvariant2/gcvariant2.sh
FAILED   - GC/Features/SustainedLowLatency/scenario/scenario.sh

but if I run it manually in the same terminal window, it passes:

am11@k1:~/projects/runtime$ pushd artifacts/tests/coreclr/linux.riscv64.Release/GC/Features/KeepAlive/keepaliveother/keepalivedirectedgraph
~/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/GC/Features/KeepAlive/keepaliveother/keepalivedirectedgraph ~/projects/runtime
am11@k1:~/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/GC/Features/KeepAlive/keepaliveother/keepalivedirectedgraph$ ./keepalivedirectedgraph.sh -coreroot=/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root
BEGIN EXECUTION
/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true keepalivedirectedgraph.dll ''
Test should pass with ExitCode 100
Building Graph with 100 vertices...
Building Vertices...
Building Edges...
Making all vertices reachable...
Deleting all vertices...
Done...
Expected: 100
Actual: 100
END EXECUTION - PASSED

I have tried 5-6 similar SIGSEGV cases, and all of them pass when I run them manually. It's a testing machine, so I am not running much else on it, though it still has the GNOME desktop environment running with some services. I can switch to text mode (console only) if that might be causing these.
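To tell an intermittent failure from a deterministic one, the same test script can be run in a loop. A minimal sketch; `repeat_test` is a hypothetical name, it calls the script with `-coreroot=` as in the manual run above, and it counts a run as passed only when the output contains the `END EXECUTION - PASSED` banner seen in these logs (it does not rely on the wrapper's own exit status).

```shell
#!/bin/sh
# Hypothetical helper: run one generated test script RUNS times and report how many failed.
repeat_test() {
    script=$1
    core_root=$2
    runs=$3
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # A run passes only if the wrapper prints the PASSED banner.
        if ! "$script" -coreroot="$core_root" 2>&1 | grep -q 'END EXECUTION - PASSED'; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
    [ "$failures" -eq 0 ]
}
```

Running the loop once on an otherwise idle machine and once while the full suite is executing could help show whether the crashes depend on load.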

There are other kinds of failures as well, e.g.

               BEGIN EXECUTION
               /home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/coreoverlay/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true stackoverflowtester.dll ''
               Running stackoverflow test(smallframe main)
               "Stack overflow."
               "Repeated 349283 times:"
               "--------------------------------"
               "   at TestStackOverflow.Program.InfiniteRecursionC()"
               "   at TestStackOverflow.Program.InfiniteRecursionB()"
               "   at TestStackOverflow.Program.InfiniteRecursionA()"
               "--------------------------------"
               "   at TestStackOverflow.Program.Test(Boolean)"
               "   at TestStackOverflow.Program.Main(System.String[])"
               ""
               Running stackoverflow test(largeframe main)
               "Stack overflow."
               "Repeated 85 times:"
               "--------------------------------"
               "   at TestStackOverflow.Program.InfiniteRecursionA2()"
               "   at TestStackOverflow.Program.InfiniteRecursionC2()"
               "   at TestStackOverflow.Program.InfiniteRecursionB2()"
               "--------------------------------"
               "   at TestStackOverflow.Program.InfiniteRecursionA2()"
               "   at TestStackOverflow.Program.Test(Boolean)"
               "   at TestStackOverflow.Program.Main(System.String[])"
               ""
               Running stackoverflow test(smallframe secondary)
               "Stack overflow."
               "Repeated 349378 times:"
               "--------------------------------"
               "   at TestStackOverflow.Program.InfiniteRecursionC()"
               "   at TestStackOverflow.Program.InfiniteRecursionB()"
               "   at TestStackOverflow.Program.InfiniteRecursionA()"
               "--------------------------------"
               "   at TestStackOverflow.Program.Test(Boolean)"
               "   at TestStackOverflow.Program+<>c__DisplayClass7_0.<SecondaryThreadsTest>b__0()"
               "   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)"
               ""
               Running stackoverflow test(largeframe secondary)
               ""
               System.Exception: Exit code: 0x0000008B, expected 0x00000086
                  at TestStackOverflow.Program.TestStackOverflow(String testName, String testArgs, List`1& stderrLines) in /runtime/src/tests/baseservices/exceptions/stackoverflow/stackoverflowtester.cs:line 75
                  at TestStackOverflow.Program.TestStackOverflowLargeFrameSecondaryThread() in /runtime/src/tests/baseservices/exceptions/stackoverflow/stackoverflowtester.cs:line 193
                  at __GeneratedMainWrapper.Main() in /runtime/artifacts/tests/coreclr/obj/linux.riscv64.Release/Managed/baseservices/exceptions/stackoverflow/stackoverflowtester/XUnitWrapperGenerator/XUnitWrapperGenerator.XUnitWrapperGenerator/SimpleRunner.g.cs:line 10
               Expected: 100
               Actual: 101
               END EXECUTION - FAILED

This also passes when I run stackoverflowtester.sh manually.
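Both failures are easier to read once the exit codes are decoded. A POSIX shell reports a process killed by signal n as exit status 128 + n, so `Actual: 139` means SIGSEGV (11). In the stackoverflow test, `0x0000008B` is 139 (SIGSEGV) where `0x00000086`, which is 134 (SIGABRT, 6), was expected. A small helper for this; `describe_exit_code` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: decode a process exit status given in decimal or 0x-prefixed hex.
# Shells report death by signal N as status 128 + N.
describe_exit_code() {
    code=$(printf '%d' "$1")
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        echo "$code: killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "$code: exited normally"
    fi
}
```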

I then passed --sequential to the bringup script to disable parallel execution and found a reproducible failure:

am11@k1:~/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/GC/Coverage/LargeObjectAlloc$ lldb-18 -- /home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true LargeObjectAlloc.dll '1'
(lldb) target create "/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun"
Current executable set to '/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun' (riscv64).
(lldb) settings set -- target.run-args  "-p" "System.Reflection.Metadata.MetadataUpdater.IsSupported=false" "-p" "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true" "LargeObjectAlloc.dll" "1"
(lldb) r
Process 218168 launched: '/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun' (riscv64)
LargeObjectAlloc started with 1 threads. Control-C to exit
All threads started
0: Restarting run 0
Process 218168 stopped
* thread #9, name = '0', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x0000003ff7c1b282 libc.so.6`__futex_abstimed_wait_common [inlined] __futex_abstimed_wait_common64(private=<unavailable>, cancel=<unavailable>, abstime=0x0000000000000000, op=<unavailable>, expected=<unavailable>, futex_word=0x0000002aaab40fe8) at futex-internal.c:57:12
(lldb) bt
* thread #9, name = '0', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x0000003ff7c1b282 libc.so.6`__futex_abstimed_wait_common [inlined] __futex_abstimed_wait_common64(private=<unavailable>, cancel=<unavailable>, abstime=0x0000000000000000, op=<unavailable>, expected=<unavailable>, futex_word=0x0000002aaab40fe8) at futex-internal.c:57:12
    frame #1: 0x0000003ff7c1b26a libc.so.6`__futex_abstimed_wait_common(futex_word=0x0000002aaab40fe8, expected=<unavailable>, clockid=<unavailable>, abstime=0x0000000000000000, private=<unavailable>, cancel=<unavailable>) at futex-internal.c:87:9
    frame #2: 0x0000003ff7c1d072 libc.so.6`___pthread_cond_wait [inlined] __pthread_cond_wait_common(abstime=0x0000000000000000, clockid=0, mutex=0x0000002aaab40ff0, cond=0x0000002aaab40fc0) at pthread_cond_wait.c:503:10
    frame #3: 0x0000003ff7c1d016 libc.so.6`___pthread_cond_wait(cond=0x0000002aaab40fc0, mutex=0x0000002aaab40ff0) at pthread_cond_wait.c:627:10
    frame #4: 0x0000003ff7aadd62 libcoreclr.so`GCEvent::Impl::Wait(this=0x0000002aaab40fc0, milliseconds=<unavailable>, alertable=<unavailable>) at events.cpp:149:22 [opt]
    frame #5: 0x0000003ff79531d6 libcoreclr.so`WKS::gc_heap::try_allocate_more_space(alloc_context*, unsigned long, unsigned int, int) [inlined] WKS::gc_heap::wait_for_gc_done(timeOut=-1) at gc.cpp:14729:49 [opt]
    frame #6: 0x0000003ff79531ac libcoreclr.so`WKS::gc_heap::try_allocate_more_space(acontext=<unavailable>, size=<unavailable>, flags=<unavailable>, gen_number=<unavailable>) at gc.cpp:18939:9 [opt]
    frame #7: 0x0000003ff79763f6 libcoreclr.so`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) [inlined] WKS::gc_heap::allocate_more_space(acontext=0x0000003f0e22d180, size=6400056, flags=32, alloc_generation_number=3) at gc.cpp:19530:18 [opt]
    frame #8: 0x0000003ff79763e8 libcoreclr.so`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) [inlined] WKS::gc_heap::allocate_uoh_object(jsize=6400024, flags=32, gen_number=<unavailable>, alloc_bytes=0x0000003f0e22e9d0) at gc.cpp:45049:11 [opt]
    frame #9: 0x0000003ff79763d4 libcoreclr.so`WKS::GCHeap::Alloc(this=<unavailable>, context=0x0000003f0e22e9b8, size=6400024, flags=32) at gc.cpp:49563:30 [opt]
    frame #10: 0x0000003ff7850f36 libcoreclr.so`Alloc(size=6400024, flags=GC_ALLOC_LARGE_OBJECT_HEAP) at gchelpers.cpp:227:48 [opt]
    frame #11: 0x0000003ff7850ce0 libcoreclr.so`AllocateSzArray(pArrayMT=0x0000003f798b8d38, cElements=1600000, flags=<unavailable>) at gchelpers.cpp:0 [opt]
    frame #12: 0x0000003ff786422e libcoreclr.so`JIT_NewArr1(arrayMT=0x0000003f798b8d38, size=<unavailable>) at jithelpers.cpp:1571:16 [opt]
    frame #13: 0x0000003f798dedb4
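The interactive lldb session above can also be scripted so that a backtrace is captured only when the process crashes. A minimal sketch; `backtrace_on_crash` is a hypothetical name, and `LLDB` defaults to `lldb-18` as installed earlier. It uses lldb's `--batch`, `-o` (run a command after loading the target) and `-k` (run a command only if the target crashes) options. One caveat: CoreCLR installs its own SIGSEGV handler (for example, to turn null dereferences in managed code into NullReferenceException), so a SIGSEGV stop under a debugger is not by itself proof of a crash.

```shell
#!/bin/sh
# Hypothetical helper: run a program under lldb non-interactively and dump all thread
# backtraces only if it crashes. Override the debugger binary with LLDB=...
backtrace_on_crash() {
    "${LLDB:-lldb-18}" --batch \
        -o run \
        -k 'thread backtrace all' \
        -k quit \
        -- "$@"
}
```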

@gbalykov
Member

I get failing tests like this:

  • keepalivedirectedgraph doesn't fail in any of our runs on main on VisionFive2 with the Debian image provided by StarFive; that looks strange
  • stackoverflowtester should probably be fixed by "Disabled stackoverflow tests with largeframe for RISCV64" #106383 by @SzpejnaDawid
  • LargeObjectAlloc and similar tests usually require a lot of memory, and maybe even 16 GB is not enough, yet I don't see it among the failures in our test runs on the 4 GB VisionFive2
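Since LargeObjectAlloc and similar tests are memory-hungry, a runner could check available memory before starting them. A minimal sketch for Linux; `has_available_memory` is a hypothetical name, and the threshold is left to the caller because the thread does not establish how much these tests actually need.

```shell
#!/bin/sh
# Hypothetical helper: report MemAvailable (from /proc/meminfo, in kB) as MiB and
# succeed only if at least MIN_MIB is available. Linux only.
has_available_memory() {
    min_mib=$1
    avail_kib=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    avail_mib=$((avail_kib / 1024))
    echo "MemAvailable: ${avail_mib} MiB (need ${min_mib} MiB)"
    [ "$avail_mib" -ge "$min_mib" ]
}
```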

Also, I've heard there are some issues with at least the Banana Pi BPI-F3 (e.g. https://forum.banana-pi.org/t/banana-pi-f3-with-16-gb-ram-constantly-freezing-solved/18678, https://www.reddit.com/r/RISCV/comments/1en1eb3/banana_pi_f3_with_16_gb_ram_constantly_freezing/). Maybe your issue is something like that, since, as I understand it, the SpacemiT K1 and M1 are very similar. So a Bianbu OS update might help.

@am11
Member Author

am11 commented Sep 20, 2024

@jkoritzinsky, feel free to merge this. Our discussion is a bit off-topic. 😅

@gbalykov, I am not sure that the freezing issue is related, since this system, with the whole DE, has been running for weeks now. However, I will try running priority tests again once Bianbu 2.0 is out of the RC series (https://bianbu-linux.spacemit.com/en/release_notes/bl-v2.0.y/). That release has kernel 6.6. They are also upstreaming their kernel patches these days (https://google.com/search?q=spacemit+site:lore.kernel.org), which may take some time; after that I'd be able to try other distros (Bianbu v2 carries 3M lines of patches compared to mainline 6.6, so I'm not going to try applying them to 6.10 or 6.11 myself 🙈).

@jkoritzinsky jkoritzinsky merged commit f968980 into dotnet:main Sep 20, 2024
70 checks passed
@am11 am11 deleted the patch-10 branch September 20, 2024 06:15
Labels
area-Infrastructure-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
Status: Done

3 participants