Use TestExclusionList in bringup_runtest.sh #106201

am11 · 2024-08-09T17:29:32Z

TestExclusionList.txt is already written to the directory pointed to by $CORE_ROOT, so let's use it in addition to the existing mechanisms (which I don't think are required, but I haven't cleaned them up in case someone is using them).

Tested on a riscv64 machine: the script does skip the tests listed in the txt file.
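To make the mechanism concrete, here is a minimal sketch of how a runner script could consult such a list. It is not the code from this PR: `load_exclusions` and `is_excluded` are hypothetical helpers, the `GC/Foo` and `GC/Bar` test names are placeholders, and the sketch assumes each line has the form `<relative test path>.dll,<issue URL>` and that a test script maps to a list entry by swapping the `.sh` and `.dll` extensions.

```shell
#!/bin/sh
# Sketch only: load_exclusions and is_excluded are hypothetical helpers,
# not functions from bringup_runtest.sh.
# Assumed line format: "<relative test path>.dll,<issue URL>".

# Print the excluded test paths (extension stripped), one per line.
load_exclusions() {
    list_file=$1
    [ -f "$list_file" ] || return 0   # no list means nothing to skip
    # Keep the first comma-separated field, drop ".dll" and empty lines.
    cut -d, -f1 "$list_file" | sed -e 's/\.dll$//' -e '/^$/d'
}

# Succeed if the test script ($1) appears in the exclusion set ($2).
is_excluded() {
    candidate=${1%.sh}
    printf '%s\n' "$2" | grep -Fxq -- "$candidate"
}

# Example: decide whether to run two placeholder tests.
exclusions=$(load_exclusions "${CORE_ROOT:-.}/TestExclusionList.txt")
for test_script in GC/Foo/foo.sh GC/Bar/bar.sh; do
    if is_excluded "$test_script" "$exclusions"; then
        echo "skip $test_script"
    else
        echo "run  $test_script"
    fi
done
```

Matching whole lines with `grep -Fx` avoids treating one test path as a prefix of another; whether the real script matches on exact paths or on directories is not something this sketch tries to reproduce.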

am11 · 2024-08-09T17:31:38Z

cc @gbalykov

This directory doesn't have a src/ subpath under it:

runtime/src/tests/Common/scripts/bringup_runtest.sh

Lines 498 to 501 in 1a944fb

    
        testNativeBinDir=$testNativeBinDir/src
        if [ ! -d "$testNativeBinDir" ]; then
            exit_with_error "$errorSource" "Directory specified by --testNativeBinDir does not exist: $testNativeBinDir"
        fi

Is it some kind of a leftover? I have removed that block to run tests locally and can push the change here if someone can confirm.

gbalykov · 2024-08-20T19:41:47Z

is it some kind of a leftover?

@am11 sorry for the late response, I think so. Actually, in the case of a riscv64 cross build, the artifacts/obj/Linux.riscv64.Release/tests dir doesn't exist, so maybe it should be artifacts/tests/coreclr/linux.riscv64.Release/bin/.

am11 · 2024-08-20T23:39:32Z

I think so.

OK, I've deleted it. Thanks for confirming.

Actually, in case of riscv64 cross build artifacts/obj/Linux.riscv64.Release/tests dir doesn't exist, so maybe it should be artifacts/tests/coreclr/linux.riscv64.Release/bin/.

Sorry, I didn't follow. Could you explain / point me to it? With the current state of this PR (after the second commit), this script should just run with test exclusion in effect. I tested with these artifacts: https://github.com/am11/CrossRepoCITesting/releases/tag/linux-riscv64_10439230608

$ cat  artifacts_part_aa   artifacts_part_aab   artifacts_part_ac > artifacts.tar.gz
# now untar artifacts.tar.gz and run tests

It was created by this workflow: https://github.com/am11/CrossRepoCITesting/actions/runs/10439230608/workflow (I basically used the steps you shared once).
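The reassembly step can be wrapped so that a bad or incomplete download is caught before anything is extracted. This is only a sketch: `reassemble_and_extract` is a hypothetical helper, and it assumes the parts are plain byte-wise chunks of a single gzip-compressed tar archive that are passed in the correct order.

```shell
#!/bin/sh
# Sketch only: reassemble_and_extract is a hypothetical helper.
# Usage: reassemble_and_extract <output dir> <part>...
reassemble_and_extract() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    # Concatenate the parts in the order given.
    cat "$@" > "$out_dir/artifacts.tar.gz" || return 1
    # List the archive first; this fails if a part is missing or corrupt.
    tar -tzf "$out_dir/artifacts.tar.gz" > /dev/null || return 1
    # Only then extract next to the reassembled archive.
    tar -xzf "$out_dir/artifacts.tar.gz" -C "$out_dir"
}
```

The part file names are passed by the caller, so the helper makes no assumption about how they are named in the release.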

gbalykov · 2024-08-21T08:44:38Z

Sorry, I didn't follow. Could you explain / point me to it?

I mean that the example value of testNativeBinDir doesn't seem to exist now (at least there's no similar dir for a riscv cross build). So can you share which dir you pass to --testNativeBinDir=?

runtime/src/tests/Common/scripts/bringup_runtest.sh

Line 11 in 1a944fb

echo ' --testNativeBinDir="runtime/artifacts/obj/Linux.x64.Debug/tests"'

am11 · 2024-08-21T09:24:27Z

I have a run-tests.sh next to runtime/ (so it is not part of the git repo):

#!/bin/sh

root="$( cd -P "$( dirname "$0" )" && pwd )/runtime"

"$root"/src/tests/Common/scripts/bringup_runtest.sh \
  --testRootDir="$root"/artifacts/tests/coreclr/linux.riscv64.Release \
  --testNativeBinDir="$root"/artifacts/tests/coreclr/linux.riscv64.Release/bin \
  --coreClrBinDir="$root"/artifacts/bin/coreclr/linux.riscv64.Release

I agree that the paths in the help text are outdated.

am11 · 2024-08-24T21:19:51Z

@gbalykov, the generated file from the latest main artifacts https://github.com/am11/CrossRepoCITesting/releases/tag/linux-riscv64_10541101133:

TestExclusionList.txt

It lists 76 tests to skip. Does it look correct (based on src/tests/issues.targets)? I will share the details later on the status issue, but seems like despite this working, some tests are failing and some are hanging indefinitely. I will keep adding them to src/tests/issues.targets, after which TestExclusionList.txt gets updated automatically as part of the test build.

gbalykov · 2024-08-25T09:27:27Z

Does it look correct (based on src/tests/issues.targets)?

I've briefly looked through it and it seems correct overall. Though, it seems to have some extra unneeded lines, e.g.

GC/LargeMemory/API/gc/getgeneration/getgeneration.dll,https://github.com/dotnet/runtime/issues/5933
GC/LargeMemory/API/gc/getgeneration/largeobject.dll,https://github.com/dotnet/runtime/issues/5933

Both are part of GC/LargeMemory/API/gc/getgeneration and largeobject.dll is an auxiliary lib.
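One way to spot such entries is to group the list by test directory and report directories that appear more than once. The sketch below assumes the `<relative test path>.dll,<issue URL>` line format shown above; `duplicate_test_dirs` is a hypothetical helper, not part of the repository.

```shell
#!/bin/sh
# Sketch only: duplicate_test_dirs is a hypothetical helper.
# Assumed line format: "<relative test path>.dll,<issue URL>".
# Prints each test directory that has more than one entry in the list.
duplicate_test_dirs() {
    cut -d, -f1 "$1" |        # keep the path, drop the issue URL
        sed 's|/[^/]*$||' |   # strip the file name, leaving the directory
        sort | uniq -d        # report directories listed more than once
}
```

Run against a file containing just the two lines quoted above, this prints `GC/LargeMemory/API/gc/getgeneration` once. A directory with several entries is not necessarily wrong, so the output is a list of candidates to review rather than lines to delete.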

but seems like despite this working, some tests are failing and some are hanging indefinitely

We've been running clr tests on main for a long time now in default/jitstress/gcstress modes on VisionFive2, and the results have been pretty much stable. Not all failures are added to src/tests/issues.targets, though all of them should be skipped anyway for one reason or another. Also, we run tests built with BuildAllTestsAsStandalone=true.

Please share more details when you are ready.

cc @dotnet/samsung

am11 · 2024-08-25T12:18:08Z

Running tests again. Here is some basic info:

It's Bianbu OS, a Debian derivative, so I used the Ubuntu 24.10 repo to install lldb-18-dev.

/boot/config-$(uname -r): https://0x0.st/Xyno.txt (you can compare it with that of VisionFive2)

other random info:

$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 61962
max locked memory           (kbytes, -l) 2033068
max memory size             (kbytes, -m) unlimited
open files                          (-n) 14096
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 16384
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 61962
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

$ sysctl vm.overcommit_memory
vm.overcommit_memory = 1

$ sysctl vm.swappiness
vm.swappiness = 60

I get failing tests like this:

               BEGIN EXECUTION
               /home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/coreoverlay/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true keepalivedirectedgraph.dll ''
               Test should pass with ExitCode 100
               Building Graph with 100 vertices...
               Building Vertices...
               Building Edges...
               Making all vertices reachable...
               Deleting all vertices...
               ./keepalivedirectedgraph.sh: line 448: 169376 Segmentation fault      (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"
               Expected: 100
               Actual: 139
               END EXECUTION - FAILED
         - GC/Regressions/v2.0-beta2/462651/462651/462651.sh
         - GC/Regressions/v2.0-beta1/289745/289745/289745.sh
         - GC/Regressions/v2.0-beta2/426480/426480/426480.sh
         - GC/Regressions/v2.0-beta2/445488/445488/445488.sh
         - GC/Regressions/v2.0-beta2/452950/452950/452950.sh
         - GC/Scenarios/BaseFinal/basefinal/basefinal.sh
         - GC/Regressions/v2.0-beta2/471729/471729/471729.sh
         - GC/Regressions/v3.0/25252/25252/25252.sh
         - GC/Scenarios/BinTree/thdtree/thdtree.sh
         - GC/Scenarios/Boxing/arrcpy/arrcpy.sh
         - GC/Scenarios/Boxing/gcvariant2/gcvariant2.sh
FAILED   - GC/Features/SustainedLowLatency/scenario/scenario.sh

but if I run it manually in the same terminal window, it passes:

am11@k1:~/projects/runtime$ pushd artifacts/tests/coreclr/linux.riscv64.Release/GC/Features/KeepAlive/keepaliveother/keepalivedirectedgraph
~/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/GC/Features/KeepAlive/keepaliveother/keepalivedirectedgraph ~/projects/runtime
am11@k1:~/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/GC/Features/KeepAlive/keepaliveother/keepalivedirectedgraph$ ./keepalivedirectedgraph.sh -coreroot=/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root
BEGIN EXECUTION
/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true keepalivedirectedgraph.dll ''
Test should pass with ExitCode 100
Building Graph with 100 vertices...
Building Vertices...
Building Edges...
Making all vertices reachable...
Deleting all vertices...
Done...
Expected: 100
Actual: 100
END EXECUTION - PASSED

I have tried 5-6 similar SIGSEGV cases, and all of them pass when I run them manually. It's a testing machine, so I am not running much extra stuff. Still, it has the GNOME DE running with some services etc. I can switch to text mode or console mode if that's what might be causing these.
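Since a single manual run passing does not rule out an intermittent failure, rerunning a suspect test many times can help separate flakiness from load-related failures. The following is a sketch under assumptions: `repeat_test` is a hypothetical helper, and the caller supplies the exit code that counts as a pass (the logs above state that corerun should exit with 100 for these tests).

```shell
#!/bin/sh
# Sketch only: repeat_test is a hypothetical helper.
# Usage: repeat_test <runs> <expected exit code> <command> [args...]
# Prints one line per unexpected exit code and a summary; returns non-zero
# if any run deviated from the expected exit code.
repeat_test() {
    runs=$1
    expected=$2
    shift 2
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        "$@" > /dev/null 2>&1
        status=$?
        if [ "$status" -ne "$expected" ]; then
            failures=$((failures + 1))
            echo "run $i: exit code $status (expected $expected)"
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
    [ "$failures" -eq 0 ]
}
```

For example, from the test's directory, `repeat_test 20 100 "$CORE_ROOT/corerun" keepalivedirectedgraph.dll` would run the test binary twenty times, assuming `$CORE_ROOT` points at the Core_Root directory used above. Running several of these loops at once is a rough way to imitate the parallel runner.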

There are other kinds of failures as well, e.g.

               BEGIN EXECUTION
               /home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/coreoverlay/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true stackoverflowtester.dll ''
               Running stackoverflow test(smallframe main)
               "Stack overflow."
               "Repeated 349283 times:"
               "--------------------------------"
               "   at TestStackOverflow.Program.InfiniteRecursionC()"
               "   at TestStackOverflow.Program.InfiniteRecursionB()"
               "   at TestStackOverflow.Program.InfiniteRecursionA()"
               "--------------------------------"
               "   at TestStackOverflow.Program.Test(Boolean)"
               "   at TestStackOverflow.Program.Main(System.String[])"
               ""
               Running stackoverflow test(largeframe main)
               "Stack overflow."
               "Repeated 85 times:"
               "--------------------------------"
               "   at TestStackOverflow.Program.InfiniteRecursionA2()"
               "   at TestStackOverflow.Program.InfiniteRecursionC2()"
               "   at TestStackOverflow.Program.InfiniteRecursionB2()"
               "--------------------------------"
               "   at TestStackOverflow.Program.InfiniteRecursionA2()"
               "   at TestStackOverflow.Program.Test(Boolean)"
               "   at TestStackOverflow.Program.Main(System.String[])"
               ""
               Running stackoverflow test(smallframe secondary)
               "Stack overflow."
               "Repeated 349378 times:"
               "--------------------------------"
               "   at TestStackOverflow.Program.InfiniteRecursionC()"
               "   at TestStackOverflow.Program.InfiniteRecursionB()"
               "   at TestStackOverflow.Program.InfiniteRecursionA()"
               "--------------------------------"
               "   at TestStackOverflow.Program.Test(Boolean)"
               "   at TestStackOverflow.Program+<>c__DisplayClass7_0.<SecondaryThreadsTest>b__0()"
               "   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)"
               ""
               Running stackoverflow test(largeframe secondary)
               ""
               System.Exception: Exit code: 0x0000008B, expected 0x00000086
                  at TestStackOverflow.Program.TestStackOverflow(String testName, String testArgs, List`1& stderrLines) in /runtime/src/tests/baseservices/exceptions/stackoverflow/stackoverflowtester.cs:line 75
                  at TestStackOverflow.Program.TestStackOverflowLargeFrameSecondaryThread() in /runtime/src/tests/baseservices/exceptions/stackoverflow/stackoverflowtester.cs:line 193
                  at __GeneratedMainWrapper.Main() in /runtime/artifacts/tests/coreclr/obj/linux.riscv64.Release/Managed/baseservices/exceptions/stackoverflow/stackoverflowtester/XUnitWrapperGenerator/XUnitWrapperGenerator.XUnitWrapperGenerator/SimpleRunner.g.cs:line 10
               Expected: 100
               Actual: 101
               END EXECUTION - FAILED

This also passes when I run stackoverflowtester.sh manually.
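The exit codes in these logs are easier to compare once decoded. On Linux, a shell conventionally reports a process killed by signal N as exit code 128 + N, so 139 (0x8B) corresponds to signal 11 (SIGSEGV) and the expected 0x86 (134) to signal 6 (SIGABRT). In other words, the stack overflow test expected an abort but the child process segfaulted instead. A small sketch, with `describe_exit` as a hypothetical helper:

```shell
#!/bin/sh
# Sketch only: describe_exit is a hypothetical helper.
# Accepts decimal (139) or hex (0x8B) exit codes.
describe_exit() {
    code=$(($1))
    if [ "$code" -gt 128 ]; then
        # Shell convention: 128 + signal number.
        sig=$((code - 128))
        echo "$code: terminated by signal $sig ($(kill -l "$sig"))"
    else
        echo "$code: exited normally"
    fi
}

describe_exit 139     # the keepalivedirectedgraph failure
describe_exit 0x8B    # actual code in the stackoverflowtester failure
describe_exit 0x86    # code that stackoverflowtester expected
describe_exit 100     # the passing exit code for these tests
```

The signal name comes from the shell's own `kill -l`, so the exact spelling of the name depends on the shell in use.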

I then passed --sequential to the bringup script to disable parallel execution and found a reproducible failure:

am11@k1:~/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/GC/Coverage/LargeObjectAlloc$ lldb-18 -- /home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true LargeObjectAlloc.dll '1'
(lldb) target create "/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun"
Current executable set to '/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun' (riscv64).
(lldb) settings set -- target.run-args  "-p" "System.Reflection.Metadata.MetadataUpdater.IsSupported=false" "-p" "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true" "LargeObjectAlloc.dll" "1"
(lldb) r
Process 218168 launched: '/home/am11/projects/runtime/artifacts/tests/coreclr/linux.riscv64.Release/Tests/Core_Root/corerun' (riscv64)
LargeObjectAlloc started with 1 threads. Control-C to exit
All threads started
0: Restarting run 0
Process 218168 stopped
* thread #9, name = '0', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x0000003ff7c1b282 libc.so.6`__futex_abstimed_wait_common [inlined] __futex_abstimed_wait_common64(private=<unavailable>, cancel=<unavailable>, abstime=0x0000000000000000, op=<unavailable>, expected=<unavailable>, futex_word=0x0000002aaab40fe8) at futex-internal.c:57:12
(lldb) bt
* thread #9, name = '0', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x0000003ff7c1b282 libc.so.6`__futex_abstimed_wait_common [inlined] __futex_abstimed_wait_common64(private=<unavailable>, cancel=<unavailable>, abstime=0x0000000000000000, op=<unavailable>, expected=<unavailable>, futex_word=0x0000002aaab40fe8) at futex-internal.c:57:12
    frame #1: 0x0000003ff7c1b26a libc.so.6`__futex_abstimed_wait_common(futex_word=0x0000002aaab40fe8, expected=<unavailable>, clockid=<unavailable>, abstime=0x0000000000000000, private=<unavailable>, cancel=<unavailable>) at futex-internal.c:87:9
    frame #2: 0x0000003ff7c1d072 libc.so.6`___pthread_cond_wait [inlined] __pthread_cond_wait_common(abstime=0x0000000000000000, clockid=0, mutex=0x0000002aaab40ff0, cond=0x0000002aaab40fc0) at pthread_cond_wait.c:503:10
    frame #3: 0x0000003ff7c1d016 libc.so.6`___pthread_cond_wait(cond=0x0000002aaab40fc0, mutex=0x0000002aaab40ff0) at pthread_cond_wait.c:627:10
    frame #4: 0x0000003ff7aadd62 libcoreclr.so`GCEvent::Impl::Wait(this=0x0000002aaab40fc0, milliseconds=<unavailable>, alertable=<unavailable>) at events.cpp:149:22 [opt]
    frame #5: 0x0000003ff79531d6 libcoreclr.so`WKS::gc_heap::try_allocate_more_space(alloc_context*, unsigned long, unsigned int, int) [inlined] WKS::gc_heap::wait_for_gc_done(timeOut=-1) at gc.cpp:14729:49 [opt]
    frame #6: 0x0000003ff79531ac libcoreclr.so`WKS::gc_heap::try_allocate_more_space(acontext=<unavailable>, size=<unavailable>, flags=<unavailable>, gen_number=<unavailable>) at gc.cpp:18939:9 [opt]
    frame #7: 0x0000003ff79763f6 libcoreclr.so`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) [inlined] WKS::gc_heap::allocate_more_space(acontext=0x0000003f0e22d180, size=6400056, flags=32, alloc_generation_number=3) at gc.cpp:19530:18 [opt]
    frame #8: 0x0000003ff79763e8 libcoreclr.so`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) [inlined] WKS::gc_heap::allocate_uoh_object(jsize=6400024, flags=32, gen_number=<unavailable>, alloc_bytes=0x0000003f0e22e9d0) at gc.cpp:45049:11 [opt]
    frame #9: 0x0000003ff79763d4 libcoreclr.so`WKS::GCHeap::Alloc(this=<unavailable>, context=0x0000003f0e22e9b8, size=6400024, flags=32) at gc.cpp:49563:30 [opt]
    frame #10: 0x0000003ff7850f36 libcoreclr.so`Alloc(size=6400024, flags=GC_ALLOC_LARGE_OBJECT_HEAP) at gchelpers.cpp:227:48 [opt]
    frame #11: 0x0000003ff7850ce0 libcoreclr.so`AllocateSzArray(pArrayMT=0x0000003f798b8d38, cElements=1600000, flags=<unavailable>) at gchelpers.cpp:0 [opt]
    frame #12: 0x0000003ff786422e libcoreclr.so`JIT_NewArr1(arrayMT=0x0000003f798b8d38, size=<unavailable>) at jithelpers.cpp:1571:16 [opt]
    frame #13: 0x0000003f798dedb4

gbalykov · 2024-08-29T20:24:50Z

I get failing tests like this:

keepalivedirectedgraph doesn't seem to fail in any of our runs on main on VisionFive2 with Debian provided by StarFive, so this looks strange.
stackoverflowtester should probably be fixed by "Disabled stackoverflow tests with largeframe for RISCV64" (#106383) by @SzpejnaDawid.
LargeObjectAlloc and tests like it usually require a lot of memory; maybe even 16 GB is not enough, yet I also don't see it among the failures on the 4 GB VisionFive2 in our test runs.

Also, I've heard that there are some issues with at least the Banana Pi BPI-F3 (e.g. https://forum.banana-pi.org/t/banana-pi-f3-with-16-gb-ram-constantly-freezing-solved/18678, https://www.reddit.com/r/RISCV/comments/1en1eb3/banana_pi_f3_with_16_gb_ram_constantly_freezing/). Maybe your issue is something like that, since SpacemiT K1 and M1 are very similar as I understand. So a Bianbu OS update might help.

am11 · 2024-09-20T05:36:23Z

@jkoritzinsky, feel free to merge this. Our discussion is a bit off-topic. 😅

@gbalykov, I am not sure that freezing issue is related, since this system, with the whole DE, has been running for weeks now. However, I will try running priority tests again once Bianbu 2.0 is out of the RC series (https://bianbu-linux.spacemit.com/en/release_notes/bl-v2.0.y/). That one has kernel 6.6. They are also upstreaming their kernel patches these days (https://google.com/search?q=spacemit+site:lore.kernel.org), which may take some time; after that I'd be able to try it on other distros (3 M lines of patches in Bianbu v2 compared to mainline 6.6, so I'm not going to try to apply them on 6.10 or 6.11 myself 🙈).

Use TestExclusionList in bringup_runtest.sh

1a944fb

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 9, 2024

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 9, 2024

am11 added area-Infrastructure-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 9, 2024

am11 requested review from jkotas and jkoritzinsky August 20, 2024 05:53

jkoritzinsky approved these changes Aug 20, 2024

Delete handling of nonexistent src/ subpath

cd0e2e8

jkoritzinsky merged commit f968980 into dotnet:main Sep 20, 2024
70 checks passed

am11 deleted the patch-10 branch September 20, 2024 06:15
