Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Net.Http.Functional - HTTP/3 timeouts (mostly on NativeAOT) #75493

Closed
karelz opened this issue Sep 12, 2022 · 13 comments
Closed

System.Net.Http.Functional - HTTP/3 timeouts (mostly on NativeAOT) #75493

karelz opened this issue Sep 12, 2022 · 13 comments
Assignees
Labels
area-System.Net.Http blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'
Milestone

Comments

@karelz
Copy link
Member

karelz commented Sep 12, 2022

Frequency in last 30 days via Runfo as of 9/12:

  • First occurrence on 9/1 in Rolling run 2661
  • As of 9/9 it started hitting significantly Rolling runs
  • 9/12 - 24x failures (many Rolling runs), i.e. 2x per day ... mostly arm64-NativeAOT with these exceptions:

Examples of failures

Console log

System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest_Http3.ContentLength_DoesNotMatchRequestContentLength_Throws(contentLength: 0, bytesSent: 1) [FAIL]
      Assert.Contains() Failure
      Not found: Content-Length
      In value:  The connection timed out from inactivity. (127.0.0.1:34178)
      Stack Trace:
        /_/src/libraries/System.Net.Http/tests/FunctionalTests/SocketsHttpHandlerTest.cs(3701,0): at System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest.<>c__DisplayClass1_0.<<ContentLength_DoesNotMatchRequestContentLength_Throws>b__0>d.MoveNext()
        --- End of stack trace from previous location ---
        /_/src/libraries/Common/tests/System/Threading/Tasks/TaskTimeoutExtensions.cs(128,0): at System.Threading.Tasks.TaskTimeoutExtensions.GetRealException(Task task)
        --- End of stack trace from previous location ---
        /_/src/libraries/Common/tests/System/Threading/Tasks/TaskTimeoutExtensions.cs(90,0): at System.Threading.Tasks.TaskTimeoutExtensions.WhenAllOrAnyFailed(Task[] tasks)
        /_/src/libraries/Common/tests/System/Net/Http/GenericLoopbackServer.cs(39,0): at System.Net.Test.Common.LoopbackServerFactory.<>c__DisplayClass5_0.<<CreateClientAndServerAsync>b__0>d.MoveNext()
        --- End of stack trace from previous location ---
        /_/src/libraries/Common/tests/System/Net/Http/Http3LoopbackServer.cs(111,0): at System.Net.Test.Common.Http3LoopbackServerFactory.CreateServerAsync(Func`3 funcAsync, Int32 millisecondsTimeout, GenericLoopbackOptions options)
        /_/src/libraries/System.Net.Http/tests/FunctionalTests/SocketsHttpHandlerTest.cs(3693,0): at System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest.ContentLength_DoesNotMatchRequestContentLength_Throws(Int32 contentLength, Int32 bytesSent)
        --- End of stack trace from previous location ---
      Output:
        Ignored exception:
        System.OperationCanceledException: The operation was canceled.
           at System.Net.Quic.ValueTaskSource.TryComplete(Exception exception) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ValueTaskSource.cs:line 119
           at System.Net.Quic.ValueTaskSource.TrySetException(Exception exception) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ValueTaskSource.cs:line 166
           at System.Net.Quic.ValueTaskSource.<>c.<TryInitialize>b__10_0(Object obj, CancellationToken cancellationToken) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ValueTaskSource.cs:line 70
           at System.Threading.CancellationTokenSource.Invoke(Delegate d, Object state, CancellationTokenSource source) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 874
           at System.Threading.CancellationTokenSource.CallbackNode.ExecuteCallback() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 1104
           at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 699
           at System.Threading.CancellationTokenSource.NotifyCancellation(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 613
           at System.Threading.CancellationTokenSource.TimerCallback(Object state) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 35
           at System.Threading.TimerQueueTimer.Fire(Boolean isThreadPool) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Timer.cs:line 666
           at System.Threading.TimerQueue.FireNextTimers() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Timer.cs:line 330
           at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
           at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77
           at System.Threading.Thread.StartCallback() in /_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs:line 105
        --- End of stack trace from previous location ---
           at System.Net.Quic.ValueTaskSource.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ValueTaskSource.cs:line 176
           at System.Net.Quic.QuicListener.PendingConnection.StartHandshake(QuicConnection connection, SslClientHelloInfo clientHello, Func`4 connectionOptionsCallback) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicListener.PendingConnection.cs:line 61
           at System.Net.Quic.QuicListener.PendingConnection.DisposeAsync() in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicListener.PendingConnection.cs:line 99
           at System.Net.Quic.QuicListener.AcceptConnectionAsync(CancellationToken cancellationToken) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicListener.cs:line 172
           at System.Net.Test.Common.Http3LoopbackServer.EstablishHttp3ConnectionAsync() in /_/src/libraries/Common/tests/System/Net/Http/Http3LoopbackServer.cs:line 71
           at System.Net.Test.Common.Http3LoopbackServer.EstablishGenericConnectionAsync() in /_/src/libraries/Common/tests/System/Net/Http/Http3LoopbackServer.cs:line 80
           at System.Net.Test.Common.Http3LoopbackServer.HandleRequestAsync(HttpStatusCode statusCode, IList`1 headers, String content) in /_/src/libraries/Common/tests/System/Net/Http/Http3LoopbackServer.cs:line 92
           at System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest.<>c__DisplayClass1_0.<<ContentLength_DoesNotMatchRequestContentLength_Throws>b__1>d.MoveNext() in /_/src/libraries/System.Net.Http/tests/FunctionalTests/SocketsHttpHandlerTest.cs:line 3712

Console log

[FAIL] System.Net.Http.Functional.Tests.SocketsHttpHandler_SocketsHttpHandler_SecurityTest_Http3.SslOptions_InvalidName_Throws
System.TimeoutException : The operation has timed out.
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x28
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x104
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task) + 0x5c
   at System.Net.Http.Functional.Tests.SocketsHttpHandler_SecurityTest.<SslOptions_InvalidName_Throws>d__2.MoveNext() + 0x1d4
...

Console log

[FAIL] System.Net.Http.Functional.Tests.SocketsHttpHandlerTest_Http2.Http2_StreamResetByServerBeforePrefix_RequestFailsWithGoawayProtocolError
Assert.Fail(): Couldn't find HttpProtocolException with matching error code in exception: System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.
   at System.Exception.SetCurrentStackTrace() + 0x64
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.SetCurrentStackTrace(Exception) + 0x1c
   at System.Net.Http.HttpConnectionPool.ReturnHttp2Connection(Http2Connection, Boolean, HttpConnectionPool.HttpConnectionWaiter`1) + 0x5a8
   at System.Net.Http.HttpConnectionPool.<AddHttp2ConnectionAsync>d__78.MoveNext() + 0x8a0
   ...

Console log - Helix timeout

...
[SKIP] System.Net.Http.Functional.Tests.SyncHttpHandler_HttpClientHandler_Authentication_Test.Credentials_DomainJoinedServerUsesKerberos_UseIpAddressAndHostHeader_Success
[SKIP] System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientHandler_Authentication_Test.Credentials_DomainJoinedServerUsesKerberos_Success
['System.Net.Http.Functional.Tests' END OF WORK ITEM LOG: Command timed out, and was killed]

Report

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 0
@karelz karelz added area-System.Net.Http blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' labels Sep 12, 2022
@ghost
Copy link

ghost commented Sep 12, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Frequency in last 30 days via Runfo as of 9/12:
- First occurrence on 9/1 in Rolling run 2661
- 9/12 - 22x failures (many Rolling runs), i.e. 2x per day ... mostly arm64-NativeAOT with 4x exceptions:
- 4x x64-NativeAOT -- all reported as Helix timeout (no test timeout) -- see last log example below
- 11877
- 11508
- 11121
- 7366

Examples of failures

Console log

System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest_Http3.ContentLength_DoesNotMatchRequestContentLength_Throws(contentLength: 0, bytesSent: 1) [FAIL]
      Assert.Contains() Failure
      Not found: Content-Length
      In value:  The connection timed out from inactivity. (127.0.0.1:34178)
      Stack Trace:
        /_/src/libraries/System.Net.Http/tests/FunctionalTests/SocketsHttpHandlerTest.cs(3701,0): at System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest.<>c__DisplayClass1_0.<<ContentLength_DoesNotMatchRequestContentLength_Throws>b__0>d.MoveNext()
        --- End of stack trace from previous location ---
        /_/src/libraries/Common/tests/System/Threading/Tasks/TaskTimeoutExtensions.cs(128,0): at System.Threading.Tasks.TaskTimeoutExtensions.GetRealException(Task task)
        --- End of stack trace from previous location ---
        /_/src/libraries/Common/tests/System/Threading/Tasks/TaskTimeoutExtensions.cs(90,0): at System.Threading.Tasks.TaskTimeoutExtensions.WhenAllOrAnyFailed(Task[] tasks)
        /_/src/libraries/Common/tests/System/Net/Http/GenericLoopbackServer.cs(39,0): at System.Net.Test.Common.LoopbackServerFactory.<>c__DisplayClass5_0.<<CreateClientAndServerAsync>b__0>d.MoveNext()
        --- End of stack trace from previous location ---
        /_/src/libraries/Common/tests/System/Net/Http/Http3LoopbackServer.cs(111,0): at System.Net.Test.Common.Http3LoopbackServerFactory.CreateServerAsync(Func`3 funcAsync, Int32 millisecondsTimeout, GenericLoopbackOptions options)
        /_/src/libraries/System.Net.Http/tests/FunctionalTests/SocketsHttpHandlerTest.cs(3693,0): at System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest.ContentLength_DoesNotMatchRequestContentLength_Throws(Int32 contentLength, Int32 bytesSent)
        --- End of stack trace from previous location ---
      Output:
        Ignored exception:
        System.OperationCanceledException: The operation was canceled.
           at System.Net.Quic.ValueTaskSource.TryComplete(Exception exception) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ValueTaskSource.cs:line 119
           at System.Net.Quic.ValueTaskSource.TrySetException(Exception exception) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ValueTaskSource.cs:line 166
           at System.Net.Quic.ValueTaskSource.<>c.<TryInitialize>b__10_0(Object obj, CancellationToken cancellationToken) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ValueTaskSource.cs:line 70
           at System.Threading.CancellationTokenSource.Invoke(Delegate d, Object state, CancellationTokenSource source) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 874
           at System.Threading.CancellationTokenSource.CallbackNode.ExecuteCallback() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 1104
           at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 699
           at System.Threading.CancellationTokenSource.NotifyCancellation(Boolean throwOnFirstException) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 613
           at System.Threading.CancellationTokenSource.TimerCallback(Object state) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs:line 35
           at System.Threading.TimerQueueTimer.Fire(Boolean isThreadPool) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Timer.cs:line 666
           at System.Threading.TimerQueue.FireNextTimers() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Timer.cs:line 330
           at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
           at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77
           at System.Threading.Thread.StartCallback() in /_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs:line 105
        --- End of stack trace from previous location ---
           at System.Net.Quic.ValueTaskSource.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ValueTaskSource.cs:line 176
           at System.Net.Quic.QuicListener.PendingConnection.StartHandshake(QuicConnection connection, SslClientHelloInfo clientHello, Func`4 connectionOptionsCallback) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicListener.PendingConnection.cs:line 61
           at System.Net.Quic.QuicListener.PendingConnection.DisposeAsync() in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicListener.PendingConnection.cs:line 99
           at System.Net.Quic.QuicListener.AcceptConnectionAsync(CancellationToken cancellationToken) in /_/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicListener.cs:line 172
           at System.Net.Test.Common.Http3LoopbackServer.EstablishHttp3ConnectionAsync() in /_/src/libraries/Common/tests/System/Net/Http/Http3LoopbackServer.cs:line 71
           at System.Net.Test.Common.Http3LoopbackServer.EstablishGenericConnectionAsync() in /_/src/libraries/Common/tests/System/Net/Http/Http3LoopbackServer.cs:line 80
           at System.Net.Test.Common.Http3LoopbackServer.HandleRequestAsync(HttpStatusCode statusCode, IList`1 headers, String content) in /_/src/libraries/Common/tests/System/Net/Http/Http3LoopbackServer.cs:line 92
           at System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest.<>c__DisplayClass1_0.<<ContentLength_DoesNotMatchRequestContentLength_Throws>b__1>d.MoveNext() in /_/src/libraries/System.Net.Http/tests/FunctionalTests/SocketsHttpHandlerTest.cs:line 3712

Console log

[FAIL] System.Net.Http.Functional.Tests.SocketsHttpHandler_SocketsHttpHandler_SecurityTest_Http3.SslOptions_InvalidName_Throws
System.TimeoutException : The operation has timed out.
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x28
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x104
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task) + 0x5c
   at System.Net.Http.Functional.Tests.SocketsHttpHandler_SecurityTest.<SslOptions_InvalidName_Throws>d__2.MoveNext() + 0x1d4
...

Console log

[FAIL] System.Net.Http.Functional.Tests.SocketsHttpHandlerTest_Http2.Http2_StreamResetByServerBeforePrefix_RequestFailsWithGoawayProtocolError
Assert.Fail(): Couldn't find HttpProtocolException with matching error code in exception: System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.
   at System.Exception.SetCurrentStackTrace() + 0x64
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.SetCurrentStackTrace(Exception) + 0x1c
   at System.Net.Http.HttpConnectionPool.ReturnHttp2Connection(Http2Connection, Boolean, HttpConnectionPool.HttpConnectionWaiter`1) + 0x5a8
   at System.Net.Http.HttpConnectionPool.<AddHttp2ConnectionAsync>d__78.MoveNext() + 0x8a0
   ...

Console log - Helix timeout

...
[SKIP] System.Net.Http.Functional.Tests.SyncHttpHandler_HttpClientHandler_Authentication_Test.Credentials_DomainJoinedServerUsesKerberos_UseIpAddressAndHostHeader_Success
[SKIP] System.Net.Http.Functional.Tests.SocketsHttpHandler_HttpClientHandler_Authentication_Test.Credentials_DomainJoinedServerUsesKerberos_Success
['System.Net.Http.Functional.Tests' END OF WORK ITEM LOG: Command timed out, and was killed]
Author: karelz
Assignees: -
Labels:

area-System.Net.Http, blocking-clean-ci

Milestone: -

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 12, 2022
@karelz
Copy link
Member Author

karelz commented Sep 12, 2022

@LakshanF @MichalStrehovsky looks specific to NativeAOT, can you please take a look?
cc @jkotas

@karelz
Copy link
Member Author

karelz commented Sep 12, 2022

There have been 2 similar failures on non-NativeAOT (Debian 10 and 11 arm32) - both in the same PR 11455 - PR #75380 (by @wfurt) -- any idea what that was @wfurt?

@jkotas
Copy link
Member

jkotas commented Sep 13, 2022

There have been 2 similar failures on non-NativeAOT (Debian 10 and 11 arm32)

#75471 is opened on that.

@jkotas
Copy link
Member

jkotas commented Sep 13, 2022

The failures listed in #75471 are exactly same, so this does not appear to be native aot specific. The frequency just comes and goes due to unrelated changes in timing.

All failing tests seem to be Http3 tests, that means Quic is suspect.

[FAIL] System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest_Http3.ContentLength_DoesNotMatchRequestContentLength_Throws(contentLength: 0, bytesSent: 1)
Assert.Contains() Failure
Not found: Content-Length
In value:  The connection timed out from inactivity. (127.0.0.1:59716)
[FAIL] System.Net.Http.Functional.Tests.SocketsHttpHandler_RequestContentLengthMismatchTest_Http3.ContentLength_DoesNotMatchRequestContentLength_Throws(contentLength: 0, bytesSent: 1)
System.TimeoutException : The operation has timed out.
[FAIL] System.Net.Http.Functional.Tests.SocketsHttpHandler_SocketsHttpHandler_SecurityTest_Http3.SslOptions_InvalidName_Throws
System.TimeoutException : The operation has timed out.

@karelz karelz changed the title System.Net.Http.Functional - NativeAOT timeouts System.Net.Http.Functional - HTTP/3 timeouts (mostly on NativeAOT) Sep 13, 2022
@jkotas
Copy link
Member

jkotas commented Sep 13, 2022

First occurrence on 9/1 in Rolling run 2661
As of 9/9 it started hitting significantly Rolling runs

This has correlation with #74749 and #75163. The first attempt was merged on 9/1 and reverted on 9/2, that aligns with the first occurrence. The second attempt was merged on 9/9, that aligns with when it started hitting significantly.

Take this with grain of salt. The quic unloading change might have just changed the timing such that the problem is hit more often.

@wfurt
Copy link
Member

wfurt commented Sep 13, 2022

note that we never run Quic tests on arm32 before. While the observed failure may be the same, root cause may be different. I think #75471 should be investigated separately -> and closed if we get successful run.

@wfurt
Copy link
Member

wfurt commented Sep 13, 2022

@karelz karelz added this to the 7.0.0 milestone Sep 13, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 13, 2022
@karelz
Copy link
Member Author

karelz commented Sep 13, 2022

While it does not happen in 7.0, it is blocking 7.0 fix for #74629 -- so moving it into 7.0 to make sure it is gone from main and we have green light to backport fix for #74629 into 7.0 branch.

@rzikm
Copy link
Member

rzikm commented Sep 14, 2022

This issue seems to reliably happen on arm32 platforms, the connection establishment attempt times out.

  Discovering: System.Net.Quic.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Quic.Functional.Tests (found 1 of 112 test case)
  Starting:    System.Net.Quic.Functional.Tests (parallel test collections = on, max threads = 4)
    System.Net.Quic.Tests.QuicListenerTests.Listener_Backlog_Success [STARTING]
    System.Net.Quic.Tests.QuicListenerTests.Listener_Backlog_Success [FAIL]
      System.TimeoutException : The operation has timed out.
      Stack Trace:
           at System.Net.Quic.Tests.QuicListenerTests.Listener_Backlog_Success()
        --- End of stack trace from previous location ---
      Output:
        10:17:28.9124202[Info] thisOrContextObject: MsQuicSafeHandle#45653674, memberName: .ctor, message: [ reg][0x000092B83C0] MsQuicSafeHandle created
        10:17:28.9352567[Info] thisOrContextObject: MsQuicContextSafeHandle#39785641, memberName: .ctor, message: [list][0x000F4603E68] MsQuicSafeHandle created
        10:17:28.9744693[Info] thisOrContextObject: QuicListener#45523402, memberName: ListenAsync, message: [list][0x000F4603E68] Listener listens on 127.0.0.1:48854
        10:17:28.9877161[Info] thisOrContextObject: MsQuicContextSafeHandle#35287174, memberName: .ctor, message: [conn][0x000F466A6A0] MsQuicSafeHandle created
        10:17:29.0080770[Info] thisOrContextObject: MsQuicSafeHandle#44419000, memberName: .ctor, message: [cnfg][0x000F4607C70] MsQuicSafeHandle created
        10:17:29.0139129[Info] thisOrContextObject: QuicListener#45523402, memberName: NativeCallback, message: [list][0x000F4603E68] Received event NEW_CONNECTION
        10:17:29.0209739[Info] thisOrContextObject: MsQuicContextSafeHandle#21647132, memberName: .ctor, message: [conn][0x000F466D150] MsQuicSafeHandle created
        10:17:29.0297235[Info] thisOrContextObject: MsQuicSafeHandle#64538993, memberName: .ctor, message: [cnfg][0x000092F30C8] MsQuicSafeHandle created
        10:17:29.1325032[Info] thisOrContextObject: QuicConnection#6451435, memberName: HandleConnectionEvent, message: [conn][0x000F466D150] Received event IDEAL_PROCESSOR_CHANGED
    System.Net.Quic.Tests.QuicListenerTests.Listener_Backlog_Success [FINISHED] Time: 7.3912626s

Further digging points at potential MsQuic issue on arm32, MsQuic sends only the first datagram and rest of the sends fail with EINVAL

MsQuic trace
robox@robox-desktop:~/wfurt/helix-payload$ grep "Sending batch" quic.log -C 5
[4][2aec.2b26][03:29:10.794300][conn][0xffffffffebf6ac48] Setting 4, delay=10000000 us
[4][2aec.2b26][03:29:10.794310][time][0xffffffffebf08ac0] Updating Connection 0xffffffffebf6ac48.
[4][2aec.2b26][03:29:10.794315][time][0xffffffffebf08ac0] Next Expiration = {18446744072623901497, 0xffffffffebf6ac48}.
[4][2aec.2b26][03:29:10.794321][conn][0xffffffffebf6ac48] OUT: BytesSent=1220 InFlight=1220 InFlightMax=6100 CWnd=12200 SSThresh=4294967295 ConnFC=0 ISB=131072 PostedBytes=0 SRtt=0
[4][2aec.2b26][03:29:10.794327][conn][0xffffffffebf6ac48] OUT: StreamFC=0 StreamSendWindow=0
[4][2aec.2b26][03:29:10.794333][conn][0xffffffffebf6ac48] Sending batch. 1 datagrams
[4][2aec.2b26][03:29:10.794339][data][0xffffffffe93046d0] Send 1220 bytes in 1 buffers (segment=0) Dst=127.0.0.1:43275, Src=127.0.0.1:55440
[4][2aec.2b26][03:29:10.794435][pack][1099511627777] Batch sent
[4][2aec.2b26][03:29:10.794445][conn][0xffffffffebf6ac48] Setting loss detection 0 timer for 998844 us. (ProbeCount=0)
[4][2aec.2b26][03:29:10.794452][conn][0xffffffffebf6ac48] Setting 2, delay=998844 us
[4][2aec.2b26][03:29:10.794459][time][0xffffffffebf08ac0] Updating Connection 0xffffffffebf6ac48.
--
[5][2aec.2b26][03:29:10.968124][pack][2199023255554] Encrypting
[5][2aec.2b26][03:29:10.968133][pack][2199023255554] Finalizing
[5][2aec.2b26][03:29:10.968141][conn][0xffffffffe9d01a20][TX][1] 3 (1081 bytes)
[5][2aec.2b26][03:29:10.968147][conn][0xffffffffe9d01a20] OUT: BytesSent=1220 InFlight=1220 InFlightMax=6100 CWnd=12200 SSThresh=4294967295 ConnFC=16777216 ISB=131072 PostedBytes=0 SRtt=0
[5][2aec.2b26][03:29:10.968152][conn][0xffffffffe9d01a20] OUT: StreamFC=0 StreamSendWindow=0
[5][2aec.2b26][03:29:10.968157][conn][0xffffffffe9d01a20] Sending batch. 1 datagrams
[5][2aec.2b26][03:29:10.968165][data][0xffffffffebf29380] Send 1220 bytes in 1 buffers (segment=0) Dst=127.0.0.1:55440, Src=127.0.0.1:43275
[5][2aec.2b26][03:29:10.968184][data][0xffffffffebf29380] ERROR, 22, sendmmsg failed.
[5][2aec.2b26][03:29:10.968189][data][0xffffffffebf29380] ERROR, 22, sendmmsg completion.
[5][2aec.2b26][03:29:10.968195][bind][0xffffffffebf1ad00] Send failed, 0x22
[5][2aec.2b26][03:29:10.968206][pack][2199023255553] Batch sent
--
[5][2aec.2b26][03:29:10.968254][pack][2199023255555] Encrypting
[5][2aec.2b26][03:29:10.968263][pack][2199023255555] Finalizing
[5][2aec.2b26][03:29:10.968270][conn][0xffffffffe9d01a20][TX][2] 3 (1120 bytes)
[5][2aec.2b26][03:29:10.968276][conn][0xffffffffe9d01a20] OUT: BytesSent=2340 InFlight=2340 InFlightMax=6100 CWnd=12200 SSThresh=4294967295 ConnFC=16777216 ISB=131072 PostedBytes=0 SRtt=0
[5][2aec.2b26][03:29:10.968282][conn][0xffffffffe9d01a20] OUT: StreamFC=0 StreamSendWindow=0
[5][2aec.2b26][03:29:10.968287][conn][0xffffffffe9d01a20] Sending batch. 2 datagrams
[5][2aec.2b26][03:29:10.968294][data][0xffffffffebf29380] Send 1120 bytes in 1 buffers (segment=0) Dst=127.0.0.1:55440, Src=127.0.0.1:43275
[5][2aec.2b26][03:29:10.968307][data][0xffffffffebf29380] ERROR, 22, sendmmsg failed.
[5][2aec.2b26][03:29:10.968312][data][0xffffffffebf29380] ERROR, 22, sendmmsg completion.
[5][2aec.2b26][03:29:10.968317][bind][0xffffffffebf1ad00] Send failed, 0x22
[5][2aec.2b26][03:29:10.968328][pack][2199023255554] Batch sent
--

@rzikm
Copy link
Member

rzikm commented Sep 14, 2022

Submitted fix for the above (arm32 non-NativeAOT timeouts) for MsQuic microsoft/msquic#3065. I will look at Arm64 failures next.

@rzikm
Copy link
Member

rzikm commented Sep 14, 2022

Not all timeouts occur in HTTP3 tests, e.g. https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-75381-merge-abd43458a4a042cf95/System.Net.Http.Functional.Tests/1/console.d67787ff.log?helixlogtype=result, the first failure occurred before we even got to send HTTP3 request. (it failed during HTTP2 request to get Alt-Svc header).

Another failure in HTTP2 only test https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-540330e989f54fc08b/System.Net.Http.Functional.Tests/1/console.95517a16.log?helixlogtype=result

So far not able to reproduce locally by running tests in tight loop :/

@rzikm
Copy link
Member

rzikm commented Sep 20, 2022

The last occurence of the issue on main on NativeAOT is 09-12 according to following Kusto query

let timeouts = (includePR : bool) {
cluster('engsrvprod.kusto.windows.net').database('engineeringdata').WorkItems
| where Status == "Timeout"
| where FriendlyName startswith "System.Net.Http.Functional"
| where Finished > now(-21d)
| distinct JobId, WorkItemId, Name, FriendlyName, ConsoleUri
| join kind=inner (cluster('engsrvprod.kusto.windows.net').database('engineeringdata').Jobs
    | where (Branch == 'refs/heads/main') or (Branch == 'refs/heads/master') or (includePR and (Source startswith "pr/"))
    | where Type startswith "test/functional/cli/"
        and not(Properties contains "runtime-staging")
    | summarize arg_max(Finished, Properties, Type, Branch, Source, Started, QueueName) by JobId
| project-rename JobType = Type) on JobId
| order by Finished desc
| where extract_json("$.['System.PhaseName']", Properties) contains "NativeAOT"
| extend DefinitionName = extractjson("$.['DefinitionName']", Properties)
| extend OS = replace_regex(extractjson("$.['operatingSystem']", Properties), @'\((.*)\).*|([^\(].*)', @'\1\2')
| extend TargetBranch = extractjson("$.['System.PullRequest.TargetBranch']", Properties)
//| project-keep Finished, FriendlyName, ConsoleUri, TargetBranch, OS, DefinitionName, Branch
};
timeouts(true);

There is one OSX timeout on 09-16, but that seems to unrelated (no other Native AOT timeouts in last 3 weeks).

This correlates with merging of #75441 on 09-13.

I think we can close this issue for now, and track the rest of the timeouts in #75611

@rzikm rzikm closed this as completed Sep 20, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Http blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'
Projects
None yet
Development

No branches or pull requests

4 participants