Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NullReferenceException in Timer.Dispose() #4985

Closed
martincostello opened this issue Mar 6, 2024 · 6 comments
Closed

NullReferenceException in Timer.Dispose() #4985

martincostello opened this issue Mar 6, 2024 · 6 comments
Labels
area-resilience bug This issue describes a behavior which is not expected - a bug.

Comments

@martincostello
Copy link
Member

martincostello commented Mar 6, 2024

Description

In preparing a new release of Polly, our CI failed with the following exception in one of our unit tests. I've never seen this before, so I assume it's a rare race condition:

  Failed Polly.Core.Tests.Retry.RetryResilienceStrategyTests.MaxDelay_EnsureRespected [130 ms]
  Error Message:
   System.NullReferenceException : Object reference not set to an instance of an object.
  Stack Trace:
     at Microsoft.Extensions.Time.Testing.Timer.Dispose(Boolean _)
   at Microsoft.Extensions.Time.Testing.Timer.Dispose()
   at System.Threading.Tasks.Task.DelayPromise..ctor(UInt32 millisecondsDelay, TimeProvider timeProvider)
   at System.Threading.Tasks.Task.Delay(UInt32 millisecondsDelay, TimeProvider timeProvider, CancellationToken cancellationToken)
   at Polly.Utils.TimeProviderExtensions.DelayAsync(TimeProvider timeProvider, TimeSpan delay, ResilienceContext context) in /_/src/Polly.Core/Utils/TimeProviderExtensions.cs:line 49
   at Polly.Retry.RetryResilienceStrategy`1.ExecuteCore[TState](Func`3 callback, ResilienceContext context, TState state) in /_/src/Polly.Core/Retry/RetryResilienceStrategy.cs:line 96
   at Polly.Utils.Pipeline.BridgeComponent`1.<ConvertValueTask>g__ConvertValueTaskAsync|5_0[TTo](ValueTask`1 valueTask, ResilienceContext resilienceContext) in /_/src/Polly.Core/Utils/Pipeline/BridgeComponent.TResult.cs:line 51
   at Polly.ResiliencePipeline.ExecuteAsync[TResult](Func`2 callback, CancellationToken cancellationToken) in /_/src/Polly.Core/ResiliencePipeline.AsyncT.cs:line 170
   at Polly.Core.Tests.Retry.RetryResilienceStrategyTests.ExecuteAndAdvance(ResiliencePipeline`1 sut) in /_/test/Polly.Core.Tests/Retry/RetryResilienceStrategyTests.cs:line 400
   at Polly.Core.Tests.Retry.RetryResilienceStrategyTests.MaxDelay_EnsureRespected() in /_/test/Polly.Core.Tests/Retry/RetryResilienceStrategyTests.cs:line 263
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_0(Object state)

Looks like the only place in the method that could cause this is this line:

A trivial fix would be to add a null guard, but the ! suggests that maybe the proper fix is some other change so that the assertion holds in all cases.

Reproduction Steps

Unsure - looks like a rare race condition in the internal state of the Timer class.

Expected behavior

No exception is thrown.

Actual behavior

NullReferenceException is thrown.

Regression?

I'm guessing it's always been this way.

Known Workarounds

None.

Configuration

  • .NET SDK 8.0.201
  • Microsoft.Extensions.TimeProvider.Testing 8.2.0

Other information

No response

@martincostello martincostello added bug This issue describes a behavior which is not expected - a bug. untriaged labels Mar 6, 2024
@klauco
Copy link

klauco commented Mar 7, 2024

@martintmk can you please take a look?

@geeknoid
Copy link
Member

geeknoid commented Mar 7, 2024

@klauco This is likely a bug in the FakeTimeProvider implementation, I don't think it's related to resilience logic.

@geeknoid
Copy link
Member

geeknoid commented Mar 7, 2024

@martincostello Is it possible your test could possibly be calling Dispose on the timer in two threads and they are racing with one another? Concurrent execution of the Dispose method is the only way I can explain a null reference exception being thrown.

@geeknoid
Copy link
Member

geeknoid commented Mar 7, 2024

Also, I think calling Dispose while another thread is currently executing in the ITimer.Change method could also lead to this exception.

@martincostello
Copy link
Member Author

The test is here: MaxDelay_EnsureRespected

It doesn't look like it's doing anything on multiple threads, but I did notice that the test is an async void which it shouldn't be.

I don't know why the analysers aren't erroring on that, but I'll open a PR to get rid of them.

Given how many dozens/hundreds/more(?) times that test will have run by now since it was added in September, even if that is the cause it's not likely to be immediately provable as the culprit.

@geeknoid
Copy link
Member

geeknoid commented Mar 7, 2024

I'm willing to bet the void return was the problem. I'm going to close the issue, please feel free to reopen if you see it happen again.

@geeknoid geeknoid closed this as completed Mar 7, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Apr 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-resilience bug This issue describes a behavior which is not expected - a bug.
Projects
None yet
Development

No branches or pull requests

3 participants