Limit the scope of forgotten reference checks #781

jorisdral · 2025-07-08T10:50:19Z

Resolves #752

The RefNeverReleased exception appears relatively often nowadays while running tests in CI (see #752). I've been able to narrow down the cause of these exceptions to the prop_noSwallowedExceptions property test. Forgotten references in that property test cause RefNeverReleased exceptions in other tests, for example the unit_union_blobref_invalidation test that is mentioned in #752. The thing is that we expect to forget references in prop_noSwallowedExceptions because in that property test we are injecting disk faults in code that is not always exceptionsafe, which may lead to such forgotten references. We've gone to some lengths to ensure that prop_noSwallowedExceptions runs in isolation from other property tests with respect to forgotten references by turning off forgotten reference checks locally. As the test failures illustrate, it is tricky to do right, and the problem is persistent. So, we'll try to modify the forgotten refs checks approach a bit to get rid of the bug once and for all. First, some contents about the mechanism of checking forgotten refs:

Forgotten references are checked globally for the whole program, in this case the test suite, using a global IORef (the "forgotten refs variable") that we create using unsafePerformIO and NOINLINE trickery.
References are only marked as forgotten once the GC finaliser for the reference runs. The finaliser records the reference as forgotten in the "forgotten refs variable". So, whether a reference is marked as forgotten often depends on specific RTS/GC behaviour.
Once it is recorded as forgotten, assertions that check the contents of the "forgotten refs variable" will throw the RefNeverReleased exceptions. These assertions run both as part of other reference operations (like dupRef), or they can be run manually. Depending on the RTS/GC scheduling, this exception can be raised at many points in the code even in other property tests than the one that the reference was created in, though each RefNeverReleased exception will be raised only precisely once.

We cant really do 2 differently because only the RTS/GC can detect whether a value becomes unused. We shouldn't change 3 because we should probably report forgotten references as promptly as possible. We can do 1 differently and provide some isolation of the "forgotten refs variable" that is more granular than per program.

A natural candidate for the scope in which references could be tracked is the Session context. References do not appear outside of sessions. They are stored in tables and cursors, but tables and cursors are always stored in a session context, and their lifetimes are also limited by the lifetime of their parent session. So, this PR introduces a new RefCtx that is created anew and stored in a session once a new session is created. This RefCtx holds a "forgotten refs variable", which would previously have been a global IORef. Now, the variable is scoped only to the lifetime of the RefCtx and therefore the lifetime of the parent Session. Once a session is closed, we "close" the RefCtx, which means that we check if any forgotten refs were recorded.

The addition of RefCtx causes some churn in the library, because values of the type have to be threaded from top-level code (that uses a Session) through to code that creates/destroys/modifies references. The upside is that the public API remains unchanged, only internal code changes. Moreover, forgotten reference checking is still compiled away in non-debug mode. The change also does not seem to impact performance, as demonstrated using some runs of the utxo-bench benchmark in non-debug mode:

Before the change, using disk cache: 103382.5 ops/sec
Before the change, using no disk cache: 65921.3 ops/sec
After the change, using disk cache: 104052.6 ops/sec
After the change, using no disk cache: 66062.3 ops/sec

The differences in measurement I would attribute to noise, seeing as the differences are so small. All other metrics, like allocated bytes, are also nearly identical.

Finally, we fix our issue with forgotten references from prop_noSwallowedExceptions causing RefNeverReleased exceptions in other tests. In prop_noSwallowedExceptions, we turn off forgotten reference checks by reaching into the Session to get out the RefCtx and disable its forgotten reference mechanism.

We do this to check for forgotten references in isolation from other parts of a program. For example, if we have two separate properties that we want to run in the same test suite, then we don't want a forgotten reference from one property show up in the other property.

dcoutts · 2025-07-14T11:43:04Z

src-control/Control/RefCount.hs

@@ -352,22 +362,23 @@ mkWeakRefFromRaw obj = WeakRef obj
 deRefWeak ::
     (RefCounted m obj, PrimMonad m)
  => HasCallStackIfDebug
-  => WeakRef obj
+  => RefCtx


I was initially thinking that this arg should not need to be here since we can (in debug mode) store the RefCtx in the WeakRef.

However, on reflection I think this is the right approach even though it is asymmetric with newRef. The choice is to supply the RefCtx when creating the WeakRef or when dereferencing it. We create WeakRefs more often than we dereference them, so supplying it at the latter point should be better.

jorisdral force-pushed the jdral/refctx branch 2 times, most recently from d48fa4a to 30b4c36 Compare July 8, 2025 16:40

jorisdral changed the title ~~Shrinkability of reference exceptions~~ Limit the scope of forgotten reference checks Jul 9, 2025

jorisdral added 2 commits July 9, 2025 16:22

Turn off shrinking for prop_noSwallowedExceptions (for now)

d10d381

jorisdral force-pushed the jdral/refctx branch from b9f2079 to d10d381 Compare July 9, 2025 14:23

jorisdral marked this pull request as ready for review July 9, 2025 14:23

jorisdral requested review from dcoutts, mheinzel, recursion-ninja and wenkokke as code owners July 9, 2025 14:23

jorisdral mentioned this pull request Jul 9, 2025

Issue #752: try to fix forgotten reference #774

Closed

jorisdral self-assigned this Jul 9, 2025

dcoutts reviewed Jul 14, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Limit the scope of forgotten reference checks #781

Limit the scope of forgotten reference checks #781

Uh oh!

jorisdral commented Jul 8, 2025 •

edited

Loading

Uh oh!

dcoutts Jul 14, 2025

Uh oh!

Uh oh!

Limit the scope of forgotten reference checks #781

Are you sure you want to change the base?

Limit the scope of forgotten reference checks #781

Uh oh!

Conversation

jorisdral commented Jul 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dcoutts Jul 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jorisdral commented Jul 8, 2025 •

edited

Loading