Skip to content

Limit the scope of forgotten reference checks #781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

jorisdral
Copy link
Collaborator

@jorisdral jorisdral commented Jul 8, 2025

Resolves #752

The RefNeverReleased exception appears relatively often nowadays while running tests in CI (see #752). I've been able to narrow down the cause of these exceptions to the prop_noSwallowedExceptions property test. Forgotten references in that property test cause RefNeverReleased exceptions in other tests, for example the unit_union_blobref_invalidation test that is mentioned in #752. The thing is that we expect to forget references in prop_noSwallowedExceptions because in that property test we are injecting disk faults in code that is not always exceptionsafe, which may lead to such forgotten references. We've gone to some lengths to ensure that prop_noSwallowedExceptions runs in isolation from other property tests with respect to forgotten references by turning off forgotten reference checks locally. As the test failures illustrate, it is tricky to do right, and the problem is persistent. So, we'll try to modify the forgotten refs checks approach a bit to get rid of the bug once and for all. First, some contents about the mechanism of checking forgotten refs:

  1. Forgotten references are checked globally for the whole program, in this case the test suite, using a global IORef (the "forgotten refs variable") that we create using unsafePerformIO and NOINLINE trickery.
  2. References are only marked as forgotten once the GC finaliser for the reference runs. The finaliser records the reference as forgotten in the "forgotten refs variable". So, whether a reference is marked as forgotten often depends on specific RTS/GC behaviour.
  3. Once it is recorded as forgotten, assertions that check the contents of the "forgotten refs variable" will throw the RefNeverReleased exceptions. These assertions run both as part of other reference operations (like dupRef), or they can be run manually. Depending on the RTS/GC scheduling, this exception can be raised at many points in the code even in other property tests than the one that the reference was created in, though each RefNeverReleased exception will be raised only precisely once.

We cant really do 2 differently because only the RTS/GC can detect whether a value becomes unused. We shouldn't change 3 because we should probably report forgotten references as promptly as possible. We can do 1 differently and provide some isolation of the "forgotten refs variable" that is more granular than per program.

A natural candidate for the scope in which references could be tracked is the Session context. References do not appear outside of sessions. They are stored in tables and cursors, but tables and cursors are always stored in a session context, and their lifetimes are also limited by the lifetime of their parent session. So, this PR introduces a new RefCtx that is created anew and stored in a session once a new session is created. This RefCtx holds a "forgotten refs variable", which would previously have been a global IORef. Now, the variable is scoped only to the lifetime of the RefCtx and therefore the lifetime of the parent Session. Once a session is closed, we "close" the RefCtx, which means that we check if any forgotten refs were recorded.

The addition of RefCtx causes some churn in the library, because values of the type have to be threaded from top-level code (that uses a Session) through to code that creates/destroys/modifies references. The upside is that the public API remains unchanged, only internal code changes. Moreover, forgotten reference checking is still compiled away in non-debug mode. The change also does not seem to impact performance, as demonstrated using some runs of the utxo-bench benchmark in non-debug mode:

  • Before the change, using disk cache: 103382.5 ops/sec
  • Before the change, using no disk cache: 65921.3 ops/sec
  • After the change, using disk cache: 104052.6 ops/sec
  • After the change, using no disk cache: 66062.3 ops/sec

The differences in measurement I would attribute to noise, seeing as the differences are so small. All other metrics, like allocated bytes, are also nearly identical.

Finally, we fix our issue with forgotten references from prop_noSwallowedExceptions causing RefNeverReleased exceptions in other tests. In prop_noSwallowedExceptions, we turn off forgotten reference checks by reaching into the Session to get out the RefCtx and disable its forgotten reference mechanism.

@jorisdral jorisdral force-pushed the jdral/refctx branch 2 times, most recently from d48fa4a to 30b4c36 Compare July 8, 2025 16:40
@jorisdral jorisdral changed the title Shrinkability of reference exceptions Limit the scope of forgotten reference checks Jul 9, 2025
jorisdral added 2 commits July 9, 2025 16:22
We do this to check for forgotten references in isolation from other parts of a
program. For example, if we have two separate properties that we want to run in
the same test suite, then we don't want a forgotten reference from one property
show up in the other property.
@jorisdral jorisdral marked this pull request as ready for review July 9, 2025 14:23
@jorisdral jorisdral self-assigned this Jul 9, 2025
@@ -352,22 +362,23 @@ mkWeakRefFromRaw obj = WeakRef obj
deRefWeak ::
(RefCounted m obj, PrimMonad m)
=> HasCallStackIfDebug
=> WeakRef obj
=> RefCtx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially thinking that this arg should not need to be here since we can (in debug mode) store the RefCtx in the WeakRef.

However, on reflection I think this is the right approach even though it is asymmetric with newRef. The choice is to supply the RefCtx when creating the WeakRef or when dereferencing it. We create WeakRefs more often than we dereference them, so supplying it at the latter point should be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Reference never released in unit_union_blobref_invalidation test
2 participants