
JIT: Fix case where we illegally optimize away a memory barrier on ARM32 #91827

Merged: jakobbotsch merged 1 commit into dotnet:main on Sep 11, 2023

Conversation

@jakobbotsch (Member) commented Sep 8, 2023

The ARM32/ARM64 backends have a peephole optimization that removes the latter of two consecutive memory barriers if no memory load/store has been seen between them. This optimization did not take calls into account, so a barrier could be removed even when a call (with potential memory operations) had executed since the previous one.

Fix #91732
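The elision logic and the fix can be sketched with a toy model (the `Emitter` type and method names below are illustrative, not the actual RyuJIT emitter API):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the barrier-elision peephole. `lastWasBarrier` stands in
// for the emitter's tracking of the most recently emitted dmb.
struct Emitter
{
    std::vector<std::string> code;
    bool lastWasBarrier = false;

    void emitBarrier()
    {
        if (lastWasBarrier)
            return; // peephole: drop a dmb immediately following another dmb
        code.push_back("dmb");
        lastWasBarrier = true;
    }

    void emitLoadOrStore()
    {
        code.push_back("ldr/str");
        lastWasBarrier = false; // a memory op makes the next barrier meaningful
    }

    void emitCall()
    {
        code.push_back("blx");
        lastWasBarrier = false; // the fix: a call may also perform memory ops
    }
};
```

Without the reset in `emitCall`, the sequence `dmb; blx; dmb` would lose its second barrier, which is exactly what happened around the `CORINFO_HELP_MEMCPY` call below.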

The problem happened in ConcurrentQueueSegment<LargeStruct>.TryEnqueue in this code:

if (Interlocked.CompareExchange(ref _headAndTail.Tail, currentTail + 1, currentTail) == currentTail)
{
    // Successfully reserved the slot. Note that after the above CompareExchange, other threads
    // trying to return will end up spinning until we do the subsequent Write.
    slots[slotsIndex].Item = item;
    Volatile.Write(ref slots[slotsIndex].SequenceNumber, currentTail + 1);
    return true;
}

The block copy (the slots[slotsIndex].Item = item assignment) was compiled into a helper call to CORINFO_HELP_MEMCPY. The JIT then effectively turned the Volatile.Write into a plain store with no preceding memory barrier. The diff with this PR is:

@@ -1,42 +1,42 @@
 G_M50486_IG05:  ;; offset=0x0042
 ldr     r6, [r4+0x94]
 dmb     15
 ldr     r0, [r4+0x0C]
 and     r7, r0, r6
 ldr     r0, [r5+0x04]
 cmp     r7, r0
 bhs     SHORT G_M50486_IG07
 movs    r0, 72
 mul     r0, r7, r0
 adds    r0, 8
 ldr     r0, [r5+r0]
 dmb     15
 sub     r8, r0, r6
 cmp     r8, 0
 bne     SHORT G_M50486_IG06
 add     r0, r4, 148
 adds    r1, r6, 1
 mov     r2, r6
 movw    r3, 0x221
 movt    r3, 0xf732
 blx     r3		// System.Threading.Interlocked:CompareExchange(byref,int,int):int
 cmp     r0, r6
 bne     SHORT G_M50486_IG05
 movs    r2, 72
 mul     r2, r7, r2
 adds    r2, 8
 adds    r2, r5, r2
 add     r0, r2, 8
 add     r1, sp, 40
 movs    r2, 64
 movw    r12, 0x565f
 movt    r12, 0xf749
 blx     r12		// CORINFO_HELP_MEMCPY
 movs    r0, 72
 mul     r0, r7, r0
 adds    r0, 8
 adds    r3, r6, 1
+dmb     15
 str     r3, [r5+r0]
 movs    r0, 1
 b       SHORT G_M50486_IG03
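The ordering TryEnqueue depends on can be expressed with C++11 atomics (a minimal sketch of the same publish pattern, not the actual ConcurrentQueue code): the payload store must become visible before the release store of the sequence number, which on ARM32 is what the restored dmb enforces.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publish pattern from TryEnqueue, reduced to two variables: write the
// payload, then release-store the sequence number. Volatile.Write has
// release semantics, which the JIT implements on ARM32 with a dmb
// before the store; eliding that barrier breaks the guarantee.
int payload = 0;
std::atomic<int> sequence{0};

void writer()
{
    payload = 42;                                 // slots[slotsIndex].Item = item;
    sequence.store(1, std::memory_order_release); // Volatile.Write(ref ..., currentTail + 1);
}

int reader()
{
    while (sequence.load(std::memory_order_acquire) != 1)
    {
        // spin until the writer publishes, mirroring the comment in TryEnqueue
    }
    return payload; // the acquire/release pair guarantees this sees 42
}
```

With the barrier removed, a reader that observes the new sequence number could still read a stale payload, which is the corruption reported in #91732.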

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 8, 2023
@ghost ghost assigned jakobbotsch Sep 8, 2023
@ghost commented Sep 8, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@jakobbotsch jakobbotsch added this to the 8.0.0 milestone Sep 8, 2023
@@ -8908,6 +8908,7 @@ void emitter::emitIns_Call(EmitCallType callType,

     dispIns(id);
     appendToCurIG(id);
+    emitLastMemBarrier = nullptr; // Cannot optimize away future memory barriers
@jakobbotsch (Member, Author) commented Sep 8, 2023
I think this entire peephole should be switched to use the new backwards traversal peephole mechanism, which I expect would have quite significant throughput (TP) improvements, as we currently have code in the very hot appendToCurIG to handle this peephole. I've opened #91825 to track that. However, given that this PR should be backported, I've kept the fix here surgical.

@jakobbotsch (Member, Author)
cc @dotnet/jit-contrib

Significant number of diffs on ARM32. No diffs on ARM64, presumably (as @EgorBo told me) because we use combined memory barrier and memory operations there now.

@jakobbotsch (Member, Author)
Failure is #91838

@jakobbotsch jakobbotsch merged commit 6d5ea33 into dotnet:main Sep 11, 2023
124 of 127 checks passed
@jakobbotsch (Member, Author)
/backport to release/8.0

@github-actions (Contributor)
Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/6144973534

@jakobbotsch jakobbotsch deleted the fix-91732 branch September 11, 2023 10:16
@jakobbotsch jakobbotsch changed the title JIT: Fix case where we illegally optimize away a memory barrier on ARM32/ARM64 JIT: Fix case where we illegally optimize away a memory barrier on ARM32 Sep 11, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Oct 11, 2023