Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Produce less convoluted IR for boolean isinst checks #103391

Merged
merged 6 commits into from
Jun 18, 2024

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Jun 13, 2024

Currently the IR for boolean isinst checks ends up being something like (typecheck(x) ? x : null) != null, which the JIT ends up having a hard time clean up early. With object stack allocation this pattern usually leads to unnecessary address exposure.

This adds a simple pattern match during import to produce less convoluted IR in the common cases where the isinst is just used as a boolean check.

Example from #36649:

[MethodImpl(MethodImplOptions.NoInlining)]
public static bool Is_Slow(object obj) =>
    obj is int;
 ; Method C:Is_Slow(System.Object):ubyte (FullOpts)
 G_M38459_IG01:  ;; offset=0x0000
 						;; size=0 bbWeight=1 PerfScore 0.00
 
 G_M38459_IG02:  ;; offset=0x0000
        test     rcx, rcx
        je       SHORT G_M38459_IG05
 						;; size=5 bbWeight=1 PerfScore 1.25
 
 G_M38459_IG03:  ;; offset=0x0005
        mov      rax, 0x7FFB89FA1C00      ; System.Int32
        cmp      qword ptr [rcx], rax
        jne      SHORT G_M38459_IG05
 						;; size=15 bbWeight=0.25 PerfScore 1.06
 
 G_M38459_IG04:  ;; offset=0x0014
+       mov      eax, 1
        jmp      SHORT G_M38459_IG06
-						;; size=2 bbWeight=0.12 PerfScore 0.25
+						;; size=7 bbWeight=0.12 PerfScore 0.28
 
-G_M38459_IG05:  ;; offset=0x0016
-       xor      rcx, rcx
+G_M38459_IG05:  ;; offset=0x001B
+       xor      eax, eax
 						;; size=2 bbWeight=0.25 PerfScore 0.06
 
-G_M38459_IG06:  ;; offset=0x0018
-       test     rcx, rcx
-       setne    al
-       movzx    rax, al
-						;; size=9 bbWeight=1 PerfScore 1.50
-
-G_M38459_IG07:  ;; offset=0x0021
+G_M38459_IG06:  ;; offset=0x001D
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code: 34
+; Total bytes of code: 30

Fix #36649

Currently the IR for boolean `isinst` checks ends up being something
like `(typecheck(x) ? x : null) != null`, which the JIT ends up having a
hard time clean up early. With object stack allocation this pattern
usually leads to unnecessary address exposure.

This adds a simple pattern match during import to produce less
convoluted IR in the common cases where the `isinst` is just used as a
boolean check.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 13, 2024
@EgorBo
Copy link
Member

EgorBo commented Jun 13, 2024

Should close #36649

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Jun 13, 2024

Should close #36649

Looks like it, your Is_Slow example from there:

[MethodImpl(MethodImplOptions.NoInlining)]
public static bool Is_Slow(object obj) =>
    obj is int;

Base:

; Method C:Is_Slow(System.Object):ubyte (FullOpts)
G_M38459_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M38459_IG02:  ;; offset=0x0000
       test     rcx, rcx
       je       SHORT G_M38459_IG05
						;; size=5 bbWeight=1 PerfScore 1.25

G_M38459_IG03:  ;; offset=0x0005
       mov      rax, 0x7FFB89FA1C00      ; System.Int32
       cmp      qword ptr [rcx], rax
       jne      SHORT G_M38459_IG05
						;; size=15 bbWeight=0.25 PerfScore 1.06

G_M38459_IG04:  ;; offset=0x0014
       jmp      SHORT G_M38459_IG06
						;; size=2 bbWeight=0.12 PerfScore 0.25

G_M38459_IG05:  ;; offset=0x0016
       xor      rcx, rcx
						;; size=2 bbWeight=0.25 PerfScore 0.06

G_M38459_IG06:  ;; offset=0x0018
       test     rcx, rcx
       setne    al
       movzx    rax, al
						;; size=9 bbWeight=1 PerfScore 1.50

G_M38459_IG07:  ;; offset=0x0021
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 34

Diff:

; Method C:Is_Slow(System.Object):ubyte (FullOpts)
G_M38459_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M38459_IG02:  ;; offset=0x0000
       test     rcx, rcx
       je       SHORT G_M38459_IG05
						;; size=5 bbWeight=1 PerfScore 1.25

G_M38459_IG03:  ;; offset=0x0005
       mov      rax, 0x7FFB89FC1C00      ; System.Int32
       cmp      qword ptr [rcx], rax
       jne      SHORT G_M38459_IG05
						;; size=15 bbWeight=0.25 PerfScore 1.06

G_M38459_IG04:  ;; offset=0x0014
       mov      eax, 1
       jmp      SHORT G_M38459_IG06
						;; size=7 bbWeight=0.12 PerfScore 0.28

G_M38459_IG05:  ;; offset=0x001B
       xor      eax, eax
						;; size=2 bbWeight=0.25 PerfScore 0.06

G_M38459_IG06:  ;; offset=0x001D
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 30

@jakobbotsch
Copy link
Member Author

/azp run runtime

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Jun 17, 2024

Overall the IR created by this PR is not really any less convoluted than the existing IR. Instead of (x != null ? x.mt == expectedMT ? x : null : null) != null, we create (x != null ? x.mt == expectedMT ? 1 : 0 : 0) != 0.
I initially tried to create something like (x != null ? x.mt == expectedMT : 0) != 0, but that resulted in quite mixed diffs.
Even so, it still seems to help in some of the reported cases, and it will help for #103361 as well when x is stack allocated.

Of course it would be better if we could create the more natural control flow, but that's hard given the representation through QMARKs (when we are doing late expansion we create the more natural flow directly, but IIRC @EgorBo left the early expansion here in place because it is still more beneficial).

@EgorBo
Copy link
Member

EgorBo commented Jun 17, 2024

but IIRC @EgorBo left the early expansion here in place because it is still more beneficial).

I initially hoped we'd enable JitOptRepeat and I could move the late-cast expansion to the JitOptRepeat loop and the 2nd iteration of it would clean most of the regressions I hit just fine 😞

@jakobbotsch jakobbotsch marked this pull request as ready for review June 17, 2024 13:49
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @EgorBo @AndyAyersMS

Diffs

if (*booleanCheck)
{
GenTreeOp* condMT = gtNewOperNode(GT_NE, TYP_INT, gtNewMethodTableLookup(op1Clone), op2);
GenTreeOp* condNull = gtNewOperNode(GT_EQ, TYP_INT, gtClone(op1), gtNewNull());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: looks like these two branches have a lot of common code, e.g. condMT and condNull are the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deduplicated a bit of it

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor clean up comment, probably, not worth re-running CI for

@jakobbotsch jakobbotsch merged commit b2ccd98 into dotnet:main Jun 18, 2024
105 of 107 checks passed
@jakobbotsch jakobbotsch deleted the boolean-isinst branch June 18, 2024 11:31
@github-actions github-actions bot locked and limited conversation to collaborators Jul 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unoptimal codegen for "obj is T" with T being struct/sealed
2 participants