RyuJIT does not optimize multiple integer equality comparisons #81668

gdkchan · 2023-02-06T00:34:06Z

Description

Currently, RyuJIT does not optimize a sequence of integer equality comparisons all leading to the same path.
For example, let's say one wants to implement a function that determines if a given char is a vowel, and return bool.
One could naively implement it like so:

    static bool IsVowel(char c)
    {
        return c == 'a' ||
            c == 'e' ||
            c == 'i' ||
            c == 'o' ||
            c == 'u' ||
            c == 'A' ||
            c == 'E' ||
            c == 'I' ||
            c == 'O' ||
            c == 'U';
    }

Today, that will be compiled to a long sequence of instructions, doing each comparison and jumping if equal.

       push     rbp
       mov      rbp, rsp
       movzx    rax, di
       cmp      eax, 97
       je       SHORT G_M5371_IG05
       cmp      eax, 101
       je       SHORT G_M5371_IG05
       cmp      eax, 105
       je       SHORT G_M5371_IG05
       cmp      eax, 111
       je       SHORT G_M5371_IG05
       cmp      eax, 117
       je       SHORT G_M5371_IG05
       cmp      eax, 65
       je       SHORT G_M5371_IG05
       cmp      eax, 69
       je       SHORT G_M5371_IG05
       cmp      eax, 73
       je       SHORT G_M5371_IG05
       cmp      eax, 79
       je       SHORT G_M5371_IG05
       cmp      eax, 85
       sete     al
       movzx    rax, al
       pop      rbp
       ret      
G_M5371_IG05:
       mov      eax, 1
       pop      rbp
       ret

However there is an optimization that native compilers will usually do here. Because the values being compared here are not too far apart, it's possible to construct a bit mask where the bit corresponding to the number of each vowel character is set to 1. Then one can just subtract the value being checked (c) by the lowest possible value (in this case A which is the vowel which the lowest value). If the result is negative, we already know it's out of range, and we can return false. If it's positive, and lower or equal to the highest value (in this case u), we can then check if the respective bit is set on the mask. If it's set we return true, otherwise false.

This is the code that GCC produces using the technique above for similar code written in C, with O1 optimization level:

        lea     ecx, [rdi-65]
        cmp     cl, 52
        ja      .L3
        movabs  rax, 4575140898685201
        shr     rax, cl
        and     eax, 1
        ret
.L3:
        mov     eax, 0
        ret

It's worth noting that it produces slightly different code at O2 (using bt instructions instead of the shr and and).
Other compilers like clang and MSVC produces similar code with optimizations enabled.

Comparison using Godbolt compiler explorer: https://godbolt.org/z/1efMGq3rx
RyuJIT codegen on Godbolt compiler explorer: https://godbolt.org/z/eqxnPaKGr
Above I also included a IsVowelOptimized function where I manually optimized it using the technique above, which produces more or less the ideal code, which I would expect to be produced for the IsVowel function.

This pattern is pretty common, so it would be nice to optimize it. For example, when working with textures, I often need to write code like:

if (format == TextureFormat.D24UnormS8Uint ||
    format == TextureFormat.D32FloatS8Uint ||
    format == TextureFormat.D24UnormX8 ||
    format == TextureFormat.D16Unorm ||
    format == TextureFormat.D32Float)
{
    // Do something with the depth texture...
}
else
{
    // Do something with the color texture...
}

Configuration

.NET 7.0.100, AFAIK it affects all platforms.

Regression?

No, to my knowledge, this optimization was never implemented on .NET JIT compiler.

The text was updated successfully, but these errors were encountered:

ghost · 2023-02-06T00:34:16Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Currently, RyuJIT does not optimize a sequence of integer equality comparisons all leading to the same path.
For example, let's say one wants to implement a function that determines if a given char is a vowel, and return bool.
One could naively implement it like so:

    static bool IsVowel(char c)
    {
        return c == 'a' ||
            c == 'e' ||
            c == 'i' ||
            c == 'o' ||
            c == 'u' ||
            c == 'A' ||
            c == 'E' ||
            c == 'I' ||
            c == 'O' ||
            c == 'U';
    }

Today, that will be compiled to a long sequence of instructions, doing each comparison and jumping if equal.

       push     rbp
       mov      rbp, rsp
       movzx    rax, di
       cmp      eax, 97
       je       SHORT G_M5371_IG05
       cmp      eax, 101
       je       SHORT G_M5371_IG05
       cmp      eax, 105
       je       SHORT G_M5371_IG05
       cmp      eax, 111
       je       SHORT G_M5371_IG05
       cmp      eax, 117
       je       SHORT G_M5371_IG05
       cmp      eax, 65
       je       SHORT G_M5371_IG05
       cmp      eax, 69
       je       SHORT G_M5371_IG05
       cmp      eax, 73
       je       SHORT G_M5371_IG05
       cmp      eax, 79
       je       SHORT G_M5371_IG05
       cmp      eax, 85
       sete     al
       movzx    rax, al
       pop      rbp
       ret      
G_M5371_IG05:
       mov      eax, 1
       pop      rbp
       ret

However there is an optimization that native compilers will usually do here. Because the values being compared here are not too far apart, it's possible to construct a bit mask where the bit corresponding to the number of each vowel character is set to 1. Then one can just subtract the value being checked (c) by the lowest possible value (in this case A which is the vowel which the lowest value). If the result is negative, we already know it's out of range, and we can return false. If it's positive, and lower or equal to the highest value (in this case u), we can then check if the respective bit is set on the mask. If it's set we return true, otherwise false.

This is the code that GCC produces using the technique above for similar code written in C, with O1 optimization level:

        lea     ecx, [rdi-65]
        cmp     cl, 52
        ja      .L3
        movabs  rax, 4575140898685201
        shr     rax, cl
        and     eax, 1
        ret
.L3:
        mov     eax, 0
        ret

It's worth noting that it produces slightly different code at O2 (using bt instructions instead of the shr and and).
Other compilers like clang and MSVC produces similar code with optimizations enabled.

Comparison using Godbolt compiler explorer: https://godbolt.org/z/1efMGq3rx
RyuJIT codegen on Godbolt compiler explorer: https://godbolt.org/z/eqxnPaKGr
Above I also included a IsVowelOptimized function where I manually optimized it using the technique above, which produces more or less the ideal code, which I would expect to be produced for the IsVowel function.

This pattern is pretty common, so it would be nice to optimize it. For example, when working with textures, I often need to write code like:

if (format == TextureFormat.D24UnormS8Uint ||
    format == TextureFormat.D32FloatS8Uint ||
    format == TextureFormat.D24UnormX8 ||
    format == TextureFormat.D16Unorm ||
    format == TextureFormat.D32Float)
{
    // Do something with the depth texture...
}
else
{
    // Do something with the color texture...
}

Configuration

.NET 7.0.100, AFAIK it affects all platforms.

Regression?

No, to my knowledge, this optimization was never implemented on .NET JIT compiler.

Author:	gdkchan
Assignees:	-
Labels:	`tenet-performance`, `area-CodeGen-coreclr`, `untriaged`
Milestone:	-

EgorBo · 2023-02-06T01:16:06Z

Duplicates #8418

NOTE: this is probably easier to implement in Roslyn/post-IL process to emit CEE_SWITCH opcode which is lowered into a bit-test or jump-table by JIT. But we can also do it in JIT at some point

gdkchan · 2023-02-06T01:40:07Z

Duplicates #8418

NOTE: this is probably easier to implement in Roslyn/post-IL process to emit CEE_SWITCH opcode which is lowered into a bit-test or jump-table by JIT. But we can also do it in JIT at some point

Sorry, I did not know how this optimization is called, so I didn't know what exactly to search for.
FWIW, I also tried implementing the function in the post using switch, and it also seems to generate several comparisons today.

    static bool IsVowelSwitch(char c)
    {
        switch (c)
        {
            case 'a':
            case 'e':
            case 'i':
            case 'o':
            case 'u':
            case 'A':
            case 'E':
            case 'I':
            case 'O':
            case 'U':
                return true;
        }

        return false;
    }

Looking at the IL, it seems that Roslyn is generating those comparisons, I wonder why.

EgorBo · 2023-02-06T01:44:02Z

Looking at the IL, it seems that Roslyn is generating those comparisons, I wonder why.

Roslyn is a bit conservative for switch opcode - if the input values have too big gaps (in value) - it gives up. E.g. this one works: https://sharplab.io/#v2:C4LglgNgPgAgTARgLACgYGYAE9MGFMDeqmJ2CAbJgEYD2NEmAkgM4BqNA7gKYQDKHYYAGMAFgApRAQwBOmIQEpipIilJrMzAcJGYJi1epIrDhoZOZdMAcklWQSk6TMXrVOw8dzzlq0PcHPLxcrABN/QJJnHy5wiKjrEVjA+KswJIiSGAB2TGBpAFcuAG4PNQBfVFLSbMwAM0kICxKAzAqUMqA===

EgorBo · 2023-02-06T09:36:36Z

Closed as a dup of #8418

gdkchan added the tenet-performance Performance related issue label Feb 6, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Feb 6, 2023

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 6, 2023

EgorBo closed this as not planned Won't fix, can't repro, duplicate, stale Feb 6, 2023

EgorBo removed the untriaged New issue has not been triaged by the area owner label Feb 6, 2023

ghost locked as resolved and limited conversation to collaborators Mar 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RyuJIT does not optimize multiple integer equality comparisons #81668

RyuJIT does not optimize multiple integer equality comparisons #81668

gdkchan commented Feb 6, 2023

ghost commented Feb 6, 2023

Description

Configuration

Regression?

EgorBo commented Feb 6, 2023

gdkchan commented Feb 6, 2023

EgorBo commented Feb 6, 2023

EgorBo commented Feb 6, 2023

RyuJIT does not optimize multiple integer equality comparisons #81668

RyuJIT does not optimize multiple integer equality comparisons #81668

Comments

gdkchan commented Feb 6, 2023

Description

Configuration

Regression?

ghost commented Feb 6, 2023

Description

Configuration

Regression?

EgorBo commented Feb 6, 2023

gdkchan commented Feb 6, 2023

EgorBo commented Feb 6, 2023

EgorBo commented Feb 6, 2023