[RyuJIT] Recognize and optimize constant set membership tests #8418

mikedn · 2017-06-27T15:31:41Z

It would be useful if the JIT recognized tests like:

if ('0' <= c && c <= '9') - can be reduced to a single branch + compare + subtract
if (c == ' ' || c == '\t' || c == '\r' || c == '\n') - can be reduced to 2 branches + compare + bit test (BT instruction)

The second variant is sometimes implemented using a switch/jump table (in which case is much more simpler to recognize):

switch (c) {
case ' ':
case '\t':
case '\r':
case '\n':
    return true;
default:
    return false;
}

The reduction in number of conditional branches to 1 or 2 can make other optimizations more effective - if-conversion in particular.

Similar issue in the Roslyn repository: https://github.com/dotnet/roslyn/issues/20375

The first variant can be implemented by managed compilers but getting good code for the second is problematic if the target arch bitness is not known. Also, the BT instruction is not available in IL so it has to be simulated with shift/and/compare which the JIT has to recognize.

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

The text was updated successfully, but these errors were encountered:

mikedn · 2017-06-27T15:39:18Z

An experimental implementation shows a ~2700 bytes improvement for the first variant and a ~170 bytes improvement for the second variant. A proper implementation may do better but I don't expect it to be significantly better.

Example codegen for the first variant:

; before
       cmp      eax, 16
       jl       SHORT G_M38823_IG03
       cmp      eax, 18
       jle      SHORT G_M38823_IG05
; after
       lea      edx, [rax-16]
       cmp      edx, 2
       jbe      SHORT G_M38823_IG05

Example codegen for the second variant:

; before
       cmp      ecx, 0x1F41
       je       G_M36321_IG13
       cmp      ecx, 0x1F42
       je       G_M36321_IG13
       cmp      ecx, 0x1F49
       je       G_M36321_IG13
; after
       mov      eax, ecx
       sub      eax, 0x1F41
       cmp      eax, 8
       ja       G_M36321_IG05
       mov      r9d, 259
       bt       r9, rcx
       jb       G_M36321_IG14

BruceForstall · 2017-06-27T17:21:13Z

@dotnet/jit-contrib

mikedn · 2017-06-27T19:31:17Z

Switch to BT conversion yields a ~3700 bytes improvement. Example codegen:

; before
       cmp      eax, 3
       ja       SHORT G_M9431_IG07
       mov      eax, eax
       lea      rdx, [reloc @RWD00]
       mov      edx, dword ptr [rdx+4*rax]
       lea      rcx, G_M9431_IG02
       add      rdx, rcx
       jmp      rdx
; after
       cmp      eax, 3
       ja       SHORT G_M9431_IG07
       mov      edx, 3
       bt       edx, eax
       jb       SHORT G_M9431_IG05

It might be better but Roslyn seems to change rather quickly from switch table to binary search and that is much more difficult to recognize.

This transform is the cheapest of all in terms of compilation throughput since it's just a matter of looking at the number of switch arms.

mikedn · 2019-08-08T17:07:30Z

FWIW I've updated my old experiment (mikedn/coreclr@991e2a0) and run the diffs again:

Total bytes of diff: -3905 (-0.01% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          33 : System.Runtime.Numerics.dasm (0.05% of base)
           4 : System.IO.Compression.dasm (0.00% of base)
           4 : System.ObjectModel.dasm (0.02% of base)
           3 : System.IO.IsolatedStorage.dasm (0.02% of base)
           2 : System.Runtime.Serialization.Formatters.dasm (0.00% of base)
Top file improvements by size (bytes):
       -1437 : System.Private.CoreLib.dasm (-0.04% of base)
        -318 : System.Private.Xml.dasm (-0.01% of base)
        -248 : Newtonsoft.Json.dasm (-0.04% of base)
        -190 : System.Data.Common.dasm (-0.02% of base)
        -178 : System.Text.Encoding.CodePages.dasm (-0.25% of base)
66 total files with size differences (61 improved, 5 regressed), 63 unchanged.
Top method regressions by size (bytes):
          68 ( 1.84% of base) : System.Runtime.Numerics.dasm - Number:ParseNumber(byref,long,int,byref,ref,ref,bool):bool
          41 ( 4.20% of base) : System.Net.Primitives.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
          41 ( 4.20% of base) : System.Private.Uri.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
          19 (11.24% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Scanner:ScanMultilineTrivia():struct:this
          19 ( 2.18% of base) : System.Private.CoreLib.dasm - Convert:TryFromBase64Chars(struct,struct,byref):bool
Top method improvements by size (bytes):
        -113 (-1.66% of base) : System.Private.CoreLib.dasm - Number:NumberToStringFormat(byref,byref,struct,ref)
         -62 (-12.40% of base) : System.Private.CoreLib.dasm - DateTime:TryCreate(int,int,int,int,int,int,int,byref):bool
         -60 (-1.54% of base) : System.Private.Xml.dasm - XmlSchemaInference:InferSimpleType(ref,byref):int
         -59 (-4.20% of base) : System.Security.AccessControl.dasm - GenericAce:CreateFromBinaryForm(ref,int):ref
         -51 (-2.17% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendFormatHelper(ref,ref,struct):ref:this
Top method regressions by size (percentage):
           6 (15.38% of base) : Microsoft.CodeAnalysis.dasm - AssemblyIdentity:IsWhiteSpace(ushort):bool
           6 (15.38% of base) : System.Private.CoreLib.dasm - Convert:IsSpace(ushort):bool
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlBaseWriter:IsWhitespace(ushort):bool:this
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - EncodingStreamWrapper:IsWhitespace(ubyte):bool
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlJsonReader:IsWhitespace(ubyte):bool
Top method improvements by size (percentage):
         -49 (-53.85% of base) : System.Net.HttpListener.dasm - HttpListener:IsClientFault(int):bool
         -34 (-44.74% of base) : System.Data.Common.dasm - ExpressionNode:IsIntegerSql(int):bool
         -35 (-37.23% of base) : System.Private.Uri.dasm - Uri:IsBadFileSystemCharacter(ushort):bool:this
         -20 (-35.71% of base) : System.Data.Common.dasm - ExpressionNode:IsInteger(int):bool
         -20 (-35.71% of base) : System.Private.Xml.dasm - XmlEntity:IsValidChildType(int):bool:this
756 total methods with size differences (644 improved, 112 regressed), 145991 unchanged.

Most of it comes from recognizing if ('0' <= c && c <= '9'). Some 700 bytes of diff come from recognizing if (c == ' ' || c == '\t' || c == '\r' || c == '\n').

The experiment was mostly written to get an idea how many opportunities for such transforms exist. It has various problems:

bool b = ('0' <= c && c <= '9') isn't recognized because the second operand of && is just c <= '9' rather than c <= '9' ? true : false. The current implementation only looks at compares involved in flow control.
('0' <= c && c <= '9') only works with constant bounds. Something like (0 <= c && c < x) should also work.
(c == ' ' || c == '\t' || c == '\r' || c == '\n') generates its own range check (since the bitmask has to have at most 64 bits) while the code may already contain a suitable range check (e.g. (c < ' ' && (c == ' ' || c == '\t' || c == '\r' || c == '\n')))
(c == ' ' || c == '\t' || ...) also fails if the range exceeds 64 and the checks aren't suitably ordered.
It's not clear where's the best place to do these transforms. I've done this after lowering mostly because this experiment was done together with if-conversion. It also makes use of the BT instruction (which is available only in LIR) but that could be replaced by RSZ and AND in HIR.

@AndyAyersMS Any opinions about implementing such optimizations in the JIT?

AndyAyersMS · 2019-08-09T13:38:43Z

I think doing it relatively late makes sense.

Running it earlier might reduce time and space needed by some analyses, but it fires pretty rarely, and the transformation doesn't seem to unblock any higher-level opts.

cc @dotnet/jit-contrib

EgorBo · 2019-09-03T11:45:46Z

The experiment was mostly written to get an idea how many opportunities for such transforms exist.

btw, I wonder if it makes sense to add some popular libs and applications (trending C# repos, most popular nuget packages) to the jit-diffs. People who write code for BCL and corelib are usually familiar with jit internals, e.g. https://github.com/dotnet/corefx/blob/master/src/System.Numerics.Vectors/src/System/Numerics/Quaternion.cs#L171 - developer knew JIT wouldn't replace / 2 with * 0.5, etc.

BruceForstall · 2019-09-03T16:23:35Z

@EgorBo It's a good idea, and we have a long-term plan to do that, probably using PMI-via-SuperPMI.

MihaZupan · 2019-09-03T18:07:00Z

I went a bit crazy after reading through @EgorBo's blog post and wrote an analyzer-codefix for many common character check patterns. Do you think something like this would be a candidate for a Roslyn/JIT codegen optimization?

I went a bit crazy and did it for ranges up to 128 wide.

It recognizes all Func<char, bool> that match the following patterns:

return c == 'a' || c == 'b' ...
return (c >= 'A' && c <= Z') || (c >= 'a' && c <= 122) ..., also < and > and rotated

(Or any mix of the above)

"abcdef".IndexOf(c) >= 0 or > -1 or rotated ...

public static bool IsEmailUsernameSpecialChar(char c)
{
    return ".!#$%&'*+/=?^_`{|}~-+.~".IndexOf(c) >= 0;
}

And switch statements with the following patterns:

public static bool IsAsciiPunctuation(this char c)
{
    switch (c)
    {
        case '!':
        // ...
        case '~':
            return true;
    }
    return false;
}
public static bool IsAsciiPunctuation(this char c)
{
    switch (c)
    {
        case '!':
        // ...
        case '~':
            return true;

        default:
            return false;
    }
}

128 bit wide checks end up looking like this

public static bool IsEmailUsernameSpecialChar(char c)
{
    /* return ".!#$%&'*+/=?^_`{|}~-+.~".IndexOf(c) >= 0; */

    int testValue = c;
    if (testValue > 126)
        return false;
    return testValue < 64 ? ((-6917268469155102720L >> testValue) & 1) != 0 : ((8646911292067545088L >> (testValue - 64)) & 1) != 0;
}

More examples from Markdig's character helper class: gist

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

EgorBo mentioned this issue Nov 16, 2021

Performance Improvement: Value Range Test #61664

Closed

EgorBo mentioned this issue Sep 15, 2022

Removed branches in some UriHelper-methods by using a bitmask-approach #75705

Closed

EgorBo mentioned this issue Feb 6, 2023

RyuJIT does not optimize multiple integer equality comparisons #81668

Closed

JulieLeeMSFT mentioned this issue Jun 15, 2023

Optimize if range check to replace jumps to bit operation #87656

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RyuJIT] Recognize and optimize constant set membership tests #8418

[RyuJIT] Recognize and optimize constant set membership tests #8418

mikedn commented Jun 27, 2017

mikedn commented Jun 27, 2017

BruceForstall commented Jun 27, 2017

mikedn commented Jun 27, 2017

mikedn commented Aug 8, 2019

AndyAyersMS commented Aug 9, 2019

EgorBo commented Sep 3, 2019 •

edited

Loading

BruceForstall commented Sep 3, 2019

MihaZupan commented Sep 3, 2019 •

edited

Loading

[RyuJIT] Recognize and optimize constant set membership tests #8418

[RyuJIT] Recognize and optimize constant set membership tests #8418

Comments

mikedn commented Jun 27, 2017

mikedn commented Jun 27, 2017

BruceForstall commented Jun 27, 2017

mikedn commented Jun 27, 2017

mikedn commented Aug 8, 2019

AndyAyersMS commented Aug 9, 2019

EgorBo commented Sep 3, 2019 • edited Loading

BruceForstall commented Sep 3, 2019

MihaZupan commented Sep 3, 2019 • edited Loading

EgorBo commented Sep 3, 2019 •

edited

Loading

MihaZupan commented Sep 3, 2019 •

edited

Loading