Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RyuJIT] Recognize and optimize constant set membership tests #8418

Open
mikedn opened this issue Jun 27, 2017 · 8 comments
Open

[RyuJIT] Recognize and optimize constant set membership tests #8418

mikedn opened this issue Jun 27, 2017 · 8 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions JitUntriaged CLR JIT issues needing additional triage optimization tenet-performance Performance related issue
Milestone

Comments

@mikedn
Copy link
Contributor

mikedn commented Jun 27, 2017

It would be useful if the JIT recognized tests like:

  • if ('0' <= c && c <= '9') - can be reduced to a single branch + compare + subtract
  • if (c == ' ' || c == '\t' || c == '\r' || c == '\n') - can be reduced to 2 branches + compare + bit test (BT instruction)

The second variant is sometimes implemented using a switch/jump table (in which case is much more simpler to recognize):

switch (c) {
case ' ':
case '\t':
case '\r':
case '\n':
    return true;
default:
    return false;
}

The reduction in number of conditional branches to 1 or 2 can make other optimizations more effective - if-conversion in particular.

Similar issue in the Roslyn repository: https://github.com/dotnet/roslyn/issues/20375

The first variant can be implemented by managed compilers but getting good code for the second is problematic if the target arch bitness is not known. Also, the BT instruction is not available in IL so it has to be simulated with shift/and/compare which the JIT has to recognize.

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

@mikedn
Copy link
Contributor Author

mikedn commented Jun 27, 2017

An experimental implementation shows a ~2700 bytes improvement for the first variant and a ~170 bytes improvement for the second variant. A proper implementation may do better but I don't expect it to be significantly better.

Example codegen for the first variant:

; before
       cmp      eax, 16
       jl       SHORT G_M38823_IG03
       cmp      eax, 18
       jle      SHORT G_M38823_IG05
; after
       lea      edx, [rax-16]
       cmp      edx, 2
       jbe      SHORT G_M38823_IG05

Example codegen for the second variant:

; before
       cmp      ecx, 0x1F41
       je       G_M36321_IG13
       cmp      ecx, 0x1F42
       je       G_M36321_IG13
       cmp      ecx, 0x1F49
       je       G_M36321_IG13
; after
       mov      eax, ecx
       sub      eax, 0x1F41
       cmp      eax, 8
       ja       G_M36321_IG05
       mov      r9d, 259
       bt       r9, rcx
       jb       G_M36321_IG14

@BruceForstall
Copy link
Member

@dotnet/jit-contrib

@mikedn
Copy link
Contributor Author

mikedn commented Jun 27, 2017

Switch to BT conversion yields a ~3700 bytes improvement. Example codegen:

; before
       cmp      eax, 3
       ja       SHORT G_M9431_IG07
       mov      eax, eax
       lea      rdx, [reloc @RWD00]
       mov      edx, dword ptr [rdx+4*rax]
       lea      rcx, G_M9431_IG02
       add      rdx, rcx
       jmp      rdx
; after
       cmp      eax, 3
       ja       SHORT G_M9431_IG07
       mov      edx, 3
       bt       edx, eax
       jb       SHORT G_M9431_IG05

It might be better but Roslyn seems to change rather quickly from switch table to binary search and that is much more difficult to recognize.

This transform is the cheapest of all in terms of compilation throughput since it's just a matter of looking at the number of switch arms.

@mikedn
Copy link
Contributor Author

mikedn commented Aug 8, 2019

FWIW I've updated my old experiment (mikedn/coreclr@991e2a0) and run the diffs again:

Total bytes of diff: -3905 (-0.01% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          33 : System.Runtime.Numerics.dasm (0.05% of base)
           4 : System.IO.Compression.dasm (0.00% of base)
           4 : System.ObjectModel.dasm (0.02% of base)
           3 : System.IO.IsolatedStorage.dasm (0.02% of base)
           2 : System.Runtime.Serialization.Formatters.dasm (0.00% of base)
Top file improvements by size (bytes):
       -1437 : System.Private.CoreLib.dasm (-0.04% of base)
        -318 : System.Private.Xml.dasm (-0.01% of base)
        -248 : Newtonsoft.Json.dasm (-0.04% of base)
        -190 : System.Data.Common.dasm (-0.02% of base)
        -178 : System.Text.Encoding.CodePages.dasm (-0.25% of base)
66 total files with size differences (61 improved, 5 regressed), 63 unchanged.
Top method regressions by size (bytes):
          68 ( 1.84% of base) : System.Runtime.Numerics.dasm - Number:ParseNumber(byref,long,int,byref,ref,ref,bool):bool
          41 ( 4.20% of base) : System.Net.Primitives.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
          41 ( 4.20% of base) : System.Private.Uri.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
          19 (11.24% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Scanner:ScanMultilineTrivia():struct:this
          19 ( 2.18% of base) : System.Private.CoreLib.dasm - Convert:TryFromBase64Chars(struct,struct,byref):bool
Top method improvements by size (bytes):
        -113 (-1.66% of base) : System.Private.CoreLib.dasm - Number:NumberToStringFormat(byref,byref,struct,ref)
         -62 (-12.40% of base) : System.Private.CoreLib.dasm - DateTime:TryCreate(int,int,int,int,int,int,int,byref):bool
         -60 (-1.54% of base) : System.Private.Xml.dasm - XmlSchemaInference:InferSimpleType(ref,byref):int
         -59 (-4.20% of base) : System.Security.AccessControl.dasm - GenericAce:CreateFromBinaryForm(ref,int):ref
         -51 (-2.17% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendFormatHelper(ref,ref,struct):ref:this
Top method regressions by size (percentage):
           6 (15.38% of base) : Microsoft.CodeAnalysis.dasm - AssemblyIdentity:IsWhiteSpace(ushort):bool
           6 (15.38% of base) : System.Private.CoreLib.dasm - Convert:IsSpace(ushort):bool
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlBaseWriter:IsWhitespace(ushort):bool:this
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - EncodingStreamWrapper:IsWhitespace(ubyte):bool
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlJsonReader:IsWhitespace(ubyte):bool
Top method improvements by size (percentage):
         -49 (-53.85% of base) : System.Net.HttpListener.dasm - HttpListener:IsClientFault(int):bool
         -34 (-44.74% of base) : System.Data.Common.dasm - ExpressionNode:IsIntegerSql(int):bool
         -35 (-37.23% of base) : System.Private.Uri.dasm - Uri:IsBadFileSystemCharacter(ushort):bool:this
         -20 (-35.71% of base) : System.Data.Common.dasm - ExpressionNode:IsInteger(int):bool
         -20 (-35.71% of base) : System.Private.Xml.dasm - XmlEntity:IsValidChildType(int):bool:this
756 total methods with size differences (644 improved, 112 regressed), 145991 unchanged.

Most of it comes from recognizing if ('0' <= c && c <= '9'). Some 700 bytes of diff come from recognizing if (c == ' ' || c == '\t' || c == '\r' || c == '\n').

The experiment was mostly written to get an idea how many opportunities for such transforms exist. It has various problems:

  • bool b = ('0' <= c && c <= '9') isn't recognized because the second operand of && is just c <= '9' rather than c <= '9' ? true : false. The current implementation only looks at compares involved in flow control.
  • ('0' <= c && c <= '9') only works with constant bounds. Something like (0 <= c && c < x) should also work.
  • (c == ' ' || c == '\t' || c == '\r' || c == '\n') generates its own range check (since the bitmask has to have at most 64 bits) while the code may already contain a suitable range check (e.g. (c < ' ' && (c == ' ' || c == '\t' || c == '\r' || c == '\n')))
  • (c == ' ' || c == '\t' || ...) also fails if the range exceeds 64 and the checks aren't suitably ordered.
  • It's not clear where's the best place to do these transforms. I've done this after lowering mostly because this experiment was done together with if-conversion. It also makes use of the BT instruction (which is available only in LIR) but that could be replaced by RSZ and AND in HIR.

@AndyAyersMS Any opinions about implementing such optimizations in the JIT?

@AndyAyersMS
Copy link
Member

I think doing it relatively late makes sense.

Running it earlier might reduce time and space needed by some analyses, but it fires pretty rarely, and the transformation doesn't seem to unblock any higher-level opts.

cc @dotnet/jit-contrib

@EgorBo
Copy link
Member

EgorBo commented Sep 3, 2019

The experiment was mostly written to get an idea how many opportunities for such transforms exist.

btw, I wonder if it makes sense to add some popular libs and applications (trending C# repos, most popular nuget packages) to the jit-diffs. People who write code for BCL and corelib are usually familiar with jit internals, e.g. https://github.com/dotnet/corefx/blob/master/src/System.Numerics.Vectors/src/System/Numerics/Quaternion.cs#L171 - developer knew JIT wouldn't replace / 2 with * 0.5, etc.

@BruceForstall
Copy link
Member

@EgorBo It's a good idea, and we have a long-term plan to do that, probably using PMI-via-SuperPMI.

@MihaZupan
Copy link
Member

MihaZupan commented Sep 3, 2019

I went a bit crazy after reading through @EgorBo's blog post and wrote an analyzer-codefix for many common character check patterns. Do you think something like this would be a candidate for a Roslyn/JIT codegen optimization?

I went a bit crazy and did it for ranges up to 128 wide.

It recognizes all Func<char, bool> that match the following patterns:

  • return c == 'a' || c == 'b' ...
  • return (c >= 'A' && c <= Z') || (c >= 'a' && c <= 122) ..., also < and > and rotated

(Or any mix of the above)

  • "abcdef".IndexOf(c) >= 0 or > -1 or rotated ...
public static bool IsEmailUsernameSpecialChar(char c)
{
    return ".!#$%&'*+/=?^_`{|}~-+.~".IndexOf(c) >= 0;
}

And switch statements with the following patterns:

public static bool IsAsciiPunctuation(this char c)
{
    switch (c)
    {
        case '!':
        // ...
        case '~':
            return true;
    }
    return false;
}
public static bool IsAsciiPunctuation(this char c)
{
    switch (c)
    {
        case '!':
        // ...
        case '~':
            return true;

        default:
            return false;
    }
}

128 bit wide checks end up looking like this

public static bool IsEmailUsernameSpecialChar(char c)
{
    /* return ".!#$%&'*+/=?^_`{|}~-+.~".IndexOf(c) >= 0; */

    int testValue = c;
    if (testValue > 126)
        return false;
    return testValue < 64 ? ((-6917268469155102720L >> testValue) & 1) != 0 : ((8646911292067545088L >> (testValue - 64)) & 1) != 0;
}

More examples from Markdig's character helper class: gist

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions JitUntriaged CLR JIT issues needing additional triage optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants