ARM64: Optimize a % b operation #34937

Closed
kunalspathak opened this issue Apr 14, 2020 · 10 comments
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization

@kunalspathak
Member

kunalspathak commented Apr 14, 2020

Optimize the a % b operation on ARM64 for the following scenarios:

a is unsigned int, b is power of 2.

static uint PerformMod_1(uint i) {
       return i % 8;
}

Today we generate something like this:

        53037C01          lsr     w1, w0, #3
        531D7021          lsl     w1, w1, #3
        4B010000          sub     w0, w0, w1

We can generate:

        and     w0, w0, #7
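For reference, a minimal C sketch of why the single `and` is enough (C rather than C#, on the assumption that unsigned `%` behaves the same in both; the function names are hypothetical). For a power-of-two divisor b the mask is b - 1, so 7 for `i % 8`:

```c
#include <stdint.h>

/* Mirrors the current three-instruction sequence: lsr #3, lsl #3, sub. */
uint32_t mod8_shift_sub(uint32_t i) {
    uint32_t q = i >> 3;   /* lsr: quotient i / 8            */
    uint32_t t = q << 3;   /* lsl: quotient times 8          */
    return i - t;          /* sub: remainder                 */
}

/* Mirrors the proposed single instruction: and with mask 8 - 1. */
uint32_t mod8_mask(uint32_t i) {
    return i & 7u;         /* keeps only the low three bits  */
}
```

Both functions return the same value as `i % 8` for every 32-bit unsigned input, since clearing the low three bits and subtracting is the same as keeping only those bits.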

a is signed int, b is power of 2.

static int PerformMod_2(int i) {
    return i % 16;
}

Today we generate:

G_M58511_IG02:
        131F7C01          asr     w1, w0, #31
        12000C21          and     w1, w1, #15
        0B000021          add     w1, w1, w0
        13047C21          asr     w1, w1, #4
        531C6C21          lsl     w1, w1, #4
        4B010000          sub     w0, w0, w1

We can generate:

        negs   w1, w0
        and    w0, w0, #15
        and    w1, w1, #15
        csneg  w0, w0, w1, mi
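To make the flag logic easier to follow, here is a rough C model of the negs / and / and / csneg sequence. It is only a sketch: the function name is hypothetical, n is assumed to be a power of two, and unsigned arithmetic stands in for the wrapping register behaviour.

```c
#include <stdint.h>

/* i % n for a power-of-two n, modelled on negs / and / and / csneg. */
int32_t mod_pow2_csneg(int32_t i, uint32_t n) {
    uint32_t mask = n - 1u;
    uint32_t pos = (uint32_t)i;
    uint32_t neg = 0u - pos;            /* negs: neg = -i, wrapping        */
    int mi = (neg >> 31) != 0;          /* N flag set when -i is negative  */
    uint32_t lo_pos = pos & mask;       /* and: low bits of i              */
    uint32_t lo_neg = neg & mask;       /* and: low bits of -i             */
    /* csneg ..., mi: pick lo_pos when the flag is set, else -lo_neg. */
    return mi ? (int32_t)lo_pos : -(int32_t)lo_neg;
}
```

The condition is `mi` on the negated value, so it is true for positive i (and for the minimum int, whose negation wraps to itself; the low bits are zero there, so the result is still 0). For negative i the low bits of -i are negated, which gives the truncating remainder that C# expects, e.g. -35 % 16 is -3.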

a is an int, b is a variable.

static int PerformMod_3(int i, int j) {
    return i % j;
}

Today we generate:

        1AC10802          sdiv    w2, w0, w1  # or udiv if a is unsigned
        1B017C41          mul     w1, w2, w1
        4B010000          sub     w0, w0, w1

We can generate using msub:

        sdiv    w2, w0, w1
        msub    w0, w2, w1, w0
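In C terms the sdiv + msub pair computes the following (a sketch with a hypothetical function name; the caller is assumed to rule out b == 0 and the minimum int divided by -1, which C leaves undefined):

```c
#include <stdint.h>

/* a % b expressed as a divide followed by a multiply-subtract. */
int32_t mod_sdiv_msub(int32_t a, int32_t b) {
    int32_t q = a / b;   /* sdiv: truncating quotient                     */
    return a - q * b;    /* msub: a - q * b, replacing separate mul + sub */
}
```

The arithmetic is the same as the current mul + sub sequence; msub only folds the last two instructions into one.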

Reference: https://godbolt.org/z/yxH8jZ
Reference: https://patchwork.kernel.org/patch/11126001/

category:cq
theme:optimization
skill-level:intermediate
cost:medium

@kunalspathak kunalspathak self-assigned this Apr 14, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Apr 14, 2020
@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

@kunalspathak kunalspathak added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Apr 14, 2020
@kunalspathak
Member Author

@BruceForstall

@EgorBo
Member

EgorBo commented Apr 14, 2020

The JIT currently always gives up on % on Arm and converts it to a - (a / b) * b in the morph phase, so the existing % (GT_MOD, GT_UMOD) optimizations in both morph and lowering never see it (they run later).

@kunalspathak
Copy link
Member Author

Yes, I have already spoken to @sandreenko about it and reverted some of the changes he made in dotnet/coreclr#18206. With that, the JIT handles the first case (where a is unsigned and b is a power of 2). I still need to handle the other two cases.

@BruceForstall BruceForstall added this to the Future milestone Apr 20, 2020
@BruceForstall BruceForstall added optimization and removed untriaged New issue has not been triaged by the area owner labels Apr 20, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall modified the milestones: Future, 6.0.0 Nov 25, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020
@kunalspathak
Member Author

This will definitely not happen in .NET 6.0.

@kunalspathak kunalspathak modified the milestones: 6.0.0, Future Jun 4, 2021
@kunalspathak
Member Author

@TIHan - for this issue, you can start with checking what it will take to combine mul/sub into msub. Related #64591

@tannergooding
Member

Just noting that unlike #64591, this one is over integer types and should be safe. I'm not aware of any cases (off the top of my head) where the output could differ for a * b + c when only handling integers.

It's certainly possible, however, that there is some edge case I'm not remembering or with certain variants of the instruction (there are variants that multiply, widen or narrow, and then add/subtract for example and these may have some edge case behavior depending on how they are used and other surrounding optimizations).
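One way to see why the plain same-width integer case is safe: with wrapping arithmetic, the low 32 bits of a * b + c come out the same whether the product is truncated to 32 bits before the add or kept at full width and truncated once at the end. That is unlike floating point, where the fused form skips a rounding step. A small C sketch with hypothetical function names, covering only the same-width case and not the widening or narrowing variants mentioned above:

```c
#include <stdint.h>

/* Separate mul then add, each result truncated to 32 bits. */
uint32_t mul_then_add(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t p = a * b;   /* product wraps modulo 2^32 */
    return p + c;         /* sum wraps modulo 2^32     */
}

/* "Fused" form: full-width intermediate, truncated once at the end. */
uint32_t fused_madd(uint32_t a, uint32_t b, uint32_t c) {
    uint64_t wide = (uint64_t)a * (uint64_t)b + (uint64_t)c;
    return (uint32_t)wide;
}
```

Both functions agree for every input because truncation modulo 2^32 commutes with addition and multiplication.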

@EgorBo
Member

EgorBo commented Feb 1, 2022

We already fold a * b + c into madd on arm.

@TIHan
Contributor

TIHan commented Apr 13, 2022

Closing as we have finished the optimizations listed here.

@TIHan TIHan closed this as completed Apr 13, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 14, 2022
@JulieLeeMSFT
Member

#68885

7 participants