Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64 - Optimizing a % b operations #65535

Merged
merged 35 commits into from
Mar 14, 2022
Merged

ARM64 - Optimizing a % b operations #65535

merged 35 commits into from
Mar 14, 2022

Conversation

TIHan
Copy link
Contributor

@TIHan TIHan commented Feb 18, 2022

Addressing part of this issue: #34937

Description

There are various ways to optimize % for integers on ARM64.

a % b can be transformed into a & (b - 1) if they are unsigned integers and b is a constant with the power of 2.

Acceptance Criteria

  • Add Tests (asmdiffs cover this)

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 18, 2022
@ghost ghost assigned TIHan Feb 18, 2022
@ghost
Copy link

ghost commented Feb 18, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Addressing this issue: #34937

Description

There are various ways to optimize % for integers on ARM64.

Example:
a % b can be transformed into a & (b - 1) if they are unsigned integers and b is a constant with the power of 2.

Acceptance Criteria

  • Add signed int mod optimization with known constant
  • Add signed int mod optimization without a known constant
  • Add Tests
Author: TIHan
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Feb 18, 2022

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead?

@TIHan
Copy link
Contributor Author

TIHan commented Feb 18, 2022

@EgorBo are you referring to Lowering::LowerConstIntDivOrMod ?

@EgorBo
Copy link
Member

EgorBo commented Feb 18, 2022

LowerConstIntDivOrMod

yeah, and just move mod to a % b = a - (a / b) * b; there - thus, we won't have to re-implement the already existing optimization.

One potential problem with this approach that it might produce regressions where a - (a / b) * b previously led to more CSE opportunities, e.g. with a / b next to a % b

@TIHan
Copy link
Contributor Author

TIHan commented Feb 18, 2022

It makes sense that we should do it there so the earlier phases don't screw it up.

@TIHan
Copy link
Contributor Author

TIHan commented Feb 18, 2022

I did some work to see if I could move the existing mod optimizations to lowering, but it might be a bit much for what the PR is trying to accomplish.

@tannergooding
Copy link
Member

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around div/rem in many cases)?

@kunalspathak
Copy link
Member

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around div/rem in many cases)?

I agree.

@EgorBo
Copy link
Member

EgorBo commented Feb 18, 2022

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around div/rem in many cases)?

cc @kunalspathak @tannergooding

I personally think it's not, for any non-leaf X in X % Y we have to introduce a new local (ASG node) instead of just keeping a simple x mod y expression, e.g (x + 1) % y early in morph is converted into:

[000015] -A-X-+--R---              \--*  SUB       int   
[000012] -----+------                 +--*  LCL_VAR   int    V03 tmp1         
[000014] -A-X-+------                 \--*  MUL       int   
[000004] -A-X-+------                    +--*  DIV       int   
[000011] -A---+------                    |  +--*  COMMA     int   
[000009] -A---+------                    |  |  +--*  ASG       int   
[000008] D----+-N----                    |  |  |  +--*  LCL_VAR   int    V03 tmp1         
[000002] -----+------                    |  |  |  \--*  ADD       int   
[000000] -----+------                    |  |  |     +--*  LCL_VAR   int    V00 arg0         
[000001] -----+------                    |  |  |     \--*  CNS_INT   int    1
[000010] -----+------                    |  |  \--*  LCL_VAR   int    V03 tmp1         
[000003] -----+------                    |  \--*  LCL_VAR   int    V01 arg1         
[000013] -----+------                    \--*  LCL_VAR   int    V01 arg1         

instead of just:

[000004] ---X--------              \--*  MOD       int   
[000002] ------------                 +--*  ADD       int   
[000000] ------------                 |  +--*  LCL_VAR   int    V00 arg0         
[000001] ------------                 |  \--*  CNS_INT   int    1
[000003] ------------                 \--*  LCL_VAR   int    V01 arg1         

E.g. it makes it non-hoistable for ARM64, e.g. see this:
image
loops are highlighted for both x64 and arm64

@tannergooding
Copy link
Member

tannergooding commented Feb 18, 2022

@EgorBo, I was referring specifically to the x % SomePow2 optimization being introduced here.

It should be a clear improvement to recognize and replace x % SomePow2 with x & (SomePow2 - 1) since that's the same number of nodes, still a constant, but also AND is better understood and optimized than DIV or MOD

@EgorBo
Copy link
Member

EgorBo commented Feb 18, 2022

@EgorBo, I was referring specifically to the x % SomePow2 optimization being introduced here.

It should be a clear improvement to recognize and replace x % SomePow2 with x & (SomePow2 - 1) since that's the same number of nodes, still a constant, but also AND is better understood and optimized than DIV or MOD

I'm fine with doing x umod POT early in morph - it makes sense and doesn't produce additional local, I was referring to my suggestion to remove the early expansion of general X [u]mod Y

@TIHan
Copy link
Contributor Author

TIHan commented Mar 9, 2022

@kunalspathak @echesakovMSFT This is ready.

Will try to restart CI.

src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nice diffs.

src/coreclr/jit/gentree.h Outdated Show resolved Hide resolved
src/coreclr/jit/gentree.h Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/gentree.h Outdated Show resolved Hide resolved
src/coreclr/jit/gentree.h Outdated Show resolved Hide resolved
@TIHan TIHan merged commit edf14c1 into dotnet:main Mar 14, 2022
@TIHan TIHan deleted the arm64-opt-mod branch March 14, 2022 18:29
radekdoulik pushed a commit to radekdoulik/runtime that referenced this pull request Mar 30, 2022
* Initial work for ARM64 mod optimization

* Updated comment

* Updated comment

* Updated comment

* Fixing build

* Remove uneeded var

* Use '%' morph logic for both x64/arm64

* Adding back in divisor check for x64

* Formatting

* Update comments

* Update comments

* Fixing

* Updated comment

* Updated comment

* Tweaking x64 transformation logic for the mod opt

* Tweaking x64 transformation logic for the mod opt

* Using IntCon

* Fixed build

* Minor tweak

* Fixing x64 diffs

* Removing flag set

* Feedback

* Fixing build

* Feedback

* Fixing tests

* Fixing tests

* Fixing tests

* Formatting

* Fixing tests

* Feedback

* Fixing build
@ghost ghost locked as resolved and limited conversation to collaborators Apr 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants