ARM64 - Optimizing a % b operations #65535

TIHan · 2022-02-18T00:22:54Z

Addressing part of this issue: #34937

Description

There are various ways to optimize % for integers on ARM64.

a % b can be transformed into a & (b - 1) when both operands are unsigned integers and b is a constant power of 2.
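As a rough illustration, the rewrite can be sketched in C++, whose unsigned % behaves like C#'s. `IsPow2` and `UModPow2` are hypothetical helpers for this sketch, not functions from the JIT sources. In the JIT the power-of-2 check would presumably be made once on the constant divisor, not at run time as here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when b is a non-zero power of two (exactly one bit set).
bool IsPow2(uint32_t b) {
    return b != 0 && (b & (b - 1)) == 0;
}

// Hypothetical helper: unsigned a % b for a power-of-two b, rewritten as a mask.
// b - 1 has exactly the low log2(b) bits set, so a & (b - 1) keeps the remainder bits.
uint32_t UModPow2(uint32_t a, uint32_t b) {
    assert(IsPow2(b));
    return a & (b - 1);
}
```

For example, UModPow2(13u, 8u) keeps the low three bits of 0b1101 and yields 5, the same as 13u % 8u. The unsigned requirement matters: for a signed dividend, -5 % 4 is -1, while -5 & 3 is 3.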

Acceptance Criteria

~~Add Tests~~ (asmdiffs cover this)

ghost · 2022-02-18T00:23:02Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Addressing this issue: #34937

Description

There are various ways to optimize % for integers on ARM64.

Example:
a % b can be transformed into a & (b - 1) when both operands are unsigned integers and b is a constant power of 2.

Acceptance Criteria

Add signed int mod optimization with known constant
Add signed int mod optimization without a known constant
Add Tests

Author:	TIHan
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-
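For the first signed criterion above (a known constant divisor), one common division-free technique adds a bias of m - 1 to negative dividends before masking, so that the result truncates toward zero like % in C# and C++. The sketch below assumes the divisor is a power of two m = 1 << k. `SModPow2` is a hypothetical name, and this shows the general technique, not necessarily how the JIT or the follow-up PR implements it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: signed x % (1 << k) without a division, for 0 <= k <= 30.
// Illustrates the general bias-and-mask technique; not code from the JIT.
int32_t SModPow2(int32_t x, int k) {
    assert(k >= 0 && k <= 30);
    const uint32_t ux = static_cast<uint32_t>(x);
    const uint32_t mask = (1u << k) - 1u;
    // For negative x, add (m - 1) so that clearing the low bits rounds toward
    // zero instead of toward negative infinity.
    const uint32_t bias = (x < 0) ? mask : 0u;
    // x rounded toward zero to a multiple of m (bit manipulation done on uint32_t).
    const uint32_t truncated = (ux + bias) & ~mask;
    // Remainder = x - truncated; it is 0 or has the same sign as x.
    return static_cast<int32_t>(ux - truncated);
}
```

For x = -5 and k = 2 the bias is 3, (-5 + 3) rounded down to a multiple of 4 is -4, and the result is -5 - (-4) = -1, matching -5 % 4.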

EgorBo · 2022-02-18T00:34:59Z

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead?

TIHan · 2022-02-18T00:42:37Z

@EgorBo are you referring to Lowering::LowerConstIntDivOrMod ?

EgorBo · 2022-02-18T00:45:53Z

LowerConstIntDivOrMod

Yeah, and just move the mod expansion a % b = a - (a / b) * b there; that way we won't have to re-implement the already existing optimization.

One potential problem with this approach is that it might produce regressions where the early a - (a / b) * b expansion previously led to more CSE opportunities, e.g. when a / b appears next to a % b.
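The identity under discussion can be checked directly in C++, which, like C#, truncates integer division toward zero. `ModViaDiv`, `DivMod`, and `DivAndMod` are hypothetical names used only for this sketch. The second function shows why the expanded form can help CSE: once a % b is written in terms of a / b, a neighbouring a / b can share the same division.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: remainder expanded as a - (a / b) * b.
// Requires b != 0; INT32_MIN / -1 overflows in C++ and is excluded.
int32_t ModViaDiv(int32_t a, int32_t b) {
    assert(b != 0);
    return a - (a / b) * b;
}

struct DivMod {
    int32_t quot;
    int32_t rem;
};

// Hypothetical helper: when both a / b and a % b are needed, the expanded
// remainder reuses the quotient, so only one division is performed.
DivMod DivAndMod(int32_t a, int32_t b) {
    assert(b != 0);
    const int32_t q = a / b;  // the single division
    return {q, a - q * b};    // multiply-subtract instead of a second division
}
```

On ARM64, which has SDIV/UDIV but no integer remainder instruction, the a - q * b step maps naturally onto a single MSUB.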

TIHan · 2022-02-18T00:49:14Z

It makes sense that we should do it there so the earlier phases don't screw it up.

TIHan · 2022-02-18T01:38:16Z

I did some work to see if I could move the existing mod optimizations to lowering, but it might be a bit much for what the PR is trying to accomplish.

tannergooding · 2022-02-18T20:51:41Z

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around div/rem in many cases)?

src/coreclr/jit/gentree.h

kunalspathak · 2022-02-18T21:20:19Z

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around div/rem in many cases)?

I agree.

EgorBo · 2022-02-18T23:24:45Z

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around div/rem in many cases)?

cc @kunalspathak @tannergooding

I personally don't think so: for any non-leaf X in X % Y, we have to introduce a new local (an ASG node) instead of just keeping a simple X mod Y expression. For example, (x + 1) % y is converted early in morph into:

[000015] -A-X-+--R---              \--*  SUB       int   
[000012] -----+------                 +--*  LCL_VAR   int    V03 tmp1         
[000014] -A-X-+------                 \--*  MUL       int   
[000004] -A-X-+------                    +--*  DIV       int   
[000011] -A---+------                    |  +--*  COMMA     int   
[000009] -A---+------                    |  |  +--*  ASG       int   
[000008] D----+-N----                    |  |  |  +--*  LCL_VAR   int    V03 tmp1         
[000002] -----+------                    |  |  |  \--*  ADD       int   
[000000] -----+------                    |  |  |     +--*  LCL_VAR   int    V00 arg0         
[000001] -----+------                    |  |  |     \--*  CNS_INT   int    1
[000010] -----+------                    |  |  \--*  LCL_VAR   int    V03 tmp1         
[000003] -----+------                    |  \--*  LCL_VAR   int    V01 arg1         
[000013] -----+------                    \--*  LCL_VAR   int    V01 arg1

instead of just:

[000004] ---X--------              \--*  MOD       int   
[000002] ------------                 +--*  ADD       int   
[000000] ------------                 |  +--*  LCL_VAR   int    V00 arg0         
[000001] ------------                 |  \--*  CNS_INT   int    1
[000003] ------------                 \--*  LCL_VAR   int    V01 arg1

For example, this makes the expression non-hoistable on ARM64; see:

[Image: x64 and arm64 codegen, with the loops highlighted for both]
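A source-level picture of the morph output above, as a hedged C++ sketch: `tmp1` mirrors the `V03 tmp1` local in the dump, while `ExpandedMod` and `SumInvariantMod` are hypothetical helpers, not JIT code. The loop shows the kind of loop-invariant (x + 1) % y that, per the comment above, becomes non-hoistable on ARM64 once it is expanded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: what the early expansion of (x + 1) % y computes.
// The non-leaf dividend is stored to a temp so it is evaluated only once.
int32_t ExpandedMod(int32_t x, int32_t y) {
    assert(y != 0);
    const int32_t tmp1 = x + 1;    // ASG(tmp1, ADD(arg0, 1)) inside the COMMA
    return tmp1 - (tmp1 / y) * y;  // SUB(tmp1, MUL(DIV(..., arg1), arg1))
}

// Hypothetical helper: (x + 1) % y depends only on x and y, so it is
// loop-invariant and a candidate for hoisting out of the loop; whether the
// JIT hoists it depends on the IR shape discussed above.
int64_t SumInvariantMod(int32_t x, int32_t y, const int32_t* data, int n) {
    int64_t sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += static_cast<int64_t>(data[i]) + (x + 1) % y;
    }
    return sum;
}
```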

tannergooding · 2022-02-18T23:36:53Z

@EgorBo, I was referring specifically to the x % SomePow2 optimization being introduced here.

It should be a clear improvement to recognize and replace x % SomePow2 with x & (SomePow2 - 1) since that's the same number of nodes, still a constant, but also AND is better understood and optimized than DIV or MOD

EgorBo · 2022-02-18T23:43:56Z

@EgorBo, I was referring specifically to the x % SomePow2 optimization being introduced here.

It should be a clear improvement to recognize and replace x % SomePow2 with x & (SomePow2 - 1) since that's the same number of nodes, still a constant, but also AND is better understood and optimized than DIV or MOD

I'm fine with doing x umod POT early in morph: it makes sense and doesn't produce an additional local. I was referring to my suggestion to remove the early expansion of the general X [u]mod Y.

TIHan · 2022-03-09T20:29:54Z

@kunalspathak @echesakovMSFT This is ready.

Will try to restart CI.

src/coreclr/jit/morph.cpp

kunalspathak

LGTM with nice diffs.

* Initial work for ARM64 mod optimization
* Updated comment
* Updated comment
* Updated comment
* Fixing build
* Remove uneeded var
* Use '%' morph logic for both x64/arm64
* Adding back in divisor check for x64
* Formatting
* Update comments
* Update comments
* Fixing
* Updated comment
* Updated comment
* Tweaking x64 transformation logic for the mod opt
* Tweaking x64 transformation logic for the mod opt
* Using IntCon
* Fixed build
* Minor tweak
* Fixing x64 diffs
* Removing flag set
* Feedback
* Fixing build
* Feedback
* Fixing tests
* Fixing tests
* Fixing tests
* Formatting
* Fixing tests
* Feedback
* Fixing build

TIHan added 3 commits February 17, 2022 13:12

Initial work for ARM64 mod optimization

64f7042

Updated comment

d1dce26

Updated comment

a3fbe54

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 18, 2022

ghost assigned TIHan Feb 18, 2022

Updated comment

729057d

TIHan added 2 commits February 17, 2022 17:40

Fixing build

c2eee76

Remove uneeded var

ca333e0

tannergooding reviewed Feb 18, 2022

src/coreclr/jit/gentree.h (outdated)

TIHan added 3 commits February 23, 2022 15:02

Use '%' morph logic for both x64/arm64

7fc88ed

Merge remote-tracking branch 'upstream/main' into arm64-opt-mod

4720f55

Adding back in divisor check for x64

6656763

TIHan added 3 commits February 25, 2022 11:31

Formatting

cfa9805

Update comments

e058553

Update comments

b950ab4

TIHan added 6 commits March 2, 2022 13:20

Tweaking x64 transformation logic for the mod opt

dee80b5

Using IntCon

1fac071

Fixed build

372bcf3

Minor tweak

8809058

Fixing x64 diffs

b16d381

Removing flag set

9235f92

TIHan mentioned this pull request Mar 9, 2022

ARM64 - Optimizing a % b operations part 2 #66407 (Merged)

Merge remote-tracking branch 'upstream/main' into arm64-opt-mod

03b2cf0

EgorBo reviewed Mar 9, 2022

src/coreclr/jit/morph.cpp (outdated)

EgorBo reviewed Mar 9, 2022

src/coreclr/jit/morph.cpp (outdated)

kunalspathak approved these changes Mar 10, 2022

SingleAccretion reviewed Mar 10, 2022

TIHan added 3 commits March 10, 2022 10:42

Feedback

36b1e6a

Fixing build

964b426

Feedback

ec8246a

tannergooding approved these changes Mar 11, 2022

TIHan added 5 commits March 11, 2022 12:22

Fixing tests

27c3894

Fixing tests

4c36be7

Fixing tests

19f16b0

Formatting

f8921b9

Fixing tests

ade331d

echesakov reviewed Mar 12, 2022

src/coreclr/jit/morph.cpp (outdated)

TIHan added 2 commits March 12, 2022 17:01

Feedback

06d1124

Fixing build

f142ca2

TIHan merged commit edf14c1 into dotnet:main Mar 14, 2022

TIHan deleted the arm64-opt-mod branch March 14, 2022 18:29

JulieLeeMSFT mentioned this pull request Apr 1, 2022

What's new in .NET 7 Preview 3 [WIP] dotnet/core#7108

Closed

ghost locked as resolved and limited conversation to collaborators Apr 13, 2022