[wasm] Optimize Vector128<float>/<double>.Equals in interp/jiterp #88064

kg · 2023-06-26T20:39:38Z

Calling Equals on vectors of floating point types produces extremely nasty interp code that compiles down to a huge blob of wasm, which makes it hard to write tests/measurements that actually fit into traces. This PR adds intrinsics for doing equality comparisons on R4/R8 vectors and then implements them in jiterp. Performance also improves slightly, since the generated code is much better.

The C interp implementation could probably be improved on, though - perhaps if clang has an all-bits-set intrinsic and an all_true intrinsic?
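For reference, the semantics a fused R8 equality intrinsic has to preserve can be sketched in plain C with no SIMD intrinsics at all. This is only an illustration under assumptions, not the code in this PR: the function name `v128_r8_equals_sketch` and the array-of-two-doubles representation are hypothetical stand-ins for a 128-bit vector of doubles.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper, not the interp's actual handler.
 * Treats a 128-bit vector as two double lanes. Two vectors are "equal"
 * when every lane pair either compares equal or is NaN on both sides. */
static bool v128_r8_equals_sketch(const double lhs[2], const double rhs[2])
{
    for (int i = 0; i < 2; i++) {
        bool same = lhs[i] == rhs[i];                   /* false if either is NaN */
        bool both_nan = isnan(lhs[i]) && isnan(rhs[i]); /* NaN matches NaN */
        if (!same && !both_nan)
            return false;
    }
    return true;
}
```

A scalar loop like this exits early on the first mismatching lane, which a real vectorized version would replace with lane-wise compares followed by a single all-lanes-true check.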

ghost · 2023-06-26T20:39:47Z

Tagging subscribers to this area: @BrzVlad, @kotlarmilos
See info in area-owners.md if you want to be subscribed.


Author:	kg
Assignees:	-
Labels:	`area-Codegen-Interpreter-mono`
Milestone:	-

kg · 2023-07-11T03:26:00Z

@tannergooding any concerns about this change to Equals?

lewing · 2023-07-11T03:54:16Z

coreclr failures do not apply, @kg merge at will. @tannergooding we welcome feedback before or after merge

tannergooding · 2023-07-11T22:49:08Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128_1.cs

+        internal static bool EqualsFloatingPoint (Vector128<T> lhs, Vector128<T> rhs)
+        {
+            Vector128<T> result = Vector128.Equals(lhs, rhs) | ~(Vector128.Equals(lhs, lhs) | Vector128.Equals(rhs, rhs));
+            return result.AsInt32() == Vector128<int>.AllBitsSet;
+        }


Why not just handle Equals directly? There are several APIs on Vector2/3/4 and Vector<T> that are [Intrinsic] instance methods, so I would expect Mono has the support for handling that already?

Right now we already vectorize the underlying Equals operation, the problem is that this one desugars (?) to a bunch of individual SIMD operations that each get their own interp opcode.

That is, to be clear, if you call Vector128<float>.Equals(...) the interp generates something like this right now:

    v128_equals_r4 (lhs, rhs)
    v128_equals_r4 (lhs, lhs)
    v128_equals_r4 (rhs, rhs)
    v128_or stack
    v128_not stack
    v128_or stack
    v128_load_allbitsset
    v128_equals_i4 stack
    v128_all_true

Sorry, I meant why not mark the instance Equals method as [Intrinsic] and have Mono treat that to be the same as operator == for integers and to be a single opcode representing this sequence for float/double

If that's what you prefer I can figure out how to do it. I wanted to keep this change as narrow as possible since the existing Equals method is fine for ints as-is. I'll see how much work it is.

That is, why split off a separate EqualsFloatingPoint at all, rather than simply handling its only caller directly?

No worries if it's overly complex to do. I just figured it would be better overall for both RyuJIT and Mono.

We actually used to do just what I've proposed in RyuJIT, but dropped that support a while back since the inliner was able to handle it and the instance equals calls were much rarer to encounter.


kg · 2023-07-12T00:03:11Z

Some of the PackedSimd changes on main broke this, so I had to update it. Good thing I ran my tests again :-)

kg added the area-Codegen-Interpreter-mono label Jun 26, 2023

kg requested review from lewing, pavelsavara, BrzVlad, vargaz and kotlarmilos as code owners June 26, 2023 20:39

ghost assigned kg Jun 26, 2023

build-analysis bot mentioned this pull request Jun 26, 2023

JIT/jit64/opt/rngchk/RngchkStress2.cs failing to build with error CS8078: An expression is too long or complex to compile #87879

Closed

lewing closed this Jul 11, 2023

lewing reopened this Jul 11, 2023

lewing approved these changes Jul 11, 2023


lewing requested a review from tannergooding July 11, 2023 03:51

This was referenced Jul 11, 2023

simpleruntimeeventvalidation test failing in CI #88499

Closed

Test failure readytorun/HardwareIntrinsics/X86/CpuId_R2R_Avx/CpuId_R2R_Avx.sh #88582

Closed

tannergooding reviewed Jul 11, 2023


tannergooding approved these changes Jul 11, 2023


kg added 3 commits July 11, 2023 16:22

Add browser-bench measurement for int32 and float equals

3ef08cf

Add interp intrinsics for Vector128 float and double Equals methods

Implement Vector128 float and double Equals methods in jiterp

Fix simd opcode for the new r4/r8 equals intrinsics

e07798a

Add validation to make sure we never appendSimd(0) by accident

Fix explicit loads

4aaffd1

kg force-pushed the interp-vec-fp-equals branch from 7954f7d to 4aaffd1 Compare July 12, 2023 00:02

kg merged commit d59af2c into dotnet:main Jul 12, 2023
169 of 173 checks passed

build-analysis bot mentioned this pull request Jul 12, 2023

Timeout in Microsoft.Gen.OptionsValidation.Unit.Test.EmitterTest #88784

Closed

This was referenced Jul 19, 2023

[Perf] Linux/x64: 14 Regressions on 7/12/2023 7:15:00 AM dotnet/perf-autofiling-issues#19919

Closed

[Perf] Linux/x64: 1 Improvement on 7/12/2023 7:15:00 AM dotnet/perf-autofiling-issues#19925

Closed

kotlarmilos mentioned this pull request Aug 11, 2023

.NET 8 Per-Preview Performance report on WASM, Mono AOT, and Interpreter #84302

Closed

ghost locked as resolved and limited conversation to collaborators Aug 13, 2023
