Arm64: Prefer fmul and fadd/fsub to fmadd/fnmsub #64591

kunalspathak · 2022-02-01T02:36:52Z

public static double calculate1(double x, double y, double z) => x * y - z;

public static double calculate2(double x, double y, double z) => x * y + z;

Today we generate 2 instructions fmul and fadd and instead should generate fmadd. Likewise, for fmul, fsub.

G_M29028_IG02:              ;; offset=0008H
        1E610800          fmul    d0, d0, d1
        1E622800          fadd    d0, d0, d2

godbolt: https://godbolt.org/z/Trrjq9cGs

The text was updated successfully, but these errors were encountered:

ghost · 2022-02-01T02:36:56Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

public static double calculate1(double x, double y, double z) => x * y - z;

public static double calculate2(double x, double y, double z) => x * y + z;

Today we generate 2 instructions fmul and fadd and instead should generate fmadd. Likewise, for fmul, fsub.

G_M29028_IG02:              ;; offset=0008H
        1E610800          fmul    d0, d0, d1
        1E622800          fadd    d0, d0, d2

godbolt: https://godbolt.org/z/Trrjq9cGs

Author:	kunalspathak
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

tannergooding · 2022-02-01T04:17:40Z

@kunalspathak, this is not a safe/legal optimization to do by default and we should close the issue.

fmadd is a fused operation and will return a different result for many inputs, the most obvious of which is double.MaxValue * 2 - double.MaxValue which should return double.PositiveInfinity but when using fmadd (and variants) will instead return double.MaxValue.

tannergooding · 2022-02-01T04:31:08Z

There have been many proposals over the years to add some fp:fast math equivalent to .NET (such as #12753) but this is a non-trivial task and we need to take into account the various types of fp:fast optimizations that are possible since they can significantly impact the correctness of certain scenarios (WPF depends on NaN being correctly handled, for example).

The most obvious way I've considered is via an attribute + flags that allow control over the floating-point behavior for a given method. This would allow it to enable or disable specific sets of fp:fast optimizations and to specify if it is fp:fast incompatible (the default, effectively today's behavior) or that it is fp:fast compatible (will use the relevant fp:fast optimizations). One might also consider a "context dependent" variant where it only uses fp:fast logic if called from fp:fast compatible method, the most obvious usage for these is in cases like Math.* APIs where WPF would never want fp:fast behavior but certain fp:fast methods may.

All in all, it is not a trivial area and would need some decent design work and buy off before we could implement it. It would potentially be a good feature for dotnet/runtimelab

tannergooding · 2022-02-01T07:50:49Z

For reference, https://godbolt.org/z/o54bYfPbP shows that Clang and MSVC only allow this when specifying ffast-math and fp:fast, respectively. After investigation GCC defaults to -ffp-contract=fast as per https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.

Ultimately this comes down to wording in the C language spec that allows this behavior as it leaves contraction up to the implementation. GCC allows it by default while Clang/MSVC do not as it is not an IEEE 754 compliant optimization (it is considered a "value changing" optimization).

kunalspathak · 2022-02-01T16:29:12Z

Thanks @tannergooding for the insights. I will close this issue for now.

dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 1, 2022

kunalspathak assigned TIHan Feb 1, 2022

kunalspathak mentioned this issue Feb 1, 2022

ARM64: Optimize a % b operation #34937

Closed

tannergooding mentioned this issue Feb 1, 2022

Ensure we set -ffp-contract=off on GCC #64604

Closed

kunalspathak closed this as completed Feb 1, 2022

ghost locked as resolved and limited conversation to collaborators Mar 3, 2022

JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arm64: Prefer fmul and fadd/fsub to fmadd/fnmsub #64591

Arm64: Prefer fmul and fadd/fsub to fmadd/fnmsub #64591

kunalspathak commented Feb 1, 2022

ghost commented Feb 1, 2022

tannergooding commented Feb 1, 2022

tannergooding commented Feb 1, 2022

tannergooding commented Feb 1, 2022

kunalspathak commented Feb 1, 2022

Arm64: Prefer fmul and fadd/fsub to fmadd/fnmsub #64591

Arm64: Prefer fmul and fadd/fsub to fmadd/fnmsub #64591

Comments

kunalspathak commented Feb 1, 2022

ghost commented Feb 1, 2022

tannergooding commented Feb 1, 2022

tannergooding commented Feb 1, 2022

tannergooding commented Feb 1, 2022

kunalspathak commented Feb 1, 2022