Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm64: Prefer fmul and fadd/fsub to fmadd/fnmsub #64591

Closed
kunalspathak opened this issue Feb 1, 2022 · 5 comments
Closed

Arm64: Prefer fmul and fadd/fsub to fmadd/fnmsub #64591

kunalspathak opened this issue Feb 1, 2022 · 5 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Comments

@kunalspathak
Copy link
Member

public static double calculate1(double x, double y, double z) => x * y - z;

public static double calculate2(double x, double y, double z) => x * y + z;

Today we generate 2 instructions fmul and fadd and instead should generate fmadd. Likewise, for fmul, fsub.

G_M29028_IG02:              ;; offset=0008H
        1E610800          fmul    d0, d0, d1
        1E622800          fadd    d0, d0, d2

godbolt: https://godbolt.org/z/Trrjq9cGs

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 1, 2022
@ghost
Copy link

ghost commented Feb 1, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details
public static double calculate1(double x, double y, double z) => x * y - z;

public static double calculate2(double x, double y, double z) => x * y + z;

Today we generate 2 instructions fmul and fadd and instead should generate fmadd. Likewise, for fmul, fsub.

G_M29028_IG02:              ;; offset=0008H
        1E610800          fmul    d0, d0, d1
        1E622800          fadd    d0, d0, d2

godbolt: https://godbolt.org/z/Trrjq9cGs

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@tannergooding
Copy link
Member

@kunalspathak, this is not a safe/legal optimization to do by default and we should close the issue.

fmadd is a fused operation and will return a different result for many inputs, the most obvious of which is double.MaxValue * 2 - double.MaxValue which should return double.PositiveInfinity but when using fmadd (and variants) will instead return double.MaxValue.

@tannergooding
Copy link
Member

There have been many proposals over the years to add some fp:fast math equivalent to .NET (such as #12753) but this is a non-trivial task and we need to take into account the various types of fp:fast optimizations that are possible since they can significantly impact the correctness of certain scenarios (WPF depends on NaN being correctly handled, for example).

The most obvious way I've considered is via an attribute + flags that allow control over the floating-point behavior for a given method. This would allow it to enable or disable specific sets of fp:fast optimizations and to specify if it is fp:fast incompatible (the default, effectively today's behavior) or that it is fp:fast compatible (will use the relevant fp:fast optimizations). One might also consider a "context dependent" variant where it only uses fp:fast logic if called from fp:fast compatible method, the most obvious usage for these is in cases like Math.* APIs where WPF would never want fp:fast behavior but certain fp:fast methods may.

All in all, it is not a trivial area and would need some decent design work and buy off before we could implement it. It would potentially be a good feature for dotnet/runtimelab

@tannergooding
Copy link
Member

For reference, https://godbolt.org/z/o54bYfPbP shows that Clang and MSVC only allow this when specifying ffast-math and fp:fast, respectively. After investigation GCC defaults to -ffp-contract=fast as per https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.

Ultimately this comes down to wording in the C language spec that allows this behavior as it leaves contraction up to the implementation. GCC allows it by default while Clang/MSVC do not as it is not an IEEE 754 compliant optimization (it is considered a "value changing" optimization).

@kunalspathak
Copy link
Member Author

Thanks @tannergooding for the insights. I will close this issue for now.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 3, 2022
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

4 participants