Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison #120718

saethlin · 2024-02-06T19:32:17Z

Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the LangRef, only the flags nnan (no nans) and ninf (no infs) can produce poison.

And this uses the algebraic float ops to fix #120720

cc @orlp

rustbot · 2024-02-06T19:32:26Z

r? @cuviper

(rustbot has picked a reviewer for you, use r? to override)

orlp · 2024-02-07T00:20:31Z

I'd like to support this (I mean, I suggested it to @saethlin). It seems that with just the safe LLVM flags currently available, a lot of optimizations are already possible (and more algebraically justified optimizations could be added in the future). Most notably, autovectorization and FMA conversion both work out of the box. From a quick test of this PR,

```rust
#![feature(core_intrinsics)]
use std::intrinsics::fadd_algebraic;

fn sum(arr: &[f32]) -> f32 {
    arr.iter().fold(0.0, |a, b| fadd_algebraic(a, *b))
}
```

generated excellent autovectorized code, completely safely.
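As a rough illustration of why reassociation matters for autovectorization: a strict IEEE left-to-right sum is one long dependency chain, whereas a reassociated sum can keep several independent partial sums, one per SIMD lane. The sketch below does that splitting by hand on stable Rust. The names `sum_sequential` and `sum_lanes` and the lane count of 4 are hypothetical choices for illustration, not necessarily what LLVM emits for `fadd_algebraic`.

```rust
/// Strict left-to-right sum: every add depends on the previous one.
fn sum_sequential(arr: &[f32]) -> f32 {
    arr.iter().fold(0.0, |a, b| a + *b)
}

/// Hand-reassociated sum with 4 independent accumulators
/// (the lane count is an assumption made for this sketch).
fn sum_lanes(arr: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let chunks = arr.chunks_exact(4);
    let tail = chunks.remainder();
    for chunk in chunks {
        // Each accumulator only ever sees "its" lane of every chunk.
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += *x;
        }
    }
    // Combine the partial sums, then add the leftover elements.
    let mut total = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for x in tail {
        total += *x;
    }
    total
}

fn main() {
    let data: Vec<f32> = (1..=10).map(|i| i as f32).collect();
    // Small integers are exactly representable, so both orders agree here.
    println!("{} {}", sum_sequential(&data), sum_lanes(&data));
}
```

For general inputs the two functions may differ in the low bits, because the additions are rounded in a different order. That difference in rounding is the freedom the algebraic intrinsics grant the optimizer.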

The current f*_fast intrinsics are essentially unusable in generic code because they are instant UB on infinities/NaNs. For any non-trivial algorithm it is completely infeasible to verify that no infinities/NaNs exist in the input, let alone that none are produced as (intermediate) results. In a quick search through GitHub, every place I found that used f*_fast was unsound. The safe, algebraically optimizable float operations are desperately needed.
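To make the point about intermediate results concrete, here is a small sketch on stable Rust using ordinary IEEE operations. The helper name `overflow_then_cancel` is hypothetical. It shows that two perfectly finite inputs can produce an infinity after one step and a NaN after the next, which is the situation that would be UB if the same steps were done with the f*_fast intrinsics.

```rust
/// Returns (a + b, (a + b) - (a + b)) using ordinary IEEE arithmetic.
/// The name and shape of this helper are illustrative only.
fn overflow_then_cancel(a: f32, b: f32) -> (f32, f32) {
    let sum = a + b; // may overflow to infinity
    let diff = sum - sum; // inf - inf is NaN
    (sum, diff)
}

fn main() {
    let (sum, diff) = overflow_then_cancel(f32::MAX, f32::MAX);
    println!("inputs finite: {}", f32::MAX.is_finite());
    println!("sum is infinite: {}", sum.is_infinite());
    println!("diff is NaN: {}", diff.is_nan());
}
```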

rustbot · 2024-02-07T02:09:07Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

compiler/rustc_codegen_ssa/src/mir/intrinsic.rs

rustbot · 2024-02-07T17:03:52Z

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

orlp · 2024-02-08T18:19:13Z

Zulip thread with more context: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/.22algebraic.22.20fast-math.20intrinsics

Interestingly, the IEEE standard does actually say that a language standard should have functionality like this:

It says this should be per 'block' rather than per operation, but that's just syntax.

cuviper · 2024-02-15T22:00:04Z

I think this needs more compiler review than libs...

r? compiler

bors · 2024-02-16T11:55:30Z

☔ The latest upstream changes (presumably #120500) made this pull request unmergeable. Please resolve the merge conflicts.

petrochenkov · 2024-02-17T07:56:36Z

I'm on vacation.
r? compiler

nnethercote · 2024-02-19T05:55:30Z

compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp

```diff
+    I->setHasAllowReassoc(true);
+    I->setHasAllowContract(true);
+    I->setHasAllowReciprocal(true);
+    I->setHasNoSignedZeros(true);
```
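Going by the setter names, these four lines enable LLVM's `reassoc`, `contract`, `arcp`, and `nsz` flags. As a hedged sketch of what `contract` permits, the stable-Rust snippet below compares a multiply followed by a separately rounded add with the fused `f64::mul_add`. The helper names and the hand-picked inputs are assumptions made for illustration; they are not taken from the PR.

```rust
/// Multiply, round, then add and round again (two roundings).
fn unfused(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// Fused multiply-add: the product is not rounded before the add.
fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

fn main() {
    // x = 1 + 2^-30, so x * x = 1 + 2^-29 + 2^-60 exactly.
    let x = 1.0 + 1.0 / (1u64 << 30) as f64;
    // c cancels everything except the 2^-60 term.
    let c = -(1.0 + 1.0 / (1u64 << 29) as f64);
    // The unfused product rounds away the 2^-60 term before the add.
    println!("unfused = {:e}", unfused(x, x, c));
    println!("fused   = {:e}", fused(x, x, c));
}
```

Both answers are finite and well defined. Contraction only changes which rounding steps happen, which is why it can be allowed without introducing poison.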


What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.

Does the word "algebraic" have a specific meaning here?

Never mind, I see from other places that it does.

> What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.

Sure, why not. I'll add it.

> Does the word "algebraic" have a specific meaning here?

I didn't want to just defang the existing intrinsics because there are some optimizations which rely on assuming that NaN/Inf do not occur. So I needed to come up with a new name, and the way I think about these intrinsics is that unlike IEEE float operations, they permit the usual algebraic transformations. Things like a + (b + c) = (a + b) + c and a / b = a * (1 / b).

I don't think that the name is perfect, and I'd be happy to see someone suggest a better name, but @orlp seems perfectly happy calling them "algebraic".
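A small sketch of why strict IEEE operations do not permit these two rewrites: on stable Rust, with ordinary `f64` arithmetic, both identities fail for some everyday inputs. The helper names are hypothetical and the inputs are chosen only for illustration.

```rust
/// (a + b) + c, evaluated left to right.
fn assoc_left(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// a + (b + c), the reassociated form.
fn assoc_right(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}

/// Plain division.
fn div_direct(a: f64, b: f64) -> f64 {
    a / b
}

/// Division rewritten as multiplication by the reciprocal.
fn div_reciprocal(a: f64, b: f64) -> f64 {
    a * (1.0 / b)
}

fn main() {
    // Reassociation changes the rounded result.
    println!("{:?} vs {:?}", assoc_left(0.1, 0.2, 0.3), assoc_right(0.1, 0.2, 0.3));
    // So does the reciprocal rewrite.
    println!("{:?} vs {:?}", div_direct(3.0, 10.0), div_reciprocal(3.0, 10.0));
}
```

In both cases the results differ only in the last bits and neither is a NaN or an infinity. The algebraic intrinsics let the optimizer choose either form.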

> they permit the usual algebraic transformations. Things like a + (b + c) = (a + b) + c and a / b = a * (1 / b)

That would be a great explanation to put in a comment somewhere :)

> > What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.
>
> Sure, why not. I'll add it.

Please don't. Replacing functions by their approximations isn't an algebraically justified optimization.

🤔 And in any case, I don't think it would matter for these intrinsics.

nnethercote · 2024-02-19T06:12:23Z

library/core/src/intrinsics.rs

```diff
@@ -1882,6 +1882,46 @@ extern "rust-intrinsic" {
    #[rustc_nounwind]
    pub fn frem_fast<T: Copy>(a: T, b: T) -> T;

+    /// Float addition that allows optimizations based on algebraic rules.
+    ///
+    /// This intrinsic does not have a stable counterpart.
```


Which meaning of "stable" does this use?

The only way to call this intrinsic is to use the core_intrinsics feature. We do not have a wrapper for these like the atomic intrinsics.

nnethercote · 2024-02-19T06:14:40Z

compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp

```diff
@@ -417,8 +417,7 @@ extern "C" LLVMAttributeRef LLVMRustCreateMemoryEffectsAttr(LLVMContextRef C,
      report_fatal_error("bad MemoryEffects.");
  }
 }
-
-// Enable a fast-math flag
+// Enable all fast-math flags
 //
 // https://llvm.org/docs/LangRef.html#fast-math-flags
 extern "C" void LLVMRustSetFastMath(LLVMValueRef V) {
```


Do we still need the fast intrinsics?

I'd prefer to keep them for now and let people play with both variants. I hope that based on experience we can make an argument that the unsafety of the unsafe ones is not worth the optimizations that they unlock, but I have no data to back that up.

nnethercote · 2024-02-19T06:16:16Z

I am very much the opposite of a floating point expert, but you've been bounced around various reviewers so I'll do my best to review this and allow progress.

In general, adding safer variants of FP intrinsics seems fine, as does converting some existing intrinsics to use them when there are known bugs (#120720) with the less safe variants. I've asked some questions above just to give myself a bit more certainty about this change.

My final questions here are about the exact meaning of "intrinsics". I think these new algebraic intrinsics are internal only? And they are used to implement simd_reduce_{add,mul}_unordered, which are also internal-only intrinsics? And they are used to implement the externally visible _mm512_reduce_add_pd intrinsic mentioned in #120720? Though I see no mention of _mm512_reduce_add_pd anywhere within the rust-lang/rust repository, so I'm confused by that.

nnethercote · 2024-02-19T21:01:34Z

Thanks for making the changes. I particularly like the longer comments you added. r=me once the last two nits are addressed.

@bors delegate=saethlin

bors · 2024-02-19T21:01:37Z

✌️ @saethlin, you can now approve this pull request!

If @nnethercote told you to "r=me" after making some further change, please make that change, then do @bors r=@nnethercote

saethlin · 2024-02-19T21:09:26Z

@bors r=nnethercote

bors · 2024-02-19T21:09:28Z

📌 Commit 41fddb5 has been approved by nnethercote

It is now in the queue for this repository.

…nethercote Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison. And this uses the algebraic float ops to fix rust-lang#120720 cc `@orlp`

Rollup of 8 pull requests

Successful merges:

- rust-lang#120718 (Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison)
- rust-lang#121195 (unstable-book: Separate testing and production sanitizers)
- rust-lang#121205 (Merge `CompilerError::CompilationFailed` and `CompilerError::ICE`.)
- rust-lang#121233 (Move the extra directives for `Mode::CoverageRun` into `iter_header`)
- rust-lang#121256 (Allow AST and HIR visitors to return `ControlFlow`)
- rust-lang#121307 (Drive-by `DUMMY_SP` -> `Span` and fmt changes)
- rust-lang#121310 (Remove an old hack for rustdoc)
- rust-lang#121311 (Make `is_nonoverlapping` `#[inline]`)

r? `@ghost`
`@rustbot` modify labels: rollup

saethlin · 2024-02-20T05:03:30Z

@bors r-
#121320 (comment)
The test should be only-x86_64

saethlin · 2024-02-20T18:14:49Z

@bors r=nnethercote

bors · 2024-02-20T18:14:51Z

📌 Commit cc73b71 has been approved by nnethercote

It is now in the queue for this repository.

bors · 2024-02-21T09:43:36Z

⌛ Testing commit cc73b71 with merge bb8b11e...

bors · 2024-02-21T12:08:16Z

☀️ Test successful - checks-actions
Approved by: nnethercote
Pushing bb8b11e to master...

rust-timer · 2024-02-21T13:22:22Z

Finished benchmarking commit (bb8b11e): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.4% | [0.2%, 3.4%] | 3 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 2.8% | [2.8%, 2.8%] | 1 |
| Regressions ❌ (secondary) | 3.4% | [1.1%, 6.5%] | 3 |
| Improvements ✅ (primary) | -2.4% | [-3.8%, -1.0%] | 2 |
| Improvements ✅ (secondary) | -2.7% | [-5.3%, -0.9%] | 9 |
| All ❌✅ (primary) | -0.7% | [-3.8%, 2.8%] | 3 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 651.363s -> 651.642s (0.04%)
Artifact size: 310.99 MiB -> 311.03 MiB (0.01%)

rustbot assigned cuviper Feb 6, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 6, 2024

saethlin added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Feb 6, 2024

saethlin force-pushed the reasonable-fast-math branch from 03e0aa5 to 744641f Compare February 6, 2024 21:56

saethlin changed the title ~~Set only fast-math flags which don't produce poison~~ Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison Feb 6, 2024

saethlin force-pushed the reasonable-fast-math branch from 744641f to 6342116 Compare February 6, 2024 22:41

saethlin marked this pull request as ready for review February 7, 2024 02:09

saethlin added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Feb 7, 2024

bjorn3 reviewed Feb 7, 2024

compiler/rustc_codegen_ssa/src/mir/intrinsic.rs

saethlin force-pushed the reasonable-fast-math branch from 6342116 to 27db5bc Compare February 7, 2024 17:03

quaternic mentioned this pull request Feb 8, 2024

Miscompilation in _mm512_reduce_add_pd, assumes values not NaN #120720

Closed

rustbot assigned petrochenkov and unassigned cuviper Feb 15, 2024

saethlin force-pushed the reasonable-fast-math branch from 27db5bc to 740338c Compare February 15, 2024 22:40

saethlin force-pushed the reasonable-fast-math branch from 740338c to d932173 Compare February 16, 2024 15:43

rustbot assigned nnethercote and unassigned petrochenkov Feb 17, 2024

nnethercote reviewed Feb 19, 2024

saethlin force-pushed the reasonable-fast-math branch from e27d738 to 41fddb5 Compare February 19, 2024 21:02

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 19, 2024

saethlin mentioned this pull request Feb 20, 2024

Rollup of 8 pull requests #121320

Closed

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 20, 2024

Add "algebraic" versions of the fast-math intrinsics

cc73b71

saethlin force-pushed the reasonable-fast-math branch from 41fddb5 to cc73b71 Compare February 20, 2024 17:39

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 20, 2024

bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 21, 2024

bors merged commit bb8b11e into rust-lang:master Feb 21, 2024
12 checks passed

rustbot added this to the 1.78.0 milestone Feb 21, 2024

bors mentioned this pull request Feb 21, 2024

intrinsics::simd: add missing functions, avoid UB-triggering fast-math #121223

Merged

saethlin deleted the reasonable-fast-math branch May 3, 2024 22:33

saethlin mentioned this pull request May 8, 2024

intrinsics fmuladdf{32,64}: expose llvm.fmuladd.* semantics #124874

Open
