Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Following manual_clamp suggestion results in slower code #12826

Closed
okaneco opened this issue May 20, 2024 · 5 comments
Closed

Following manual_clamp suggestion results in slower code #12826

okaneco opened this issue May 20, 2024 · 5 comments
Labels
C-bug Category: Clippy is not doing the correct thing performance-project For issues and PRs related to the Clippy Performance Project

Comments

@okaneco
Copy link

okaneco commented May 20, 2024

Summary

I noticed when clamping and casting from i32 to u8, using clamp(0, 255) as u8 produces unnecessary instructions compared to .max(0).min(255) as u8. If a loop is auto-vectorized, the branches in clamp result in slower code than manual clamping.

I couldn't find a label for this, but it would be akin to I-suggestion-causes-perf-regression.

Currently, the lint is set to warn but following the suggestion inhibits optimization. I don't believe it should fire on the "branchless" patterns which are semantically different.

// 1
input.max(min).min(max)

// 2
let mut x = input;
if x < min { x = min; }
if x > max { x = max; }

Lint Name

manual_clamp

Lint Description

I also had a small issue with the wording in the current description.

Why is this bad?
clamp is much shorter, easier to read, and doesn’t use any control flow.
https://rust-lang.github.io/rust-clippy/master/index.html#/manual_clamp

I slightly disagree with the reasoning here.
I understand the user doesn't have to add any control flow, but the control flow within the clamp implementation is different enough to affect performance in some cases. It is not strictly a "better" clamping method than manually clamping, especially for primitive integers.

Reproducer

#[inline(never)]
pub fn clamp(input: &[i32], output: &mut [u8]) {
    for (&i, o) in input.iter().zip(output.iter_mut()) {
        *o = i.clamp(0, 255) as u8;
    }
}

#[inline(never)]
pub fn manual_clamp(input: &[i32], output: &mut [u8]) {
    for (&i, o) in input.iter().zip(output.iter_mut()) {
        *o = i.max(0).min(255) as u8;
    }
}

Assembly output - https://rust.godbolt.org/z/rdoh97d3v (1.78, but same output on nightly)
The main difference is in the label .LBB0_4 where extra work is being done by the clamp code.

Version

rustc 1.80.0-nightly (d84b90375 2024-05-19)
binary: rustc
commit-hash: d84b9037541f45dc2c52a41d723265af211c0497
commit-date: 2024-05-19
host: x86_64-pc-windows-msvc
release: 1.80.0-nightly
LLVM version: 18.1.4

Additional Labels

No response

@okaneco okaneco added the C-bug Category: Clippy is not doing the correct thing label May 20, 2024
@blyxyas
Copy link
Member

blyxyas commented May 22, 2024

I will look into this from the upstream compiler. This really shouldn't happen.
Thanks for the report ❤️

@blyxyas blyxyas added the performance-project For issues and PRs related to the Clippy Performance Project label May 22, 2024
@okaneco
Copy link
Author

okaneco commented May 22, 2024

Thanks.
I have two examples of real code from the image-webp crate that helped motivate this report.

https://rust.godbolt.org/z/3rnY8d94v
https://rust.godbolt.org/z/53T7n9PGx

@blyxyas
Copy link
Member

blyxyas commented May 23, 2024

I'm currently working on a patch on upstream compiler, it should fix this behaviour (that, more of a bug with Clippy, it's more of a possible optimization with the standard library).

Note that a difference this big only happens with clamp(0, 255), using other ranges like 40, 200 results in a more equal assembly. I'll still open the PR to the standard library.

@blyxyas
Copy link
Member

blyxyas commented May 24, 2024

Okis, the PR has been merged, as that lands we should see an improvement (I'll test in a few on nightly). I think that this PR can now be closed, as the new inline clamp function results in smaller assembly than doing it manually.

What do you think?

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue May 24, 2024
Make `clamp` inline

Context: rust-lang/rust-clippy#12826
This results in slightly more optimized assembly. (And most important, it's now less than lines than just manually clamping a value)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue May 24, 2024
Make `clamp` inline

Context: rust-lang/rust-clippy#12826
This results in slightly more optimized assembly. (And most important, it's now less than lines than just manually clamping a value)
rust-timer added a commit to rust-lang-ci/rust that referenced this issue May 24, 2024
Rollup merge of rust-lang#125455 - blyxyas:opt-clamp, r=joboet

Make `clamp` inline

Context: rust-lang/rust-clippy#12826
This results in slightly more optimized assembly. (And most important, it's now less than lines than just manually clamping a value)
@okaneco
Copy link
Author

okaneco commented May 24, 2024

That sounds good, thanks for doing that.

I reported the "bug" here because when I inlined the clamp definition, it was still producing the selects. I assumed that with the way clamp is currently written, it wouldn't be possible to produce the same output as the manual clamp.
https://rust.godbolt.org/z/Ex4zvExsb

It's definitely more of a performance optimization upstream and probably most related to this specific saturating truncation case. Hopefully clamp and manual clamp can produce equivalent results soon for this.

I agree, it makes more sense to file an issue upstream so it can be tracked and closed there, or closed by regression tests being added if there isn't an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: Clippy is not doing the correct thing performance-project For issues and PRs related to the Clippy Performance Project
Projects
None yet
Development

No branches or pull requests

2 participants