
Use 128-bit Widening Multiply on More Platforms #62

Open: CryZe wants to merge 1 commit into `master` from `128-bit-on-more-platforms`

CryZe commented Jul 7, 2025

The 128-bit widening multiplication was previously gated by simply checking the target pointer width. This works as a rough heuristic, but a better one is possible:

  1. Most 64-bit architectures support the 128-bit widening multiplication; SPARC64 and Wasm64 are the exceptions, so it shouldn't be used on those two.
  2. The target pointer width doesn't always indicate that we are dealing with a 64-bit architecture, as there are ABIs that reduce the pointer width, especially on AArch64 and x86-64 (for example x32 and arm64_32).
  3. WebAssembly (regardless of pointer width) supports 64-bit to 128-bit widening multiplication with the `wide-arithmetic` proposal.
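The three points combine into a single compile-time predicate. Below is a minimal sketch that evaluates both the old pointer-width check and the new condition with `cfg!` for the current target. `OLD_GATE` and `NEW_GATE` are hypothetical names; the predicate itself is copied from the `src/lib.rs` change in this PR.

```rust
// Old heuristic: any target with 64-bit pointers.
const OLD_GATE: bool = cfg!(target_pointer_width = "64");

// New heuristic, copied from the PR's condition:
// - 64-bit pointers, except SPARC64 and Wasm64,
// - AArch64 and x86-64 regardless of pointer width (covers x32, arm64_32),
// - any Wasm target with the `wide-arithmetic` feature enabled.
const NEW_GATE: bool = cfg!(any(
    all(
        target_pointer_width = "64",
        not(any(target_arch = "sparc64", target_arch = "wasm64")),
    ),
    target_arch = "aarch64",
    target_arch = "x86_64",
    all(target_family = "wasm", target_feature = "wide-arithmetic"),
));

fn main() {
    println!(
        "arch = {}, pointer width = {} bits, old gate = {}, new gate = {}",
        std::env::consts::ARCH,
        usize::BITS,
        OLD_GATE,
        NEW_GATE
    );
}
```

On an x32 or arm64_32 target, `OLD_GATE` would be `false` while `NEW_GATE` is `true`. On sparc64 the reverse holds, and on wasm32 with `wide-arithmetic` only the new gate is on.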

The `wide-arithmetic` proposal has been available since the LLVM 20 update and works perfectly for this use case, as can be seen here:

https://rust.godbolt.org/z/9jY7fxqxK

Using `wasmtime explore`, we can see that it compiles down to the ideal instructions on x86-64:

```nasm
mulx rax, rdx, r10
xor rax, rdx
```
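Those two instructions are a "folded" multiply. `mulx` writes the high half of the 128-bit product to `rax` and the low half to `rdx`, and `xor` combines them. Below is a minimal Rust sketch of that operation, assuming it is what the gated code computes. `folded_multiply` is a hypothetical name here, since this thread does not show the project's own function.

```rust
/// Multiplies two 64-bit values into a full 128-bit product and folds it
/// back to 64 bits by XOR-ing the high and low halves.
#[inline]
fn folded_multiply(x: u64, y: u64) -> u64 {
    // A u64 x u64 product always fits in u128, so plain `*` cannot overflow.
    let full = (x as u128) * (y as u128);
    let lo = full as u64; // truncating cast keeps the low 64 bits
    let hi = (full >> 64) as u64; // the high 64 bits
    lo ^ hi
}

fn main() {
    // (2^64 - 1) * 2 has hi = 1 and lo = 2^64 - 2; their XOR is all ones.
    println!("{:#x}", folded_multiply(u64::MAX, 2)); // prints 0xffffffffffffffff
}
```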

Based on the same change in [`foldhash`](orlp/foldhash#17).

Review comment on `src/lib.rs` (outdated), lines 237 to 245:
```rust
#[cfg(not(any(
    all(
        target_pointer_width = "64",
        not(any(target_arch = "sparc64", target_arch = "wasm64")),
    ),
    target_arch = "aarch64",
    target_arch = "x86_64",
    all(target_family = "wasm", target_feature = "wide-arithmetic"),
)))]
```
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YMMV: since both arms should compile fine on every target, maybe use `cfg!` instead of `#[cfg]`? Then this could be an `else` branch, removing a bunch of duplication while still optimizing out the unused arm (`if const { false }` is optimized out even in debug mode).

Author


Alright, I changed it to a simple `if`.
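With `cfg!`, the two `#[cfg]` blocks collapse into one function with a plain `if`/`else`. Below is a minimal sketch of that shape, assuming the gated operation is the folded multiply. `folded_multiply` and `mul_wide_portable` are hypothetical names. The exact schoolbook fallback (four 32×32→64-bit products) is an illustrative choice: this thread does not show the project's actual fallback, which may be a cheaper approximation that produces different hashes.

```rust
/// Full 64x64 -> 128-bit product built from four 32x32 -> 64-bit multiplies,
/// returned as (low, high). Hypothetical portable fallback for illustration.
fn mul_wide_portable(x: u64, y: u64) -> (u64, u64) {
    const MASK: u64 = 0xFFFF_FFFF;
    let (x0, x1) = (x & MASK, x >> 32);
    let (y0, y1) = (y & MASK, y >> 32);

    // Each partial product has 32-bit operands, so it fits in a u64.
    let p00 = x0 * y0;
    let p01 = x0 * y1;
    let p10 = x1 * y0;
    let p11 = x1 * y1;

    // Middle column: at most 3 * (2^32 - 1), so this sum cannot overflow.
    let mid = (p00 >> 32) + (p01 & MASK) + (p10 & MASK);
    // `<<` discards bits shifted out, keeping only mid's low 32 bits here.
    let lo = (p00 & MASK) | (mid << 32);
    // Carries from the middle column; the true high half is < 2^64.
    let hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    (lo, hi)
}

/// Folded multiply using one `if cfg!(...)` instead of two `#[cfg]` items.
/// Both arms are compiled and type-checked on every target; the condition is
/// a constant, so the compiler can drop the unused arm.
fn folded_multiply(x: u64, y: u64) -> u64 {
    if cfg!(any(
        all(
            target_pointer_width = "64",
            not(any(target_arch = "sparc64", target_arch = "wasm64")),
        ),
        target_arch = "aarch64",
        target_arch = "x86_64",
        all(target_family = "wasm", target_feature = "wide-arithmetic"),
    )) {
        let full = (x as u128) * (y as u128);
        (full as u64) ^ ((full >> 64) as u64)
    } else {
        let (lo, hi) = mul_wide_portable(x, y);
        lo ^ hi
    }
}

fn main() {
    // The fallback compiles everywhere, so it can be checked against the
    // u128 arithmetic even on a 64-bit host.
    for &(x, y) in &[(3u64, 5u64), (u64::MAX, 2), (u64::MAX, u64::MAX)] {
        let (lo, hi) = mul_wide_portable(x, y);
        assert_eq!((lo as u128) | ((hi as u128) << 64), (x as u128) * (y as u128));
        println!("{:#x}", folded_multiply(x, y));
    }
}
```

A side benefit of `cfg!` over `#[cfg]` is visible in `main`: the fallback arm is no longer compiled only on the targets that use it, so it can be tested on any machine.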

CryZe force-pushed the `128-bit-on-more-platforms` branch from 146ff74 to 6849c16 on July 7, 2025, 17:22

2 participants