
implement floor and ceil in assembly on i586 #976


Open
wants to merge 2 commits into
base: master

folkertdev
Contributor

@folkertdev folkertdev commented Jul 13, 2025

fixes #837

The assembly is based on

Which both state

/*
 * Written by J.T. Conklin <jtc@NetBSD.org>.
 * Public domain.
 */

Which I believe means we're good in terms of licensing.

@tgross35
Contributor

This is awesome, thank you for implementing it!

There is one problem: our libm MSRV unfortunately doesn't support naked functions. I'll be bumping it in the near future, but I don't think it will go that high; how hard is this to do with inline asm? It probably adds a few instructions because the stack access for the loads becomes opaque, but that would still be faster than the slow existing implementation.

@tgross35
Contributor

Fix for the red CI in #979 btw

@folkertdev
Contributor Author

how hard is this to do with inline asm?

Well, I don't know this target and its assembly that well. So using global_asm! would be much easier (for me, anyway) and more straightforward to upgrade once the MSRV does support naked functions.

I think the only downside is that the symbols might be visible from the outside? Is that a problem?

@folkertdev
Contributor Author

Actually, we can prevent even that with a trick that Björn showed me recently, using sym.

@folkertdev
Contributor Author

folkertdev commented Jul 17, 2025

From what I can find, this is a 2024 edition thing:


error: extern blocks must be unsafe
  --> builtins-shim/../compiler-builtins/src/math/../../../libm/src/math/arch/i586.rs:23:1
   |
23 | / extern "cdecl" {
24 | |     fn ceil_helper(_: f64) -> f64;
25 | |     fn floor_helper(_: f64) -> f64;
26 | | }
   | |_^

But Rust 1.63 won't compile unsafe extern, right?

@tgross35
Contributor

tgross35 commented Jul 17, 2025

I think the only downside is that the symbols might be visible from the outside? Is that a problem?

I think it may be, and there could also be symbol conflicts (though we could make it weak). But this should be pretty easy to move to inline asm; I think something like this would work: https://rust.godbolt.org/z/P98rKEW7P, which is reasonably close to the original (not sure if there's a way to turn pointers to locals into stack offsets, to avoid the leal instructions).

@folkertdev
Contributor Author

not sure if there's a way to turn pointers to locals into stack offsets, to avoid the leal instructions

That's probably fine; we can fix that properly once naked functions are available.


we now get errors like this?

  stderr ───
    Spaced Musl sinf arg 1/1: 800 iterations (800 total)

    thread 'musl_quickspace_sinf' panicked at libm-test/tests/compare_built_musl.rs:26:50:
    called `Result::unwrap()` on an `Err` value: 
        input:    (-239880100.0,)
        as hex:   (-0x1.c988f4p+27,)
        as bits:  (0xcd64c47a,)
        expected: -0.17352822            -0x1.6362c4p-3 0xbe31b162
        actual:   -0.17353132            -0x1.636464p-3 0xbe31b232
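
To gauge how far off that result is, the two bit patterns from the log can be compared directly. A minimal sketch, assuming two finite values of the same sign; the helper name `ulp_distance` is made up for illustration and is not part of libm-test:

```rust
/// Distance in units in the last place (ULP) between two finite `f32`
/// values of the same sign, given as raw bit patterns.
/// Hypothetical helper for illustration only.
fn ulp_distance(a_bits: u32, b_bits: u32) -> u32 {
    debug_assert_eq!(a_bits >> 31, b_bits >> 31, "signs must match");
    // Finite floats of the same sign are ordered like their raw bits,
    // so the ULP distance is the difference of the two integers.
    a_bits.max(b_bits) - a_bits.min(b_bits)
}
```

For the values above, `ulp_distance(0xbe31b162, 0xbe31b232)` comes out at 208, which is far more than a last-bit rounding difference.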

@quaternic
Contributor

You're clobbering the saved control word on the stack with the modified one, so it isn't restored correctly.

I think there's an argument for using a fixed control word for the frndint, but I'll have to come back to this later.
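
For reference, a rough sketch of the two-slot idea, not the code from this PR: the caller's control word goes into one local and the modified word into another, so the final `fldcw` reloads the original. The names `x87_round`, `floor_sketch` and `ceil_sketch` are hypothetical, the asm uses Rust's default Intel syntax rather than the AT&T syntax in the diff, and the non-x86 fallback exists only so the sketch builds elsewhere.

```rust
/// Rounding-control field of the x87 control word (bits 10 and 11).
const RC_MASK: u16 = 0x0c00;
/// Round toward negative infinity (what `floor` needs).
const RC_DOWN: u16 = 0x0400;
/// Round toward positive infinity (what `ceil` needs).
const RC_UP: u16 = 0x0800;

/// Hypothetical helper: round `x` to an integer using x87 rounding mode
/// `rc`, then restore the caller's control word.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn x87_round(mut x: f64, rc: u16) -> f64 {
    use core::arch::asm;

    // Slot 1: the caller's control word. It is never overwritten.
    let mut saved_cw: u16 = 0;
    // SAFETY: `fnstcw` only stores 16 bits to the given address.
    unsafe {
        asm!(
            "fnstcw word ptr [{p}]",
            p = in(reg) &mut saved_cw as *mut u16,
            options(nostack),
        );
    }

    // Slot 2: same word with only the rounding-control bits replaced.
    let new_cw: u16 = (saved_cw & !RC_MASK) | rc;

    // SAFETY: all pointers refer to live locals, the x87 stack is empty
    // again on exit, and the original control word is reloaded last.
    unsafe {
        asm!(
            "fldcw word ptr [{new}]",    // apply the modified word
            "fld qword ptr [{val}]",     // push x
            "frndint",                   // round using the current mode
            "fstp qword ptr [{val}]",    // pop the result back into x
            "fldcw word ptr [{saved}]",  // restore from the untouched slot
            new = in(reg) &new_cw as *const u16,
            saved = in(reg) &saved_cw as *const u16,
            val = in(reg) &mut x as *mut f64,
            // Using the x87 stack requires marking every st register.
            out("st(0)") _, out("st(1)") _, out("st(2)") _, out("st(3)") _,
            out("st(4)") _, out("st(5)") _, out("st(6)") _, out("st(7)") _,
            options(nostack),
        );
    }
    x
}

/// Fallback without x87, so the sketch compiles on other targets.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn x87_round(x: f64, rc: u16) -> f64 {
    let _ = RC_MASK;
    if rc == RC_DOWN { x.floor() } else { x.ceil() }
}

fn floor_sketch(x: f64) -> f64 {
    x87_round(x, RC_DOWN)
}

fn ceil_sketch(x: f64) -> f64 {
    x87_round(x, RC_UP)
}
```

Computing the modified word in Rust between two asm blocks keeps the assembly short; the change and restore of the control word still happen inside a single block, so the compiler never sees a modified rounding mode.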

Comment on lines +18 to +19
"movw %dx, ({cw_ptr})", // Apply cw
"fldcw ({cw_ptr})", // ...
Contributor

@tgross35 tgross35 Jul 17, 2025

Sorry, yeah, as @quaternic mentioned, this needs a second stack slot. As a micro-optimization, this could be let mut cw_stash = [0u16; 2]; and then these accesses can be -4({stash_ptr}), so we save a register vs. two different u16 locals.

Contributor

Actually, even better: let mut cw_stash = MaybeUninit::<[u16; 2]>::uninit(); to save the zero-init instruction.

Contributor Author

I got this to work

https://rust.godbolt.org/z/44Tjcx3bb

the -4 stuff didn't work (also, for u16, why -4?)
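
On the layout question, a small pure-Rust sketch of the two-slot stash suggested above. The helper names `stash_roundtrip` and `slot_offset` are hypothetical. Since a `u16` is 2 bytes, the two slots sit at byte offsets 0 and 2 from the array's base pointer; in the AT&T syntax used in the diff that would presumably be `({stash_ptr})` and `2({stash_ptr})`, though that is an assumption and not taken from the linked godbolt.

```rust
use core::mem::{size_of, MaybeUninit};

/// Hypothetical illustration: store two control words in one uninitialized
/// two-slot stash through a single base pointer, then read them back.
fn stash_roundtrip(saved: u16, modified: u16) -> [u16; 2] {
    let mut cw_stash = MaybeUninit::<[u16; 2]>::uninit();
    // One base pointer covers both slots.
    let base: *mut u16 = cw_stash.as_mut_ptr().cast::<u16>();
    // SAFETY: both writes stay inside the 4-byte array, and both slots
    // are initialized before `assume_init` is called.
    unsafe {
        base.write(saved); // slot 0, byte offset 0
        base.add(1).write(modified); // slot 1, byte offset 2
        cw_stash.assume_init()
    }
}

/// Byte offset of slot `i` from the base pointer.
fn slot_offset(i: usize) -> usize {
    i * size_of::<u16>()
}
```

For example, `stash_roundtrip(0x037f, 0x077f)` returns both words unchanged, and `slot_offset(1)` is 2.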

Successfully merging this pull request may close these issues.

Replace i586 ceil and floor implementations with assembly to fix +/-0
3 participants