Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM miscompiles consecutive half operations by using too much precision on several backends #97975

Open
beetrees opened this issue Jul 7, 2024 · 16 comments

Comments

@beetrees
Copy link
Contributor

beetrees commented Jul 7, 2024

Consider the following IR:

define half @f(half %a, half %b, half %c) {
    %d = fadd half %a, %b
    %e = fadd half %d, %c
    ret half %e
}

On backends without native half support, LLVM generally lowers half by converting to float, performing the desired operation, and then converting back to half. For this to be a valid lowering, a conversion back to f16 must occur after each operation, otherwise the excess precision of float will affect the result. For example 65504.0 + 65504.0 + -65504.0 equals infinity if each operation is done at half precision, but will result in 65504.0 if the intermediate value is not rounded to a half but instead kept as a float. This excess runtime precision can lead to miscompilations similar to #89885.

However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive half operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of half operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling half was added in #94456).

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the main branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.

Related to #97981.

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 7, 2024

@llvm/issue-subscribers-backend-powerpc

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ```

On backends without native half support, LLVM generally lowers half by converting to float, performing the desired operation, and then converting back to half. For this to be a valid lowering, a conversion back to f16 must occur after each operation, otherwise the excess precision of float will affect the result. For example 65504.0 + 65504.0 + -65504.0 equals infinity if each operation is done at half precision, but will result in 65504.0 if the intermediate value is not rounded to a half but instead kept as a float. This excess runtime precision can lead to miscompilations similar to #89885.

However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive half operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of half operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling half was added in #94456).

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

  • AVR (avr-unknown-unknown)
  • Hexagon (hexagon-unknown-linux-musl)
  • LoongArch (loongarch64-unknown-linux-gnu)
  • M68k (m68k-unknown-linux-gnu)
  • MIPS (mips64el-unknown-linux-gnuabi64)
  • MSP430 (msp430-none-elf)
  • PowerPC (powerpc64le-unknown-linux-gnu)
  • SPARC (sparc64-unknown-linux-gnu)
  • WASM (wasm32-unknown-wasi): Already reported in #96437

I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the main branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 7, 2024

@llvm/issue-subscribers-backend-hexagon

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ```

On backends without native half support, LLVM generally lowers half by converting to float, performing the desired operation, and then converting back to half. For this to be a valid lowering, a conversion back to f16 must occur after each operation, otherwise the excess precision of float will affect the result. For example 65504.0 + 65504.0 + -65504.0 equals infinity if each operation is done at half precision, but will result in 65504.0 if the intermediate value is not rounded to a half but instead kept as a float. This excess runtime precision can lead to miscompilations similar to #89885.

However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive half operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of half operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling half was added in #94456).

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

  • AVR (avr-unknown-unknown)
  • Hexagon (hexagon-unknown-linux-musl)
  • LoongArch (loongarch64-unknown-linux-gnu)
  • M68k (m68k-unknown-linux-gnu)
  • MIPS (mips64el-unknown-linux-gnuabi64)
  • MSP430 (msp430-none-elf)
  • PowerPC (powerpc64le-unknown-linux-gnu)
  • SPARC (sparc64-unknown-linux-gnu)
  • WASM (wasm32-unknown-wasi): Already reported in #96437

I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the main branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 7, 2024

@llvm/issue-subscribers-backend-mips

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ```

On backends without native half support, LLVM generally lowers half by converting to float, performing the desired operation, and then converting back to half. For this to be a valid lowering, a conversion back to f16 must occur after each operation, otherwise the excess precision of float will affect the result. For example 65504.0 + 65504.0 + -65504.0 equals infinity if each operation is done at half precision, but will result in 65504.0 if the intermediate value is not rounded to a half but instead kept as a float. This excess runtime precision can lead to miscompilations similar to #89885.

However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive half operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of half operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling half was added in #94456).

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

  • AVR (avr-unknown-unknown)
  • Hexagon (hexagon-unknown-linux-musl)
  • LoongArch (loongarch64-unknown-linux-gnu)
  • M68k (m68k-unknown-linux-gnu)
  • MIPS (mips64el-unknown-linux-gnuabi64)
  • MSP430 (msp430-none-elf)
  • PowerPC (powerpc64le-unknown-linux-gnu)
  • SPARC (sparc64-unknown-linux-gnu)
  • WASM (wasm32-unknown-wasi): Already reported in #96437

I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the main branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 7, 2024

@llvm/issue-subscribers-lld-wasm

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ```

On backends without native half support, LLVM generally lowers half by converting to float, performing the desired operation, and then converting back to half. For this to be a valid lowering, a conversion back to f16 must occur after each operation, otherwise the excess precision of float will affect the result. For example 65504.0 + 65504.0 + -65504.0 equals infinity if each operation is done at half precision, but will result in 65504.0 if the intermediate value is not rounded to a half but instead kept as a float. This excess runtime precision can lead to miscompilations similar to #89885.

However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive half operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of half operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling half was added in #94456).

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

  • AVR (avr-unknown-unknown)
  • Hexagon (hexagon-unknown-linux-musl)
  • LoongArch (loongarch64-unknown-linux-gnu)
  • M68k (m68k-unknown-linux-gnu)
  • MIPS (mips64el-unknown-linux-gnuabi64)
  • MSP430 (msp430-none-elf)
  • PowerPC (powerpc64le-unknown-linux-gnu)
  • SPARC (sparc64-unknown-linux-gnu)
  • WASM (wasm32-unknown-wasi): Already reported in #96437

I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the main branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 7, 2024

@llvm/issue-subscribers-backend-m68k

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ```

On backends without native half support, LLVM generally lowers half by converting to float, performing the desired operation, and then converting back to half. For this to be a valid lowering, a conversion back to f16 must occur after each operation, otherwise the excess precision of float will affect the result. For example 65504.0 + 65504.0 + -65504.0 equals infinity if each operation is done at half precision, but will result in 65504.0 if the intermediate value is not rounded to a half but instead kept as a float. This excess runtime precision can lead to miscompilations similar to #89885.

However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive half operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of half operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling half was added in #94456).

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

  • AVR (avr-unknown-unknown)
  • Hexagon (hexagon-unknown-linux-musl)
  • LoongArch (loongarch64-unknown-linux-gnu)
  • M68k (m68k-unknown-linux-gnu)
  • MIPS (mips64el-unknown-linux-gnuabi64)
  • MSP430 (msp430-none-elf)
  • PowerPC (powerpc64le-unknown-linux-gnu)
  • SPARC (sparc64-unknown-linux-gnu)
  • WASM (wasm32-unknown-wasi): Already reported in #96437

I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the main branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 7, 2024

@llvm/issue-subscribers-backend-msp430

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ```

On backends without native half support, LLVM generally lowers half by converting to float, performing the desired operation, and then converting back to half. For this to be a valid lowering, a conversion back to f16 must occur after each operation, otherwise the excess precision of float will affect the result. For example 65504.0 + 65504.0 + -65504.0 equals infinity if each operation is done at half precision, but will result in 65504.0 if the intermediate value is not rounded to a half but instead kept as a float. This excess runtime precision can lead to miscompilations similar to #89885.

However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive half operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of half operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling half was added in #94456).

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

  • AVR (avr-unknown-unknown)
  • Hexagon (hexagon-unknown-linux-musl)
  • LoongArch (loongarch64-unknown-linux-gnu)
  • M68k (m68k-unknown-linux-gnu)
  • MIPS (mips64el-unknown-linux-gnuabi64)
  • MSP430 (msp430-none-elf)
  • PowerPC (powerpc64le-unknown-linux-gnu)
  • SPARC (sparc64-unknown-linux-gnu)
  • WASM (wasm32-unknown-wasi): Already reported in #96437

I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the main branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 7, 2024

@llvm/issue-subscribers-backend-webassembly

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ```

On backends without native half support, LLVM generally lowers half by converting to float, performing the desired operation, and then converting back to half. For this to be a valid lowering, a conversion back to f16 must occur after each operation, otherwise the excess precision of float will affect the result. For example 65504.0 + 65504.0 + -65504.0 equals infinity if each operation is done at half precision, but will result in 65504.0 if the intermediate value is not rounded to a half but instead kept as a float. This excess runtime precision can lead to miscompilations similar to #89885.

However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive half operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of half operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling half was added in #94456).

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

  • AVR (avr-unknown-unknown)
  • Hexagon (hexagon-unknown-linux-musl)
  • LoongArch (loongarch64-unknown-linux-gnu)
  • M68k (m68k-unknown-linux-gnu)
  • MIPS (mips64el-unknown-linux-gnuabi64)
  • MSP430 (msp430-none-elf)
  • PowerPC (powerpc64le-unknown-linux-gnu)
  • SPARC (sparc64-unknown-linux-gnu)
  • WASM (wasm32-unknown-wasi): Already reported in #96437

I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the main branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.

@beetrees beetrees changed the title LLVM lowers consecutive half operations with too much precision on several backends LLVM miscompiles consecutive half operations by using too much precision on several backends Jul 7, 2024
@beetrees
Copy link
Contributor Author

beetrees commented Jul 7, 2024

I'm not sure that the missed-optimization label is correct: not truncating to half after each operation (the current behaviour) is presumably faster than doing the extra work; the issue here is that the current lowering is a miscompilation that violates the semantics of LLVM IR.

@vchuravy
Copy link
Contributor

vchuravy commented Jul 8, 2024

In Julia we handle this by inserting manual conversion operations: https://github.com/JuliaLang/julia/blob/master/src/llvm-demote-float16.cpp

While I am not a fan of this I think the current state is that this is a "frontend" problem, and not a backend issue.

GCC and Clang only recently introduced -fexcess-precision=16
I believe it was previously argued that the C standard allows for the widening of floating point operations with extended precision (the whole x87 debacle).

85d049a

@beetrees
Copy link
Contributor Author

beetrees commented Jul 8, 2024

Nice workaround! However I think it's fair to say that frontends shouldn't be required to insert custom passes late in the LLVM optimisation pipeline in order for LLVM to not miscompile basic floating point operations. As the comment at the top of the file mentions, it relies on some other optimisation passes (instcombine) not running after it that could undo all its hard work.

LLVM IR requires that the basic floating point operations (fadd etc.) give accurately rounded deteministic results. While the LangRef itself is a bit ambiguous in this regard (see #60942), this determinism is relied upon by LLVM IR optimisations, and backends not following this requirement can lead to miscompilations that segfault or worse (see #89885 for an example using the x87 floating point bugs). What the C standard allows is relevant to C frontends like clang (which will explicitly fpext to float if it wants to use excess precision for _Float16 operations), but LLVM IR has its own independently-specified semantics. In fact, -fexcess-precision=16 relies on these LLVM IR semantics to work.

Given that this appears to have been fixed for 32-bit ARM in #80440, it seems that most of the fix will be changing softPromoteHalfType() to return true for these targets. The documentation for the function even explicitly states that returning false will lead to the miscompilations.

@beetrees
Copy link
Contributor Author

beetrees commented Jul 8, 2024

@vchuravy I also notice the Julia pass also handles bfloat. Out of interest, do you know if there any targets where this is known to be an issue? The LLVM source code (here) appears to indicate that soft promotion is always used for bfloat, and spot checking a bunch of targets in compiler explorer I haven't been able to find any targets where there are excess precision bugs.

@vchuravy
Copy link
Contributor

vchuravy commented Jul 9, 2024

Out of interest, do you know if there any targets where this is known to be an issue?

@maleadt might remember, I think our experience with Float16 made us not trust LLVM handling on platforms that don't support BFloat16 and so we just extended our support there.

@jcranmer-intel
Copy link
Contributor

GCC and Clang only recently introduced -fexcess-precision=16 I believe it was previously argued that the C standard allows for the widening of floating point operations with extended precision (the whole x87 debacle).

LLVM IR semantics for the floating-point types are that, e.g., fadd half means "do IEEE-754-precise binary16 addition", effectively as if FLT_EVAL_METHOD were 0. Deviations from that model are bugs that should be fixed; for the types like half that can be emulated in larger types without double rounding issues, that's not a difficult aspect; the x87 issue (where x87_fp80 induces double-rounding if you try to emulate double with forced-conversion) is much more difficult to fix, although still acknowledged to be a bug (see #44218).

The emulation of half via float happens during CodeGen legalization, which isn't a topic I'm terribly familiar with. Off-hand, with a little spelunking, it looks like @topperc implemented the correct lowering of half-via-float emulation in https://reviews.llvm.org/D73749 with a per-target opt-in mechanism, and this has been opted into by x86, ARM, AArch64, and RISC-V. This means that the default handling of half emulation appears to be wrong, which is worrying to me (especially as it means it's likely to be wrong for any other necessary smaller-via-larger-float emulation targets). On the other hand, it does suggest that there's a relatively easy path to fixing this.

@topperc
Copy link
Collaborator

topperc commented Jul 9, 2024

The emulation of half via float happens during CodeGen legalization, which isn't a topic I'm terribly familiar with. Off-hand, with a little spelunking, it looks like @topperc implemented the correct lowering of half-via-float emulation in https://reviews.llvm.org/D73749 with a per-target opt-in mechanism, and this has been opted into by x86, ARM, AArch64, and RISC-V.

That is indeed that patch. At the time, I couldn't afford to spend the time fixing every target so I didn't try. Thus the opt-in mechanism.

This means that the default handling of half emulation appears to be wrong, which is worrying to me (especially as it means it's likely to be wrong for any other necessary smaller-via-larger-float emulation targets)

I don't think we use the broken TypePromoteFloat handling for anything other than f16. bf16 defaults to TypeSoftPromoteHalf.

Maybe I'll try changing the default for f16 for all targets and see what happens.

@maleadt
Copy link
Contributor

maleadt commented Jul 12, 2024

Out of interest, do you know if there any targets where this is known to be an issue?

@maleadt might remember, I think our experience with Float16 made us not trust LLVM handling on platforms that don't support BFloat16 and so we just extended our support there.

Yeah, because of the uncertainty how LLVM would treat these operations we inserted the conversions to make sure the results can always be trusted (see JuliaLang/julia#51470 (comment) and the comments below).

@beetrees
Copy link
Contributor Author

beetrees commented Aug 3, 2024

I have confirmed that the experimental C-SKY and Xtensa backends appear to also have this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants