Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix build for targets that don't support sse2, sse4.2, or pclmulqdq features #22

Merged
merged 2 commits into from
Nov 23, 2021

Conversation

kevinaboos
Copy link
Contributor

@kevinaboos kevinaboos commented Nov 19, 2021

The current build.rs script emits the crc32fast_stdarchx86 cfg if the version of rustc is greater than 1.27, regardless of the underlying compilation target.

However, when building this crate for targets that don't support the sse2, sse4.2, and pclmulqdq features, the compiler will still emit SSE instructions. This causes one of two very strange problems:

  • At build time, a cryptic LLVM error is thrown: LLVM ERROR: Do not know how to split this operator's operand!, which (I think) occurs because LLVM is not expecting to parse/emit SSE instructions when not configured to do so.
  • At runtime, fatal CPU exceptions occur, e.g., a General Protection Fault or Invalid Opcode on x86, which will crash/hang the machine. This is particularly problematic for embedded systems and OS kernel environments.

This PR adds the correct feature gates to ensure that invalid instructions cannot be emitted, i.e., that SSE instructions are only used if the target machine supports them.

Related: the memchr crate already experienced this problem. Some related issues:

@srijs
Copy link
Owner

srijs commented Nov 19, 2021

Hi, thanks for the report and fix!

FWIW, the actual instruction calls are guarded via cpuid feature checks, so there shouldn't actually be any runtime errors. It is bad however that LLVM would fail at compile time, that's annoying and not expected.

I'm a bit reluctant to directly merge this in, because that seems like it would result in the fast paths only enabled when the build target is explicitly specified to support the instruction sets for the fast path, which may not the default case for all of them (e.g. pclmulqdq or sse4.2 I believe). The original idea here was that LLVM should be able to compile the code correctly, and then it's really only ran when we can determine at runtime that the CPU we're running on supports the instruction sets.

Can you share a little more about the environment you're seeing the error in, so I could try to reproduce and potentially toy around with a few options? For example from the memchr fix, it looks like we may only need to guard based on sse2:

// On targets which don't feature SSE2, this is disabled, as LLVM wouln't know
// how to work with SSE2 operands. Enabling SSE4.2 and AVX on SSE2-only targets
// is not a problem. In that case, the fastest option will be chosen at
// runtime.

@kevinaboos
Copy link
Contributor Author

kevinaboos commented Nov 19, 2021

FWIW, the actual instruction calls are guarded via cpuid feature checks, so there shouldn't actually be any runtime errors. It is bad however that LLVM would fail at compile time, that's annoying and not expected.

Agreed, I'm not sure how I got those runtime errors originally (probably by changing the target but not the OS bootstrap behavior), but the LLVM build-time failure is reproducible.

I'm a bit reluctant to directly merge this in, because that seems like it would result in the fast paths only enabled when the build target is explicitly specified to support the instruction sets for the fast path, which may not the default case for all of them (e.g. pclmulqdq or sse4.2 I believe). The original idea here was that LLVM should be able to compile the code correctly, and then it's really only ran when we can determine at runtime that the CPU we're running on supports the instruction sets.

Understood, and I wouldn't want to jeopardize existing usage by other targets. Another option would be to modify the build.rs script to consider the available target features.

... For example from the memchr fix, it looks like we may only need to guard based on sse2:

Yes, that may be correct, but I wanted to be absolutely certain that all required features were supported by the actual target before using them. I can experiment with various combinations of those 3 feature flags to see whether sse2 encompasses the others sufficiently.

Can you share a little more about the environment you're seeing the error in, so I could try to reproduce and potentially toy around with a few options?

Certainly! I'm building this for Theseus OS, which uses a fairly normal x86_64 compiler target that disables SIMD in favor of soft floating point. This is a common approach in kernels to avoid needing to save all SIMD registers on every context switch.

The target spec file can be found here, note which features are disabled.

The log I get when building this crate for that target:

$ cargo build --no-default-features -Z unstable-options -Z build-std=core,alloc -Z build-std-features=compiler-builtins-mem  --target x86_64-theseus.json
   Compiling crc32fast v1.2.1 (/home/kevin/crc32fast)
LLVM ERROR: Do not know how to split the result of this operator!

error: could not compile `crc32fast`

If that's not enough details to reproduce it, I could create a separate repo that demonstrates the problem too. Just let me know, thanks!

@kevinaboos
Copy link
Contributor Author

kevinaboos commented Nov 19, 2021

Digging deeper into the code, the culprit is that the calculate function is unconditionally instructed to use & emit SIMD instructions by means of the target_feature(enable = ...) block.
This is what likely causes the LLVM error.

There are a few ways to fix this.

  • Don't invoke the calculate() function in State::update() unless either std is active or all of the required sse2, pclmulqdq, and sse4.1 features are enabled.
  • Gate the calculate() function itself with the same cfg statement.
  • Any other form of cfg statements/attributes applied where relevant, e.g., the specialized module itself.

@srijs
Copy link
Owner

srijs commented Nov 21, 2021

Hey, so I'm pretty sure that the runtime-guarding itself is okay; we will only ever return the specialized struct from the constructor if the runtime architecture meets all of our requirements, and the calculate function can only ever be invoked by the impl if the struct is returned. This basically matches the first option you outlined, it's just a little more indirect.

Having played around with a few different combinations of target_feature guards and target specs (based on the file you provided), I think I'd feel comfortable merging this PR if we could restrict the target_feature checks to sse2 only. This seems like to would:

  1. fix your problem
  2. align well with the solution that's landed in memchr
  3. be less likely to regress performance on other target specs

Would that work for you?

@kevinaboos
Copy link
Contributor Author

Hey, so I'm pretty sure that the runtime-guarding itself is okay; we will only ever return the specialized struct from the constructor if the runtime architecture meets all of our requirements, and the calculate function can only ever be invoked by the impl if the struct is returned. This basically matches the first option you outlined, it's just a little more indirect.

I agree, the runtime checking appears to be correct.

Having played around with a few different combinations of target_feature guards and target specs (based on the file you provided), I think I'd feel comfortable merging this PR if we could restrict the target_feature checks to sse2 only. This seems like to would:

  1. fix your problem
  2. align well with the solution that's landed in memchr
  3. be less likely to regress performance on other target specs

Would that work for you?

I'll test this out and let you know if it still builds correctly. Thanks for proposing a new/better fix!

@kevinaboos
Copy link
Contributor Author

Good news, I can confirm that change works for all of my build targets, including soft float, sse, avx, and more.

Thanks for digging into this with me and helping me arrive at the correct fix! I've pushed a commit that gates the specialized module behind only the target_feature = "sse2" cfg.

@kevinaboos
Copy link
Contributor Author

Out of curiosity, were you able to reproduce that LLVM build error?

@srijs
Copy link
Owner

srijs commented Nov 23, 2021

Awesome, thanks so much! I'll merge this in now, and then cut a release later today.

Out of curiosity, were you able to reproduce that LLVM build error?

Yep, was able to reproduce using your target file, thanks again for supplying that!

@srijs srijs merged commit ea0d978 into srijs:master Nov 23, 2021
@kevinaboos
Copy link
Contributor Author

Great! Glad to hear it, thanks again.

@srijs
Copy link
Owner

srijs commented Nov 23, 2021

Published as version 1.2.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants