[mono] Adding support for Vector128::ExtractMostSignificantBits intrinsics on amd64 #89997
LGTM!
/azp run runtime-extra-platforms
Azure Pipelines successfully started running 1 pipeline(s).
LGTM. I cannot immediately see why LLVM would fail on the i16 case.
/azp run runtime-extra-platforms
Azure Pipelines successfully started running 1 pipeline(s).
@jandupej looks like it was caused by a missing SSSE3 check, which is weird considering SSSE3 is more than 15 years old.
The failing CI lines are tracked on main and are unrelated to this PR.
If memory serves, we require the CPU to support at least SSE4.1, so this should not be an issue. Still, it is good practice to check for ISA extension support before using it.
Adding intrinsics support for Vector128::ExtractMostSignificantBits method on amd64 (miniJIT and llvm AOT).

Implementation
Extracting the most significant bits (MSBs) from Vector128 on amd64 is based on the use of _mm_movemask_epi8/ps/pd (SSE/SSE2).
sse_movmsk: Create mask from the most significant bit of each 8/32/64-bit element (_mm_movemask_epi8/ps/pd).
ssse3_shuffle: Shuffle 8-bit elements of vector according to shuffle control mask (_mm_shuffle_epi8).

Short/UShort element types
Since the _mm_movemask_epi8/ps/pd doesn't support Short/UShort element types, we first perform _mm_shuffle_epi8 (SSSE3) to shuffle odd bytes (most significant bytes of each Short/UShort) to the lower half of vector while zeroing out the upper half. Next, we use _mm_movemask_epi8 to extract MSBs from shuffled vector.

Other primitive element types
Based on the size of element type, the corresponding version of _mm_movemask_epi8/ps/pd is used to extract MSBs.

Future work
Emitting intrinsics for Vector128 of floating-point types is currently not supported in Mono. This PR adds support for emitting them on the amd64 platform, but additional work must be done for arm64, and possibly for WASM, before enabling it for Mono.
Contributes to #76025.
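
For reviewers who want to check the Short/UShort path by hand, here is a minimal sketch of the shuffle-then-movemask idea as a portable scalar model in C. It is not the code emitted by the miniJIT or LLVM backends: the function names (model_shuffle_epi8, model_movemask_epi8, model_extract_msb_i16, model_msb_self_check) are hypothetical, and the shuffle control mask is an assumption derived from the description above (odd bytes moved to the lower half, upper half zeroed), not copied from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm_shuffle_epi8: a control byte with bit 7 set produces 0,
   otherwise its low 4 bits select a byte of the source vector. */
static void model_shuffle_epi8(const uint8_t src[16], const uint8_t ctrl[16],
                               uint8_t dst[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
}

/* Scalar model of _mm_movemask_epi8: bit i of the result is the most
   significant bit of byte i. */
static uint32_t model_movemask_epi8(const uint8_t v[16])
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (uint32_t)(v[i] >> 7) << i;
    return mask;
}

/* Model of ExtractMostSignificantBits for eight 16-bit lanes. */
static uint32_t model_extract_msb_i16(const uint16_t lanes[8])
{
    /* Assumed control mask: odd bytes (high byte of each lane) go to bytes
       0..7, and 0x80 zeroes bytes 8..15. */
    static const uint8_t ctrl[16] = {
        1, 3, 5, 7, 9, 11, 13, 15,
        0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
    };
    uint8_t bytes[16], shuffled[16];

    /* Lay the lanes out in little-endian byte order, as on amd64. */
    for (int i = 0; i < 8; i++) {
        bytes[2 * i]     = (uint8_t)(lanes[i] & 0xFF);
        bytes[2 * i + 1] = (uint8_t)(lanes[i] >> 8);
    }

    model_shuffle_epi8(bytes, ctrl, shuffled);
    return model_movemask_epi8(shuffled);
}

/* Compares the model against a direct per-lane computation of the MSBs. */
static void model_msb_self_check(void)
{
    const uint16_t lanes[8] = {
        0x8000, 0x0000, 0xFFFF, 0x7FFF, 0x0080, 0x8001, 0x0000, 0x8000
    };
    uint32_t expected = 0;
    for (int i = 0; i < 8; i++)
        expected |= (uint32_t)(lanes[i] >> 15) << i;
    assert(model_extract_msb_i16(lanes) == expected);
}
```

Calling model_msb_self_check() exercises lanes whose low byte has its top bit set (0x0080) as well as lanes whose high byte does; only the latter contribute to the mask, which is why the shuffle is needed before the byte-wise movemask.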