Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vector_algorithms.cpp: Remove the distinction between SSE2 and SSE4.2 #4536

Closed
StephanTLavavej opened this issue Mar 27, 2024 · 7 comments · Fixed by #4550
Closed

vector_algorithms.cpp: Remove the distinction between SSE2 and SSE4.2 #4536

StephanTLavavej opened this issue Mar 27, 2024 · 7 comments · Fixed by #4550
Labels
enhancement Something can be improved fixed Something works now, yay!

Comments

@StephanTLavavej
Copy link
Member

During code review, I've prevented two bugs where usage of post-SSE2 instructions was being incorrectly guarded with _Use_sse2() - see #4384 (comment) and #4495 (comment). This is extremely hazardous, and the correctness of the STL shouldn't depend on whether I've had 270 mg of caffeine every single time I've reviewed a vectorization PR.

At this time, we still need to support the tiny fraction (~0.7%, I've heard) of processors that have SSE2 but not SSE4.2. However, we don't need to extend novel optimizations to them - they were perfectly happy running classic STL algorithms up to 2019.

We should prevent this class of mistakes by removing the distinction between SSE2 and SSE4.2 in vector_algorithms.cpp. That is, we should test for the presence of SSE4.2 only, before attempting to use anything up to and including SSE4.2. (This will supersede the error-prone _Traits::_Sse_available().)

We'll still need a distinction between "SSE4.2 is available" and "AVX2 is available", but I consider this to be much less dangerous, because AVX/AVX2 intrinsics and types are very distinctive.

@StephanTLavavej StephanTLavavej added the enhancement Something can be improved label Mar 27, 2024
@AlexGuteniev
Copy link
Contributor

In this case some algorithms that avoid SSE4.2 may start using it. This allows simplifying/optimizing bitsett vectorization, also maybe some of reversing algos coud be direct pshufb

What is the fraction of SSE4.2 but not AVX?

@StephanTLavavej
Copy link
Member Author

In this case some algorithms that avoid SSE4.2 may start using it.

Ah, even better! 😻

What is the fraction of SSE4.2 but not AVX?

In theory I could get accurate numbers, but I'd have to ask around through several people. The 2024-02 Steam Hardware Survey suggests a lower bound - it says that 0.46% of CPUs don't support SSE4.2 (which indicates that it's a vaguely reasonable lower bound for the numbers across all CPUs that we target, not just performance-minded gamers, given the similarity to the number I heard), while 6.75% don't support AVX2. I'd guess the actual number for us is in the range of 10-20%, so that AVX2 optimizations benefit the vast majority of users, but that we won't be able to assume its existence for a decade.

@AlexGuteniev
Copy link
Contributor

Oh, I see there are still a lot of machines with AVX and not AVX2...
I wanted to know if the SSE code path is still useful at all, but looks like it is.
(It is possible to rewrite some of AVX2 algorithms to use AVX, but let's not do that for various reasons, including the reason of this issue)

@jovibor
Copy link
Contributor

jovibor commented Mar 28, 2024

Oh, I see there are still a lot of machines with AVX and not AVX2...

Exactly.
Ideally vector_algorithms should provide something like (IMO):

if (Has_AVX2()) {
...
} else if (Has_AVX()) {
...
} else if (Has_SSE()) { //All SSEs (2, 3, 4.*).
...
} else { //Scalar.
...
}

AVX512 is out of this equation for the next decade I believe.

image

@StephanTLavavej
Copy link
Member Author

Adding codepaths to distinguish AVX1 from AVX2 raises the same sort of hazards that I'm concerned about with SSE2 versus SSE4.2. Although the AVX1/AVX2 delta is maybe 4% of processors, I think the risk isn't worth it.

@jovibor
Copy link
Contributor

jovibor commented Mar 29, 2024

Although the AVX1/AVX2 delta is maybe 4% of processors, I think the risk isn't worth it.

Then, am I right that your suggestion is:

//vector_algorithms.cpp

if (Has_AVX()) { //Exactly AVX1 and AVX2.
...
} else if (Has_SSE()) { //All SSEs (2, 3, 4.*).
...
} else { //Scalar.
...
}

@StephanTLavavej
Copy link
Member Author

Yes, and we already have these functions (they are properly named _Use_avx2() and _Use_sse42()), so we just need to fuse the _Use_sse2() codepaths:

bool _Use_avx2() noexcept {
return __isa_enabled & (1 << __ISA_AVAILABLE_AVX2);
}
bool _Use_sse42() noexcept {
return __isa_enabled & (1 << __ISA_AVAILABLE_SSE42);
}
bool _Use_sse2() noexcept {
#ifdef _M_IX86
return __isa_enabled & (1 << __ISA_AVAILABLE_SSE2);
#else
return true;
#endif
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Something can be improved fixed Something works now, yay!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants