`vector_algorithms.cpp`: Remove the distinction between SSE2 and SSE4.2 #4536

StephanTLavavej · 2024-03-27T23:13:05Z

During code review, I've prevented two bugs where usage of post-SSE2 instructions was being incorrectly guarded with _Use_sse2() - see #4384 (comment) and #4495 (comment). This is extremely hazardous, and the correctness of the STL shouldn't depend on whether I've had 270 mg of caffeine every single time I've reviewed a vectorization PR.

At this time, we still need to support the tiny fraction (~0.7%, I've heard) of processors that have SSE2 but not SSE4.2. However, we don't need to extend novel optimizations to them - they were perfectly happy running classic STL algorithms up to 2019.

We should prevent this class of mistakes by removing the distinction between SSE2 and SSE4.2 in vector_algorithms.cpp. That is, we should test for the presence of SSE4.2 only, before attempting to use anything up to and including SSE4.2. (This will supersede the error-prone _Traits::_Sse_available().)

We'll still need a distinction between "SSE4.2 is available" and "AVX2 is available", but I consider this to be much less dangerous, because AVX/AVX2 intrinsics and types are very distinctive.
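To make the proposal concrete, here is a minimal sketch of the two-tier check described above. `CpuFeatures`, `Path`, and `select_path` are hypothetical names invented for this illustration; they are not the identifiers used in `vector_algorithms.cpp`.

```cpp
#include <cassert>

// Hypothetical stand-in for the CPU feature flags queried at runtime.
struct CpuFeatures {
    bool sse2;
    bool sse42;
    bool avx2;
};

enum class Path { Scalar, Sse42, Avx2 };

// One check per tier: anything up to and including SSE4.2 is guarded by the
// SSE4.2 test alone, so there is no separate SSE2 guard to pick by mistake.
constexpr Path select_path(const CpuFeatures& cpu) noexcept {
    if (cpu.avx2) {
        return Path::Avx2;
    }
    if (cpu.sse42) {
        return Path::Sse42;
    }
    return Path::Scalar; // includes SSE2-only processors
}

// Usage: an SSE2-only machine takes the classic scalar algorithms.
inline void demo_select_path() {
    assert(select_path({true, false, false}) == Path::Scalar);
    assert(select_path({true, true, false}) == Path::Sse42);
    assert(select_path({true, true, true}) == Path::Avx2);
}
```

Note that the `sse2` flag is never consulted: under this scheme it no longer influences which code runs.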


AlexGuteniev · 2024-03-28T04:44:39Z

In this case some algorithms that avoid SSE4.2 may start using it. This allows simplifying/optimizing bitset vectorization, and maybe some of the reversing algorithms could be a direct pshufb.
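For readers unfamiliar with `pshufb` (an SSSE3 byte shuffle, so it would be covered by an SSE4.2 check), the following is a scalar model of what the instruction does on one 128-bit register, and of how a single shuffle reverses 16 one-byte elements. This is only an emulation for illustration, not the intrinsic itself and not code from the STL; `Bytes16`, `pshufb_model`, and `reverse_bytes` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of pshufb on one 128-bit register: each control byte either
// zeroes the output byte (high bit set) or selects a source byte by its low
// 4 bits.
inline Bytes16 pshufb_model(const Bytes16& src, const Bytes16& control) noexcept {
    Bytes16 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = (control[i] & 0x80) ? std::uint8_t{0} : src[control[i] & 0x0F];
    }
    return out;
}

// Reversing 16 one-byte elements is one shuffle with control bytes 15, 14, ..., 0.
inline Bytes16 reverse_bytes(const Bytes16& src) noexcept {
    Bytes16 control{};
    for (std::size_t i = 0; i < control.size(); ++i) {
        control[i] = static_cast<std::uint8_t>(15 - i);
    }
    return pshufb_model(src, control);
}

// Usage: bytes 0..15 come back as 15..0.
inline void demo_reverse_bytes() {
    Bytes16 in{};
    for (std::size_t i = 0; i < in.size(); ++i) {
        in[i] = static_cast<std::uint8_t>(i);
    }
    const Bytes16 out = reverse_bytes(in);
    assert(out[0] == 15 && out[15] == 0);
}
```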

What is the fraction of SSE4.2 but not AVX?

StephanTLavavej · 2024-03-28T06:11:43Z

> In this case some algorithms that avoid SSE4.2 may start using it.

Ah, even better! 😻

> What is the fraction of SSE4.2 but not AVX?

In theory I could get accurate numbers, but I'd have to ask around through several people. The 2024-02 Steam Hardware Survey suggests a lower bound - it says that 0.46% of CPUs don't support SSE4.2 (which indicates that it's a vaguely reasonable lower bound for the numbers across all CPUs that we target, not just performance-minded gamers, given the similarity to the number I heard), while 6.75% don't support AVX2. I'd guess the actual number for us is in the range of 10-20%, so that AVX2 optimizations benefit the vast majority of users, but that we won't be able to assume its existence for a decade.

AlexGuteniev · 2024-03-28T12:55:47Z

Oh, I see there are still a lot of machines with AVX and not AVX2...
I wanted to know if the SSE code path is still useful at all, but looks like it is.
(It is possible to rewrite some of the AVX2 algorithms to use AVX, but let's not do that for various reasons, including the reason behind this issue.)

jovibor · 2024-03-28T22:45:31Z

> Oh, I see there are still a lot of machines with AVX and not AVX2...

Exactly.
Ideally vector_algorithms should provide something like (IMO):

    if (Has_AVX2()) {
        ...
    } else if (Has_AVX()) {
        ...
    } else if (Has_SSE()) { // All SSEs (2, 3, 4.*).
        ...
    } else { // Scalar.
        ...
    }

AVX512 is out of this equation for the next decade I believe.

StephanTLavavej · 2024-03-28T23:55:54Z

Adding codepaths to distinguish AVX1 from AVX2 raises the same sort of hazards that I'm concerned about with SSE2 versus SSE4.2. Although the AVX1/AVX2 delta is maybe 4% of processors, I think the risk isn't worth it.

jovibor · 2024-03-29T00:01:49Z

> Although the AVX1/AVX2 delta is maybe 4% of processors, I think the risk isn't worth it.

Then, am I right that your suggestion is:

    // vector_algorithms.cpp

    if (Has_AVX()) { // Exactly AVX1 and AVX2.
        ...
    } else if (Has_SSE()) { // All SSEs (2, 3, 4.*).
        ...
    } else { // Scalar.
        ...
    }

StephanTLavavej · 2024-03-29T09:03:16Z

Yes, and we already have these functions (they are properly named _Use_avx2() and _Use_sse42()), so we just need to fuse the _Use_sse2() codepaths:

STL/stl/src/vector_algorithms.cpp

Lines 25 to 39 in be81252

    
    bool _Use_avx2() noexcept {
        return __isa_enabled & (1 << __ISA_AVAILABLE_AVX2);
    }

    bool _Use_sse42() noexcept {
        return __isa_enabled & (1 << __ISA_AVAILABLE_SSE42);
    }

    bool _Use_sse2() noexcept {
    #ifdef _M_IX86
        return __isa_enabled & (1 << __ISA_AVAILABLE_SSE2);
    #else
        return true;
    #endif
    }
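The bit test in these helpers can be tried in isolation with the sketch below. It assumes the mask is passed in as a parameter instead of being read from `__isa_enabled`, and the bit positions are placeholders chosen for illustration, not necessarily the values the toolchain defines for `__ISA_AVAILABLE_*`. `has_isa`, `use_sse42`, and `use_avx2` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder bit positions, for illustration only. The real
// __ISA_AVAILABLE_* values are defined by the toolchain's headers.
constexpr unsigned isa_sse2  = 1;
constexpr unsigned isa_sse42 = 2;
constexpr unsigned isa_avx2  = 5;

// Same shape as the quoted helpers, but the mask is a parameter
// rather than the global __isa_enabled.
constexpr bool has_isa(std::uint32_t isa_enabled, unsigned level) noexcept {
    return (isa_enabled & (std::uint32_t{1} << level)) != 0;
}

constexpr bool use_sse42(std::uint32_t isa_enabled) noexcept {
    return has_isa(isa_enabled, isa_sse42);
}

constexpr bool use_avx2(std::uint32_t isa_enabled) noexcept {
    return has_isa(isa_enabled, isa_avx2);
}

// Usage: with only the two remaining helpers, an SSE2-only mask
// enables no vectorized path at all.
inline void demo_isa_mask() {
    const std::uint32_t sse2_only   = std::uint32_t{1} << isa_sse2;
    const std::uint32_t up_to_sse42 = sse2_only | (std::uint32_t{1} << isa_sse42);
    assert(!use_sse42(sse2_only));
    assert(use_sse42(up_to_sse42));
    assert(!use_avx2(up_to_sse42));
}
```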

StephanTLavavej added the enhancement Something can be improved label Mar 27, 2024

AlexGuteniev mentioned this issue Apr 1, 2024

Manually vectorize for at least SSE4.2 #4550

Merged

StephanTLavavej closed this as completed in #4550 Apr 9, 2024

StephanTLavavej added the fixed Something works now, yay! label Apr 9, 2024

StephanTLavavej mentioned this issue Jun 20, 2024

Build the x86 STL with /arch:SSE2 instead of /arch:IA32 #4741

Merged
