Portable packed SIMD vector types #2948

Open — wants to merge 20 commits into master

@hsivonen (Member)

This RFC extends Rust with portable packed SIMD vector types. The RFC is forked from the closed RFC 2366 by @gnzlbg.

Rendered

@Lokathor (Contributor)

I have so many, many thoughts, the first and most important of which is:

Not a single SIMD crate has a good and complete version of portable SIMD, and we don't even have full Neon support in Nightly, so this is probably far too early to RFC.

@hsivonen (Member, Author)

Not a single SIMD crate has a good

The key thing that makes the packed_simd design good for the standard library is that it conceptually matches what compiler back ends in general have and LLVM in particular has. It belongs in the standard library unless the stance on what kind of compiler intrinsics can be exposed to the crate ecosystem changes dramatically.

Once this design that conceptually matches what compiler back ends is available in the standard library, more abstractions can be built on top of it in the crate ecosystem.

and complete version of portable SIMD

Nothing ever lands if it's waiting for completeness. As acknowledged by the second paragraph of the RFC, it's OK to add more features later, but let's get what already exists and works into a form that can be used outside nightly.

and we don't even have full Neon support in Nightly, so this is probably far too early to RFC.

This RFC does not need to be blocked on the completion or even stabilization of std::arch for any ISA. It's sufficient that the bits that packed_simd itself needs are in std::arch at least on nightly for a given ISA, and that's already the case.

@Lokathor (Contributor)

So if your position is "here's a junk drawer of stuff that you maybe could use" rather than "here's a complete and ready solution", well that's not wrong I suppose.

I think that if you want to expose LLVM's intrinsic concepts for normal Rust users to build with then that's super cool, but we should do that as directly as possible if that is the goal.

Currently, the packed_simd crate is "long-term-blocked" on the lack of const generics in Rust. The ideal design (with const generics) would be a breaking change from the current design. For just one example: the inner field of the Simd type is commented as something that's supposed to be private, but it can't be, because the shuffle! macro needs it public. If there were const generics, shuffle could be made into a method and the field could be made private. Obviously we should not stabilize anything that has planned breaking changes in its future.
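For illustration, here is a minimal sketch (not packed_simd's actual API; Simd4 and its shuffle method are hypothetical names) of how const generics let shuffle be an ordinary method while keeping the inner field private:

```rust
// Hypothetical sketch: with const generics, a shuffle can be a method with
// compile-time lane indices, so the inner field stays private to the module.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Simd4([i32; 4]); // inner field is private

impl Simd4 {
    pub fn new(lanes: [i32; 4]) -> Self {
        Simd4(lanes)
    }

    /// Select lanes by compile-time indices; a real implementation would
    /// lower this to a single vector-shuffle instruction.
    pub fn shuffle<const A: usize, const B: usize, const C: usize, const D: usize>(
        self,
    ) -> Self {
        Simd4([self.0[A], self.0[B], self.0[C], self.0[D]])
    }
}

fn main() {
    let v = Simd4::new([10, 20, 30, 40]);
    // Reverse the lanes.
    let r = v.shuffle::<3, 2, 1, 0>();
    assert_eq!(r, Simd4::new([40, 30, 20, 10]));
}
```

This compiles on stable Rust today with min const generics, which is part of why the macro-based design is considered a stopgap.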

Speaking personally, when I tried to understand packed_simd well enough to use it, I couldn't get a handle on it, because all operations funnel through a single Simd<T> type, and so all methods and traits for all vector lengths of all types end up on a single page. This makes the docs effectively unnavigable. Instead I just used core::arch directly and trusted the auto-vectorizer not to screw up too badly on non-Intel platforms.

I'm not at all saying that Rust is fine without portable SIMD. That's important to have in the long term.

I am saying that packed_simd, in its current state, is not what we need.

@hsivonen (Member, Author)

So if your position is "here's a junk drawer of stuff that you maybe could use" rather than "here's a complete and ready solution", well that's not wrong I suppose.

I think that's an inappropriate characterization. Rather, my position is that packed_simd is ready for real-world use, so we should make it available for real-world use and add more stuff later instead of not making it available while figuring out what else is possible to add.

I think that if you want to expose LLVM's intrinsic concepts for normal Rust users to build with then that's super cool, but we should do that as directly as possible if that is the goal.

I want to expose the concept of portable SIMD that LLVM has without exposing LLVM-specific idiosyncrasies, making the concept work well on the Rust level. That's what packed_simd is. Also, I gather that, so far, rustc developers have resisted exposing LLVM's intrinsics directly.

The ideal design (with const generics) would be a breaking change from the current design.

I understand the desire for ideal design, but real artists ship, and actually being able to use stuff is an important feature. The application programmer should not look at what u8x16, u16x8, etc. desugar into. I'd be fine with some compiler hack to hide what they desugar into for now so that changing the desugaring once const generics arrive wouldn't be a breaking change. (No one should be paying attention to what core::arch::x86_64::__m128i desugars into, either, and that one is on stable.)

Speaking personally, when I tried to understand packed_simd enough to use it, I could not get a handle on it because all operations funnel through a single Simd<T> type

In general, it's not useful to pay attention to Simd<T>; use u8x16, u16x8, etc. instead. The documentation pages for these look understandable enough to me.

I am saying that packed_simd, in its current state, is not what we need.

Implementing portable SIMD is a lot of work that has already been done. I'm a user of portable SIMD, so I care, but I'm not going to have the bandwidth to implement portable SIMD in a different way. I think we should take the work that has been done instead of blocking it by saying "not like that".

@Ixrec (Contributor) commented Jun 25, 2020

The other big question I'm not seeing a clear answer to in the RFC is: Why is it important to get this into std in the near future? The Motivation section seems to be entirely about why higher-level crates like packed_simd need to exist, but the fact that packed_simd is usable and useful today doesn't automatically mean it'd benefit from being in std. Even this RFC is citing multiple examples of crates built atop packed_simd, seemingly showing that its vocabulary types are effective as a crate already.

There are a few entries in the FAQ section that touch on this, but they mostly left me confused. For example, this one basically says that packed_simd was "abandoned due to the lack of roadmap commitment from the core stakeholders to get into the standard library," which seems highly misleading at best since it leaves out the whole "waiting for const generics" aspect. Then there's these two which appear to answer my question directly:

Why does this need to go into the standard library instead of living in the crate ecosystem?
The functionality fundamentally relies on the compiler backend (presently LLVM) and, previously, there has been reluctance to expose the kind of compiler intrinsics that packed_simd depends on. Furthermore, the types provided by the crate are important vocabulary types, so asking them to live in the crate ecosystem is similar to asking u32 and u64 to live in the crate ecosystem.

Can't this be implemented in the crate ecosystem on top of std::arch instead of compiler intrinsics?
In theory, yes. In practice, no. It would be a massive undertaking. It makes no sense to redevelop functionality of compiler optimizers and backends that already exist and work, just in order to work around Rust's process issues.

But I don't get how promoting packed_simd to std would resolve the disagreement over intrinsics, or over whether packed_simd's vocabulary types are the optimal ones for high-level SIMD, or how it would make the implementation undertaking any less massive. If anything, these seem like additional reasons why we're not ready to put anything high-level into std yet.

@Firstyear

From the view of a consumer who wants to use SIMD in Rust, barriers exist today: you either choose nightly and packed_simd (which, from my view, was honestly a great user experience), or you use stable with std::arch. Reading std::arch's SIMD intrinsics was really daunting, and I don't think I'd be able to use them effectively.

I don't think that packed_simd was confusing to use at all; a good hour with the docs and I had working examples that were able to improve real workloads I have. Like anything, it's a slider: do you use the "lowest level" and get all the control? Do you use a "high level" implementation (i.e. the faster crate) but lose some of the control? Or worse, rely on autovectorisation and have no control? packed_simd appears to be a good middle ground on this scale: low enough to expose the operations that exist on hardware and solve real problems, but not so low that I'm required to read an Intel architecture manual.
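As a rough sketch of the kind of code being described, here is a scalar model of a portable f32x4-style type on stable Rust; F32x4 and sum_f32 are hypothetical stand-ins, and a real portable SIMD type would compile each lanewise op to one vector instruction:

```rust
// Scalar model of the portable-vector programming style under discussion.
#[derive(Clone, Copy, Debug)]
struct F32x4([f32; 4]);

impl std::ops::Add for F32x4 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = [0.0; 4];
        for i in 0..4 {
            out[i] = self.0[i] + rhs.0[i]; // lanewise add
        }
        F32x4(out)
    }
}

impl F32x4 {
    fn from_slice(s: &[f32]) -> Self {
        F32x4([s[0], s[1], s[2], s[3]])
    }
    // Horizontal sum across the four lanes.
    fn sum(self) -> f32 {
        self.0.iter().sum()
    }
}

// Sum a slice four lanes at a time: the shape of loop a portable SIMD
// type enables, with a scalar tail for the leftover elements.
fn sum_f32(data: &[f32]) -> f32 {
    let mut acc = F32x4([0.0; 4]);
    let mut chunks = data.chunks_exact(4);
    for c in chunks.by_ref() {
        acc = acc + F32x4::from_slice(c);
    }
    acc.sum() + chunks.remainder().iter().sum::<f32>()
}

fn main() {
    let data: Vec<f32> = (1..=10).map(|x| x as f32).collect();
    assert_eq!(sum_f32(&data), 55.0);
}
```

The point of the "slider" is that this level of code stays portable while still mapping closely onto hardware operations.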

Currently the packed_simd crate is "long-term-blocked" on a lack of Const Generics in Rust. The ideal design (with const generics) would be a breaking change from the current design. For just one particular example: the inner field of the Simd type is commented as something that's supposed to be private but that can't be because the shuffle! macro needs it public. If there was const generics then shuffle could be made into a method and the field could be made private. Obviously we should not stabilize anything that has planned breaking changes in its future.

Waiting for Rust to be "perfect" before making a change like this, seems like the wrong answer. This is why tools like macro backtrace, and ASAN remain locked behind nightly, despite being hugely useful tools. If the insistence is that shuffle! needs const generics, then why not move shuffle behind a feature flag for nightly, hide the Simd implementation so that it's completely internal to the library, and then allow something useful to be accessible to consumers. When const generics happens, it can be changed internally in the library, and shuffle exposed to users at that point.

From my view, I support this PR and seeing packed_simd become part of std.

@Diggsey (Contributor) commented Jun 26, 2020

From the view of a consumer, who wants to use SIMD in rust, today barriers exist - you choose nightly and packed simd (which from my view, was a great user experience honestly), or you need to use stable with std::arch - reading std::arch's SIMD intrinsics was really daunting and I don't think I'd be able to be effective in using them.

That's a good reason to get more SIMD operations stabilized for other architectures, so that the packed_simd crate can run on stable. I could be mistaken, but with the new inline assembly set to be stabilized, it might not even be necessary to stabilize every SIMD function: only the basic register types for each architecture?

There's still no good motivation presented for why the portable-SIMD API needs to be in std though. There is an extremely high bar for APIs in std which is necessary for Rust's long-term commitment to stability.

@hsivonen (Member, Author) commented Jun 26, 2020

The other big question I'm not seeing a clear answer to in the RFC is: Why is it important to get this into std in the near future?

My motivation is that Firefox has been shipping first with simd and then with packed_simd for years, and the apparent abandonment of packed_simd poses a risk to the product I'm working on and that uses packed_simd.

The Rust community reacts with shock (and, too often, hostility) every time someone new discovers how this is accomplished by working around Rust's stability story.

The Motivation section seems to be entirely about why higher-level crates like packed_simd need to exist, but the fact that packed_simd is usable and useful today doesn't automatically mean it'd benefit from being in std.

The compilation of packed_simd needs access to the kind of compiler internals that are not exposed on stable. To avoid exposing those compiler internals on stable, it belongs in std for the same reason count_ones() belongs in std.

Even this RFC is citing multiple examples of crates built atop packed_simd, seemingly showing that its vocabulary types are effective as a crate already.

As noted, if you are OK with compiling with nightly features enabled, packed_simd is effective and works.

There are a few entries in the FAQ section that touch on this, but they mostly left me confused. For example, this one basically says that packed_simd was "abandoned due to the lack of roadmap commitment from the core stakeholders to get into the standard library," which seems highly misleading at best since it leaves out the whole "waiting for const generics" aspect.

I guess both reasons apply.

But I don't get how promoting packed_simd to std would resolve the disagreement over intrinsics,

It would resolve it to the same extent as providing count_ones() resolves the issue in the scalar domain by not exposing an LLVM intrinsic but by exposing an operation internally implemented in terms of an LLVM intrinsic.
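The precedent can be seen directly on stable Rust: count_ones() is portable at the source level even though rustc lowers it to the llvm.ctpop intrinsic (and from there to a single POPCNT instruction where the target has one, or a bit-twiddling fallback where it doesn't):

```rust
// count_ones() is a portable std method; callers never see the LLVM
// intrinsic it is implemented with.
fn main() {
    let x: u32 = 0b1011_0010;
    assert_eq!(x.count_ones(), 4);
    assert_eq!(u64::MAX.count_ones(), 64);
}
```

The RFC's argument is that portable vector operations would relate to the backend's SIMD intrinsics the same way this method relates to ctpop.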

or over whether packed_simd's vocabulary types are the optimal ones for high-level SIMD,

They are on the same level of abstraction as u32 and u64. They are lower level than bignums are in the scalar domain. They are appropriate for a language that considers it appropriate to have u32 and u64.

I could be mistaken, but it seems like with the new inline assembly set to be stabilized, it might not even be necessary to get every SIMD function stabilized: only the basic register types for each architecture?

The optimizer has less visibility into inline asm than into ISA-specific intrinsics. Both are ISA-specific and, therefore, don't remove the need to be able to write portable SIMD code the way we get to write portable scalar code.

There's still no good motivation presented for why the portable-SIMD API needs to be in std though.

It needs to be in std, because it exposes compiler capabilities that can't be accessed (on non-nightly) by the crate ecosystem (unless the compiler team's stance on exposing implementation details of the compiler changes radically).

That is, it needs to be in std for the same reason count_ones() is in std as opposed to being in the crate ecosystem.

@BurntSushi (Member)

FWIW, the plan was always to put a platform independent SIMD API into std. It cannot live outside of std because its implementation relies on LLVM specific bits. This is not a simple matter of "let's stabilize more intrinsics for other architectures so that people can write things like packed_simd in crates on stable."

(I haven't had a chance to read the RFC yet, but wanted to chime in on at least that point now. I hope to comment more later.)

@hsivonen (Member, Author) commented Jul 2, 2020

I'm concerned the masking semantics are too leaky. m32x4 should not be guaranteed to be 128bits wide.

Not guaranteeing mask type width would be a problem in terms of zero-cost bitcasts to integer vectors and zero-cost transmutes to core::arch types.

Thank you @hsivonen for writing this RFC. It is no minor effort. ❤️

@gnzlbg did the hard work. I'm trying to promote the work, and I added the FAQ.

@nickwilcox

I'm concerned the masking semantics are too leaky. m32x4 should not be guaranteed to be 128bits wide.

Not guaranteeing mask type width would be a problem in terms of zero-cost bitcasts to integer vectors and zero-cost transmutes to core::arch types.

I agree that would be the case. But as proposed, on systems that support EVEX encoding, your mask types are non-zero-cost for masking, which I would consider a more important consideration.

In the current draft, how would the m1xN values be created? Would every comparison function have two variants?

@Lokathor (Contributor) commented Jul 2, 2020

avx512 is quite simply not a portable CPU feature set.

128-bit wide SIMD vector type. How these bits are interpreted depends on the intrinsic
being used. For example, let's sum 8 `f32`s values using the SSE4.1 facilities
in the `std::arch` module. This is one way to do it
([playground](https://play.rust-lang.org/?gist=165e2886b4883ec98d4e8bb4d6a32e22&version=nightly)):
Review comment (Contributor):
is not portable.

With portable packed vector types, we can do much better
([playground](https://play.rust-lang.org/?gist=7fb4e3b6c711b5feb35533b50315a5fb&version=nightly)):
Review comment (Contributor):
This link is not runnable. Maybe it should be packed_simd::f32x4 instead.

Comment on lines +22 to +24
The `std::arch` module exposes architecture-specific SIMD types like `_m128` - a
128-bit wide SIMD vector type. How these bits are interpreted depends on the intrinsic
being used. For example, let's sum 8 `f32`s values using the SSE4.1 facilities
Review comment (Contributor):
This is incorrect to the point of being misleading.

__m128 (note that it's two leading underscores) as well as the higher bit variants, is explicitly and specifically to be interpreted as f32 lanes.

Similarly, __m128d (and wider variants) are specifically f64 in each lane.

Only the __m128i type (and wider variants) holds integer bits where the exact nature of the integer type changes on an operation-by-operation basis.

@nickwilcox

I wanted to go meta on my previous objections to guaranteed mask layout. I feel like the responses got bogged down in the niche-ness of AVX-512. I feel the standard library should be extremely conservative in the guarantees it makes. Another RFC can always add more guarantees at a later date if there proves to be a real need. The eventual RFC guaranteeing the size of bool seems like a good comparison.

To add another (also niche) data point to the original argument: the draft spec for the RISC-V packed SIMD extension has mask registers that don't follow the convention assumed in this RFC. They are the full width, but only use 1 bit per element.

On the implementation side, if this library is a wrapper for LLVM intrinsics, both the fcmp and icmp intrinsics return N x i1. Adding a stricter requirement on the return type risks a non-zero-cost implementation on some future ISA.

To address responses from the RFC submitter:

  • Transmute to core::arch types: if client code wants to perform platform-specific ops, then it should already know the platform-specific size and format of the mask. The library shouldn't need to guarantee anything for this.
  • Cast to unsigned int vector: the guarantee of a particular bit pattern can be part of a From/Into implementation between masks and unsigned vector types.
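A scalar model of that second bullet might look like this (M32x4 and U32x4 are hypothetical names, not the RFC's types): the all-ones/all-zeros bit pattern is guaranteed only at the conversion boundary, leaving the mask's own layout unspecified:

```rust
// Model: the mask's in-memory layout is deliberately unspecified; the
// bit-pattern guarantee lives only in the From impl.
#[derive(Clone, Copy, Debug, PartialEq)]
struct M32x4([bool; 4]);

#[derive(Clone, Copy, Debug, PartialEq)]
struct U32x4([u32; 4]);

impl From<M32x4> for U32x4 {
    fn from(m: M32x4) -> U32x4 {
        // Each true lane becomes all-ones, each false lane all-zeros.
        U32x4(m.0.map(|b| if b { u32::MAX } else { 0 }))
    }
}

fn main() {
    let mask = M32x4([true, false, true, false]);
    let bits: U32x4 = mask.into();
    assert_eq!(bits, U32x4([u32::MAX, 0, u32::MAX, 0]));
}
```

Under this design, an EVEX-style 1-bit-per-lane mask register could back M32x4 without breaking any documented guarantee.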

@newpavlov (Contributor)

Can you please explain why portable SIMD has to be tied to LLVM intrinsics rather than implemented in terms of arch intrinsics? I understand that it will be easier to start with them, but IIUC there are no fundamental blockers for the latter. One problem which comes to mind is that by default we use a pre-compiled std, so changing target features would not affect portable SIMD code implemented via arch intrinsics as part of std, but I would say that's an argument for keeping portable SIMD in a separate crate outside of std.

Also, I think this RFC lacks an explanation of how portable SIMD will work with runtime detection. This problem is not an easy one, especially if we want to compile several versions of an algorithm with different target features coming from a separate crate. Ideally I would like to see something like this, but even that does not solve the issue with a separate crate.

@Lokathor (Contributor) commented Jul 7, 2020

The RFC doesn't lack an explanation of runtime detection, it explicitly rejects the idea: https://github.com/hsivonen/rfcs/blob/ppv/text/0000-ppv.md#shouldnt-the-portable-simd-implementation-detect-instruction-set-extensions-at-run-time-and-adapt

@newpavlov (Contributor)

The linked part talks about runtime detection at the individual-operation level, while my post is about the algorithm level. I agree that detection should not be performed for individual operations, since it can seriously degrade performance, but it's a very common expectation for SIMD-accelerated code to be able to switch between different implementations at runtime depending on hardware capabilities. So I believe SIMD proposals should keep such use cases in mind.

popcnt is influenced by #[target_feature(enable="..")], so if portable SIMD is implemented in terms of LLVM intrinsics, it should not be a problem (although note that the result is not ideal, e.g. foo1 does not get inlined, and adding -C target-feature=+popcnt does not remove the detection code). But if we are to implement portable SIMD in terms of arch intrinsics, then IIUC it will not work as expected for a default build.
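The algorithm-level dispatch pattern under discussion can be sketched on stable Rust as follows; the function bodies are scalar placeholders, and only the dispatch structure is the point:

```rust
// Detect a feature once, then call a whole specialized routine, rather
// than branching per operation.
fn sum_scalar(data: &[u64]) -> u64 {
    data.iter().sum()
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[u64]) -> u64 {
    // A real version would use 256-bit operations here; the attribute lets
    // the compiler assume AVX2 inside this function only.
    data.iter().sum()
}

fn sum(data: &[u64]) -> u64 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe: we just verified at runtime that AVX2 is available.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}

fn main() {
    assert_eq!(sum(&[1, 2, 3, 4]), 10);
}
```

The open question in this thread is how well this pattern composes with portable SIMD types whose codegen is fixed at the std-compilation boundary.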

@Lokathor (Contributor) commented Jul 7, 2020

Operations exposed by this low layer of portable SIMD are each too small to use runtime detection. This RFC is not generally proposing any operation big enough to warrant a feature check and then branch to one variant or the other.

Even sin and cos don't really benefit enough from feature variations to offset the cost of the check for a higher feature level.

@hsivonen (Member, Author) commented Aug 5, 2020

Can you please explain why portable SIMD has to be tied to LLVM intrinsics and not implemented in terms of arch intrinsics? I understand that it will be easier to start with them, but IIUC there are no fundamental blockers for the latter.

This is covered in the FAQ. While there is no theoretical obstacle, there's a huge practical one: rewriting a large chunk of code that LLVM has (and, I believe, GCC has) and Cranelift either has or is going to have makes no sense as a matter of realistic allocation of development effort, and it's hard to motivate anyone to do the work given that it could be avoided by delegating to an existing compiler back end.

More bluntly: Those who suggest re-implementing the functionality on top of core::arch haven't yet actually volunteered to do it and shown existence proof of the result. It's far easier to say that the implementation should be done that way than to actually do it. OTOH, the implementation that delegates to an existing compiler back end actually exists.

@bjorn3 (Member) commented Aug 5, 2020

Basic operations like addition and insertion are currently implemented in core::arch using the simd_* platform intrinsics. These get forwarded to the appropriate generic LLVM intrinsics by rustc_codegen_llvm. I have implemented most of them in rustc_codegen_cranelift, as there are only about 10 of them that are generic over input and output vector sizes. For the more platform-specific SIMD intrinsics, core::arch directly calls platform-specific LLVM intrinsics. These are different for every platform and vector size; core::arch uses more than 100 for x86 alone. This means that they have to be reimplemented by the backend for every combination of platform and vector size, while for the simd_* platform intrinsics only a handful have to be implemented to work on every platform with every vector size. (If performance matters, you will still have to handle every platform and vector size combination individually, but when you only want completeness, it is very easy to implement them all.)

Preferably the portable packed SIMD vector types only expose operations that can be implemented using simd_* platform intrinsics. This makes it both easier to handle all platforms when implementing it (just call simd_* and don't worry about the current platform) and makes it much easier for alternative codegen backends to support it.
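As a model of why the generic simd_* style is easier on backends, a single generic definition can cover every element type and lane count; simd_add here is an ordinary function standing in for the platform intrinsic, not the real intrinsic's signature:

```rust
// One generic lanewise definition covers all element types and lane
// counts, the way the simd_* platform intrinsics are generic, instead of
// one definition per platform-specific intrinsic.
fn simd_add<T, const N: usize>(a: [T; N], b: [T; N]) -> [T; N]
where
    T: std::ops::Add<Output = T> + Copy + Default,
{
    let mut out = [T::default(); N];
    for i in 0..N {
        out[i] = a[i] + b[i];
    }
    out
}

fn main() {
    // The same definition covers u8x4-, f32x2-, etc. shaped data.
    assert_eq!(simd_add([1u8, 2, 3, 4], [10, 20, 30, 40]), [11, 22, 33, 44]);
    assert_eq!(simd_add([1.5f32, 2.5], [0.5, 0.5]), [2.0, 3.0]);
}
```

A backend only has to lower this one shape of operation, rather than one intrinsic per platform and vector size.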

@Lokathor (Contributor) commented Aug 5, 2020

More bluntly: Those who suggest re-implementing the functionality on top of core::arch haven't yet actually volunteered to do it and shown existence proof of the result.

Actually, I have. I just stopped after f32x4 because my one user didn't need anything else and I got kinda bored with it.

@Firstyear

More bluntly: Those who suggest re-implementing the functionality on top of core::arch haven't yet actually volunteered to do it and shown existence proof of the result.

Actually, I have. I just stopped after f32x4 because my one user didn't need anything else and I got kinda bored with it.

This isn't really a constructive piece of input. @hsivonen's point still stands that an alternative doesn't exist, especially if you have become bored with the continued development of your version. What is currently offered by packed_simd not only exists today, but is in (probably production) use today, as mentioned, which goes a long way to proving its value even if it's not "perfect".

@Lokathor (Contributor) commented Aug 6, 2020

Rust is not generally in the business of stabilizing things into the standard library before they're ready. This is what keeps our standard library quite high quality. I stand by this policy.

The biggest limit to Stable portable SIMD via explicit intrinsics, in terms of serving x86/x64 and ARM/Aarch64 (the majority of processors in use), is the fact that the ARM intrinsics aren't on Stable, and they aren't even complete on Nightly. They're like 11% complete on Nightly. It's a bit of a drag, but the answer here is to submit PRs to stdarch and add more intrinsics so that we can move ARM intrinsics towards being Stable.

Also, packed_simd can become available on Stable another way: instead of evading the Nightly feature usage by just throwing it into the standard library, consider petitioning the lang team to stabilize all of the individual Nightly features that it needs for it to build on Stable without being in the standard library. There's only like 3 of them, and then the whole Rust world would benefit from the additional language abilities.

@felix91gr

I think what @Lokathor says makes a lot of sense:

  • When you stabilize something into the stdlib, you're more or less forced to maintain what you've delivered. Breaking changes are... not a thing there? Maybe when getting into an Edition, but otherwise, no, as far as I'm aware. And you don't want to chain this feature to a lesser version of it just because it was not ready and fully thought through and tested before it got published.

  • The shortest path to using this on Stable is to stabilize the parts that it needs, and not to make it a part of the stdlib.

And to these points, I'd like to add mine:

  • Anything that's good enough, important enough, and mainstream enough will eventually become part of the stdlib. That's how things tend to work, and I think that's great: it means that an amazing library used by everyone can eventually become a standard and be included in the language itself (well, the stdlib anyway). This particular feature doesn't need to be a part of the stdlib right away; it can become part of it later. In the future it will probably be completed, polished, and brought to such a good level that it becomes obvious that it needs to be part of the stdlib. In other words, I think it will be part of the stdlib, but in a farther future than is proposed here.

@KodrAus added the A-simd (SIMD related proposals & ideas) and Libs-Tracked (Libs issues that are tracked on the team's project board) labels on Aug 19, 2020
@KodrAus (Contributor) commented Aug 19, 2020

I'm trying to understand what the real blockers to moving forward with this RFC are. There's an implicit motivation of getting packed_simd's API to build on stable Rust, but the RFC itself is focused on offering a portable SIMD API in std.


How relevant is the question of whether to build on LLVM intrinsics vs std::arch for offering a portable SIMD API in std?

From a public API perspective, it seems like that only becomes significant if we're suggesting building on std::arch as a way to get a portable API that compiles on stable Rust and is "only" blocked on a lot of well-defined work? If that's the case, then it doesn't seem like a real blocker for this RFC, since the plan has always been to offer a portable API in std.


The question then is why not this proposed API?

There seems to be a general impression in the comments here that the API proposed is not ideal. What would a more ideal one look like? From what I can see, the RFC isn't proposing the vector types as aliases of some generic Simd type, so there are no surprises there. There's the potential of improving shuffle with const generics, but that doesn't seem like a blocker for std, which already has precedent for hiding unstable details behind macros.

@Lokathor (Contributor)

The current packed_simd does use a generic SIMD type as the type that backs all other types. So, if this RFC is proposing that we just suddenly accept packed_simd as is, that's what we'd have.


I'd like to note that all of this should hopefully go into core, not just std.

@hsivonen (Member, Author)

I'm trying to understand what the real blockers to moving forward with this RFC are.

AFAICT, the main blocker is that this lacks a champion within the libs team. (I don't have the bandwidth to become that person at this time.)

How relevant is the question of whether to build on LLVM intrinsics vs std::arch for offering a portable SIMD API in std?

As noted earlier, I believe that in terms of the amount of work and who is willing to do it, the only realistic path forward is to use the LLVM (or, in the future, other back end) facilities as packed_simd currently does. Previous decisions indicate that exposing those LLVM facilities directly is a no-go, as it would tie Rust to LLVM. packed_simd is just enough abstraction to allow for other back ends in the future, which is why I think packed_simd itself, in the form of core::simd (yes, this would belong in core rather than std), is the right abstraction.

While building on std::arch is theoretically possible, I believe no one is going to volunteer to reimplement things that LLVM already provides for the whole breadth of packed_simd. (A demo of one or two features isn't enough. It's already known that it's theoretically possible. The issue is doing all the work.)

@Lokathor (Contributor)

I do not believe that it would tie Rust particularly to LLVM; it would just tie the one crate using the LLVM intrinsics to LLVM until an alternate backend also had the same intrinsics. That is arguably less disruptive than what happens with packed_simd in core, because with packed_simd in core, everyone would need to wait on alternative backends supporting all the packed_simd intrinsics, instead of just whoever uses one particular crate off of crates.io.

Incidentally, wide got a new release over the weekend, and it now supports all the 128-bit types and many of their desired ops. Runs on Stable and everything. Sure, it's not immediate 100% coverage of every possible SIMD thing, but there's only so much time in the day, so you gotta start somewhere. If there were a more realistic path forward, because of being unblocked by the general SIMD blockers, I'm pretty sure it would be easy enough to get additional people interested as well.

@bjorn3 (Member) commented Aug 19, 2020

I do not believe that it would tie rust particularly to LLVM, it would just tie that one crate using the LLVM intrinsics to LLVM until an alternate backend also had the same intrinsics.

It is very likely that at least one crate in the dependency graph needs a certain intrinsic, which is actually quite disruptive. For example, the rand crate indirectly depended on ppv-lite86, which always requires certain x86-specific intrinsics on x86 when the simd feature is enabled. Older rand versions always enabled it, but newer versions don't anymore due to some confusion about whether simd or no-simd is the default. I recently implemented enough LLVM intrinsics in cg_clif to make ppv-lite86 work both with and without simd enabled.

Which is arguably less disruptive than what happens with packed_simd in core, because with packed_simd in core now everyone would need to wait on alternative backends supporting all the packed_simd intrinsics, instead of just whoever is uses one particular crate off of crates.io.

All packed_simd methods are likely to be #[inline]. This causes codegen to be deferred to the calling crate, which means that you only need to support an intrinsic's implementation when it is actually called. This is also how I could compile core::arch with cg_clif without having to implement all intrinsics. packed_simd also doesn't use any platform-specific LLVM intrinsics, so the number of intrinsics necessary to support any platform is much smaller. As an additional bonus, the non-platform-specific LLVM intrinsics have consistent names with respect to vector and lane sizes.

@KodrAus (Contributor) commented Aug 23, 2020

the current packed_simd does use a generic SIMD type as the type that backs all other types. So, if this RFC is proposing that we just suddenly accept packed_simd as is, that's what we'd have.

Ah gotcha 👍 I was just going off the APIs proposed in the RFC itself, which doesn’t mention the Simd type directly.

AFAICT, the main blocker is that this lacks a champion within the libs team. (I don't have the bandwidth to become that person at this time.)

I’d welcome a PR that introduced the API proposed in the RFC as unstable with the vector types expressed as opaque structs instead of type aliases to Simd, so we could spin up a project group around it to help push things forward. I can understand you not having the time to do this yourself. Adding APIs to the standard library is a lot of work, even if it’s mostly a copy+paste from an existing crate. But if anybody did want to give this a shot and wasn’t sure where to start I’d be happy to lend a hand! Does that sound like a reasonable path forward? Would you be open to reviewing PRs @hsivonen and @Lokathor?

@Lokathor (Contributor)

I could review such PRs.

@KodrAus (Contributor) commented Aug 28, 2020

I’ve opened #2977 to establish a project group to work on std::simd. Check it out, see what you think of the scope, and please reach out if you’d like to be involved 😃

Thanks everyone for all your effort so far!

@KodrAus (Contributor) commented Oct 14, 2020

Update on the project group

The Portable SIMD group (@rust-lang/project-portable-simd) has been established and very busy! We’re mostly working in the new https://github.com/rust-lang/stdsimd repository, scaffolding the core::simd API.

We’ve also published a new patched version of packed_simd on crates.io as packed_simd_2 that you can patch in to your existing projects using the package field in the meantime:

[dependencies.packed_simd]
package = "packed_simd_2"
