Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add gather/scatter methods for simd #27

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yaozhongxiao
Copy link

Summary: gather and scatter methods of simd have not supported
those need to be implemented

Test Plan: make test

Summary: gather and scatter methods of simd have not supported
those need to be implemented

Test Plan: make test
Reviewers: chengbin.cb, liangbin.mj, yifeng.dongyifeng, longfei.alf, chuanqi.xcq
Issue: https://aone.alibaba-inc.com/req/32929292
CR: https://code.aone.alibaba-inc.com/cpp_libs/std-simd/codereview/4783624
@yaozhongxiao
Copy link
Author

@mattkretz , I try to support the gather and scatter features. I gave my draft to have a discussion before officially commits, Please do not hesitate to correct and comment, thanks.

Copy link
Member

@mattkretz mattkretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you work here! For the gather & scatter interfaces to be complete we will also need masked gather & scatter overloads. This would make a lot of sense as member functions on where_expression. Which implies it should be member functions in simd as well. Hmm. Matches copy_to and copy_from so yeah... member functions. But we can't do a ctor without a tag type using a reserved name. Or do the static __gather function instead of a ctor.

The story just starts here. For gather & scatter to be done nicely we'll want

void f(std::vector<float>& data, simd<int> i) {
  data[i] = data[i] * 1.5f;
  // or in short
  data[i] *= 1.5f;
}

void g(std::vector<std::array<float, 3>>& data, simd<int> i) {
  data[i][0] = data[i][0] * 1.5f;
  // or in short
  data[i][0] *= 1.5f;
}

to work. There's so much more like vector<struct> and all possible nestings... We can do struct gathers via subscripting with member pointers. (e.g. data[i][&S::a]) And then, of course, once C++ gets real reflection...
But, let's do the low-level interface as a first important step.

Next step: Let's write a paper to the C++ committee to make the low-level gather & scatter interface part of the standard.

experimental/bits/simd_builtin.h Show resolved Hide resolved
experimental/bits/simd_builtin.h Show resolved Hide resolved
experimental/bits/simd_builtin.h Show resolved Hide resolved
// gather constructor
template <typename _Up, typename _Flags>
_GLIBCXX_SIMD_ALWAYS_INLINE
simd(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx, _Flags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This is a non-standard (as not specified in an ISO standard) interface. I believe we therefore should implement gather & scatter interfaces only as free functions (possibly hidden friends) or static member functions. And the function name must be a reserved name.
  • I think we want the index parameter to always be of type simd. Turning an array into simd isn't too hard. And the expectation for gather & scatter is that indexes are computed via simd objects.
  • Do you know whether alignment of the memory pointer matters for any gather / scatter hardware? Because if not then the _Flags parameter is unnecessary. I'd drop it.
  • A constexpr scale parameter has been useful in Vc 1. I'm not sure whether we want it as public API at some point, though.
template <size_t _Scale = 1, typename _From, typename _Idx, typename _IdxAbi>
  _GLIBCXX_SIMD_ALWAYS_INLINE static _GLIBCXX_SIMD_CONSTEXPR
  simd
  __gather(const _From* __mem, const simd<_Idx, _IdxAbi>& __idx, _SizeConstant<_Scale> __scale = {})
  {
    return simd([&](auto __i) { return static_cast<value_type>(__mem[__idx[__i] * __scale]); });
  }

I also just implemented a complete _S_gather implementation for _SimdImplX86 (AVX512 and AVX2) and fallback in _SimdImplBuiltin. I didn't implement anything else, because I only use it internally to implement a lookup table for my std::exp(simd) implementation. It's not pushed anywhere yet. But I'm closing in on getting the patch ready.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. simd construct with memory loading has supported, dose the construct with gather need to be the extension?
  2. In my mind, _Flags for __mem of gather is necessary

Copy link
Author

@yaozhongxiao yaozhongxiao Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have implemented _S_gather, will I work based on your implementation be better?

Comment on lines +398 to +406
template <size_t _Offset>
_GLIBCXX_SIMD_INTRINSIC constexpr auto _M_tuple_at() const
{
if constexpr (_Offset == 0)
return first;
else
return second.template _M_tuple_at<_Offset - simd_size_v<_Tp, _Abi0>>();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the name I expected it to do the same as _M_at. Maybe call it _M_tuple_with_element<N>. However, I think you can't use this function anyway; see below.
Also, what if _Offset > 0 but _Offset < simd_size_v<_Tp, _Abi0>?

Comment on lines +1356 to +1366
template <typename _Tp, typename _Up>
static inline _SimdMember<_Tp>
_S_gather(const _Up* __mem, const _SimdMember<_Tp>& __idx,
_TypeTag<_Tp>) noexcept
{
return _SimdMember<_Tp>::_S_generate([&](auto __meta) {
return __meta._S_gather(
__mem, __idx.template _M_tuple_at<__meta._S_offset>(),
_TypeTag<_Tp>());
});
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend the types like this? I.e. the value_type of the index vector equals the value_type of the returned / gathered vector? I.e. you can't gather _SimdMember<float> unless __idx is of type _SimdMember<float>. I'd expect int for the latter.
Once you allow different types for gathered vector and index vector you will get size mismatches: e.g. gather floats using 64-bit int index vector. Or double / int, or float / int on AVX w/o AVX2. Then the _M_tuple_at access will fail. I hate this case, but I already handled it for some of the math functions. You should be able to find inspiration there. 😉

Comment on lines +1383 to +1384
_S_scatter(const _SimdMember<_Tp>& __v, _Up* __mem,
const _SimdMember<_Tp>& __idx, _TypeTag<_Tp>) noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same type question as above. Did you want different types for __v and __idx?

Copy link
Author

@yaozhongxiao yaozhongxiao Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I try to support different types in the following manner, but it seems that is not best way after reading our comments.
static inline _SimdMember<_Tp>
_S_gather(const _Up* __mem, const _SimdMember<__int_for_sizeof_t<_Tp>>& __idx,
_TypeTag<_Tp>) noexcept

@yaozhongxiao
Copy link
Author

yaozhongxiao commented Feb 23, 2021

As I implemented in RFC, how to represent the index is one of the important and confusing things to me.
The index can be an array or a simd, but if it is a simd, it needs to be restricted by type size.
I try to define the following pattern, but I think there is a better way that I have not got.

static inline _SimdMember<_Tp>
_S_gather(const _Up* __mem, const _SimdMember<__int_for_sizeof_t<_Tp>>& __idx,
		_TypeTag<_Tp>) noexcept

On the other hand, the index works as Tp* __idx can avoid splitting into simd instances.

static inline _SimdMember<_Tp>
_S_gather(const _Up* __mem, _int_for_sizeof_t<_Tp>* __idx,
		_TypeTag<_Tp>) noexcept

As you pointed out, the first thing goes first, the gather & scatter interface is the best start point.
are there anythings I can do for you to support the gather & scatter features?

@mattkretz
Copy link
Member

The internal API doesn't have to be as precise as the public API. For _SimdImplX86 I added the following function:

    template <typename _Tp, typename _IdxV>                                                         
      _GLIBCXX_SIMD_INTRINSIC static                                                                
      _SimdWrapper<_Tp, _S_size<_Tp>>                                                               
      _S_gather(const _Tp* __mem, const _IdxV& __idxv)

i.e. the function signature allows any type for the index vector. That's obviously not true. The first thing the implementation does is __to_intrin(__as_vector(__idxv)). That'll fail unless _IdxV is simd, _SimdWrapper, or __vector_type_t. Meaning _SimdTuple support is still TODO.

What we really need to care about is the public API. The index will probably have to be a constrained deduced type like simd<_IdxT, _IdxAbi> with enable_if_t<size() == simd_size_v<_IdxT, _IdxAbi>> on the return type. For Vc 1.x I constrained the index parameter type to be anything with a subscript operator and an integral type after subscripting (IIRC). Thus, one could use simd, but also vector and array. If anything, I'd add an overload with a statically sized std::span. Though I'm not sure it's different enough from just loading the indexes into a simd to warrant the extra code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants