Mixed precision support #247

Closed
gflegar opened this issue Mar 1, 2019 · 2 comments
Labels
is:help-wanted Need ideas on how to solve this. is:proposal Maybe we should do something this way. mod:core This is related to the core module.

Comments

@gflegar
Member

gflegar commented Mar 1, 2019

What we've been thinking about for a while now (and not finding any answers) is how to incorporate mixed precision into Ginkgo. This issue is here to give a place where we can write the issues mixed precision support would introduce and possible ways to fix them. Any contribution to this discussion (including from casual passersby) is highly appreciated. There are two general approaches that we could take to implement this, discussed separately below.

Option 1: Separate type for each parameter of every kernel

Ginkgo already templates the type of low-level performance critical data of all kernels, methods and classes. In theory, we could extend this templating to include a separate type for each parameter. However, doing that would significantly increase the amount of code variations that need to be compiled, since Ginkgo is not a header-only library (and it's not feasible to make it one). Specifically, each kernel would need (#compiled types)^(#parameters - 1) times more instantiations than it currently uses.
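
To put a number on that growth, here is a small compile-time sketch. The figures are assumptions for illustration only: 4 compiled value types (float, double and their complex variants) and a kernel with 3 independently typed parameters. The helper `instantiations` is hypothetical, and the loop inside a constexpr function needs C++14.

```cpp
#include <cstddef>

// Number of instantiations of one kernel when each of `num_params`
// parameters may independently take any of `num_types` compiled types.
constexpr std::size_t instantiations(std::size_t num_types,
                                     std::size_t num_params)
{
    std::size_t count = 1;
    for (std::size_t i = 0; i < num_params; ++i) {
        count *= num_types;
    }
    return count;
}

// Assumed example values, not measurements of Ginkgo itself.
constexpr std::size_t num_types = 4;
constexpr std::size_t num_params = 3;

// Current scheme: all parameters share one type, so one instantiation per type.
constexpr std::size_t current = num_types;
// Option 1: every parameter varies independently.
constexpr std::size_t mixed = instantiations(num_types, num_params);

static_assert(mixed == 64, "4^3 type combinations");
static_assert(mixed / current == 16, "4^(3 - 1) times more than the current scheme");
```

Under these assumptions a single three-parameter kernel goes from 4 to 64 instantiations, and the factor grows by another 4 for every additional parameter.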

Implementation of this approach is also difficult in combination with Ginkgo's type erasure features, especially when combined with polymorphism.

Option 2: Use conversions on precision boundaries

Another option would be to implement mixed precision by creating a copy of the object with a different value (or index) type whenever this is needed. In this mode, we only need to provide enhanced conversion methods which, in addition to converting between formats, also convert between types. All objects that are not of the same type as the owner of the method would first be temporarily converted to that type, and after the operation is done, the outputs would be converted back to their original type (similar to how temporary_clone works).
It is possible to reduce the library size and compile times even further (while sacrificing more performance) by doing the conversion in two stages: first only the value array is converted to the resulting type, then the format conversion is called with the same type as both input and output.
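
A minimal sketch of the "convert, operate, convert back" pattern, using plain std::vector instead of Ginkgo objects. The class `temporary_conversion` and the functions `scale` and `scale_float` are hypothetical names made up for this example; this is not how temporary_clone is implemented, only the same idea applied to value types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Ginkgo): holds a copy of `source`
// converted to element type `To` and converts the data back into `source`
// when it goes out of scope.
template <typename To, typename From>
class temporary_conversion {
public:
    explicit temporary_conversion(std::vector<From>& source)
        : source_(source), converted_(source.begin(), source.end())
    {}

    // Write the (possibly modified) values back in the original precision.
    ~temporary_conversion()
    {
        for (std::size_t i = 0; i < converted_.size(); ++i) {
            source_[i] = static_cast<From>(converted_[i]);
        }
    }

    temporary_conversion(const temporary_conversion&) = delete;
    temporary_conversion& operator=(const temporary_conversion&) = delete;

    std::vector<To>& get() { return converted_; }

private:
    std::vector<From>& source_;
    std::vector<To> converted_;
};

// Stand-in for a kernel that is only compiled for double.
inline void scale(std::vector<double>& x, double alpha)
{
    for (auto& v : x) {
        v *= alpha;
    }
}

// A caller holding float data crosses the precision boundary via a temporary.
inline void scale_float(std::vector<float>& x, double alpha)
{
    temporary_conversion<double, float> tmp(x);
    scale(tmp.get(), alpha);
}  // tmp is destroyed here and the result is copied back into x
```

The extra memory and the two copies per call are exactly the cost this option accepts in exchange for compiling each kernel only once per type.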

Of course, this second approach (temporarily) requires more memory and reduces performance compared to the first approach, but it is at least feasible, and can reasonably support use cases such as mixed precision iterative refinement (MPIR), where the conversion cost is negligible compared to the total cost of performing the operation in a different precision.
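
For readers not familiar with MPIR, here is a toy sketch on a 2x2 system, just to show where the precision boundary sits: the residual and the solution update stay in double, while the inner solve runs in float. All names (`solve_in_float`, `iterative_refinement`) are made up for this example, and Cramer's rule stands in for whatever low-precision solver would really be used.

```cpp
#include <array>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Low-precision inner solve: the inputs cross the precision boundary
// (double -> float) and the 2x2 system is solved with Cramer's rule.
inline std::array<float, 2> solve_in_float(const Mat2& a, const Vec2& rhs)
{
    const float a00 = static_cast<float>(a[0][0]);
    const float a01 = static_cast<float>(a[0][1]);
    const float a10 = static_cast<float>(a[1][0]);
    const float a11 = static_cast<float>(a[1][1]);
    const float r0 = static_cast<float>(rhs[0]);
    const float r1 = static_cast<float>(rhs[1]);
    const float det = a00 * a11 - a01 * a10;
    const std::array<float, 2> x{{(r0 * a11 - a01 * r1) / det,
                                  (a00 * r1 - a10 * r0) / det}};
    return x;
}

// Outer loop in double: x <- x + solve_in_float(A, b - A x).
inline Vec2 iterative_refinement(const Mat2& a, const Vec2& b, int iterations)
{
    Vec2 x{{0.0, 0.0}};
    for (int it = 0; it < iterations; ++it) {
        // Residual r = b - A x, computed in double precision.
        const Vec2 r{{b[0] - (a[0][0] * x[0] + a[0][1] * x[1]),
                      b[1] - (a[1][0] * x[0] + a[1][1] * x[1])}};
        // Correction solved in float, then converted back (float -> double).
        const std::array<float, 2> d = solve_in_float(a, r);
        x[0] += static_cast<double>(d[0]);
        x[1] += static_cast<double>(d[1]);
    }
    return x;
}
```

Each outer iteration performs one conversion in each direction, which is cheap next to the inner solve for any realistically sized problem.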

A possible optimization that would help here and avoid too many copies would be to implement copy on write for arrays (#248).

Possible implementation

Here is one way this approach could be implemented. It involves a lot more macros than I would like to see in C++ code, but at least it effectively removes code duplication and makes it possible to support all type combinations that Ginkgo is compiled for. Basically, we would need an extended version of GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE that can take a delimiter as a parameter:

#define GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE(_macro, _delim, ...) \
    _macro(float, __VA_ARGS__) _delim \
    _macro(double, __VA_ARGS__) _delim \
    _macro(std::complex<float>, __VA_ARGS__) _delim \
    _macro(std::complex<double>, __VA_ARGS__)
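
To see what the delimiter buys us, here is a self-contained sketch that expands such a macro once with a comma and once with a semicolon. Everything prefixed with DEMO_, as well as `Matrix` and `Sink`, is a stand-in invented for this example. The index type is passed through the variadic arguments so that the macro call stays well-formed under strict pre-C++20 preprocessor rules.

```cpp
#include <complex>
#include <tuple>
#include <type_traits>
#include <utility>

// Same shape as the extended macro above, under a stand-in name.
#define DEMO_FOR_EACH_VALUE_TYPE(_macro, _delim, ...) \
    _macro(float, __VA_ARGS__) _delim                 \
    _macro(double, __VA_ARGS__) _delim                \
    _macro(std::complex<float>, __VA_ARGS__) _delim   \
    _macro(std::complex<double>, __VA_ARGS__)

// A bare comma cannot be passed as a macro argument, hence this helper.
#define DEMO_COMMA ,

template <typename ValueType, typename IndexType>
struct Matrix {};

// Comma delimiter: build a list of types, e.g. for a base class list.
#define DEMO_MATRIX_TYPE(Value, Index) Matrix<Value, Index>
using int_matrices =
    std::tuple<DEMO_FOR_EACH_VALUE_TYPE(DEMO_MATRIX_TYPE, DEMO_COMMA, int)>;

// Semicolon delimiter: declare one overload per value type.
struct Sink {
#define DEMO_DECLARE_ACCEPT(Value, Index) \
    void accept(const Matrix<Value, Index>&)
    DEMO_FOR_EACH_VALUE_TYPE(DEMO_DECLARE_ACCEPT, ;, int);
#undef DEMO_DECLARE_ACCEPT
};

static_assert(std::tuple_size<int_matrices>::value == 4,
              "one entry per value type");
static_assert(std::is_same<std::tuple_element<1, int_matrices>::type,
                           Matrix<double, int>>::value,
              "second entry is the double variant");
static_assert(
    std::is_same<decltype(std::declval<Sink&>().accept(Matrix<double, int>{})),
                 void>::value,
    "an overload for Matrix<double, int> was declared");
```

The comma form is what a base class list needs, while the semicolon form fits member declarations.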

Then, code such as this:

//.hpp
template <typename ValueType, typename IndexType>
class MyMatrix : <other interfaces>,
        public ConvertibleTo<MyMatrix<ValueType, IndexType>> {
public:
    void convert_to(MyMatrix<ValueType, IndexType> *) override;
};

//.cpp
template <typename ValueType, typename IndexType>
void MyMatrix<ValueType, IndexType>::convert_to(MyMatrix<ValueType, IndexType> *)
{ /* implementation */ }

#define DECLARE_MY_MATRIX(Value, Index, ...) class MyMatrix<Value,Index>
GKO_INSTANTIATE_FOR_EACH_VALUE_AND_INDEX_TYPE(DECLARE_MY_MATRIX);

Could be converted into something like this:

//.hpp
#define GKO_COMMA ,
#define GKO_MY_MATRIX_INHERIT_CONVERSIONS(Value, ...) \
    public ConvertibleTo<MyMatrix<Value, IndexType>>

template <typename ValueType, typename IndexType>
class MyMatrix : <other interfaces>,
        GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE(
            GKO_MY_MATRIX_INHERIT_CONVERSIONS, GKO_COMMA) {
public:
#define GKO_MY_MATRIX_DECLARE_CONVERSIONS(Value, ...) \
    void convert_to(MyMatrix<Value, IndexType> *) override

    GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE(
        GKO_MY_MATRIX_DECLARE_CONVERSIONS, ;);

};

//.cpp
#define GKO_MY_MATRIX_IMPLEMENT_CONVERSIONS(Value, ...) \
    template <typename ValueType, typename IndexType> \
    void MyMatrix<ValueType, IndexType>::convert_to(MyMatrix<Value, IndexType> *) \
    { /* implementation */ }

GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE(
    GKO_MY_MATRIX_IMPLEMENT_CONVERSIONS,);

#define DECLARE_MY_MATRIX(Value, Index, ...) class MyMatrix<Value,Index>
GKO_INSTANTIATE_FOR_EACH_VALUE_AND_INDEX_TYPE(DECLARE_MY_MATRIX);

Of course, this would have to be accompanied by a change to the code that currently checks for a hard type (e.g. apply() checks if the input parameter is gko::matrix::Dense<ValueType>), so that it instead checks whether the type in question can be converted to the correct type.
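
A rough sketch of what such a relaxed check could look like. `LinOp`, `ConvertibleTo` and `Dense` below are heavily simplified stand-ins, not Ginkgo's real classes, and `as_dense` is a hypothetical helper name. The point is only the dynamic_cast to the conversion interface instead of to the concrete type.

```cpp
#include <memory>
#include <stdexcept>

// Simplified stand-ins; the real Ginkgo interfaces look different.
struct LinOp {
    virtual ~LinOp() = default;
};

template <typename ResultType>
struct ConvertibleTo {
    virtual ~ConvertibleTo() = default;
    virtual void convert_to(ResultType* result) const = 0;
};

// A one-element "dense matrix" that can be converted to both precisions.
template <typename ValueType>
struct Dense : LinOp,
               ConvertibleTo<Dense<float>>,
               ConvertibleTo<Dense<double>> {
    ValueType value{};

    void convert_to(Dense<float>* result) const override
    {
        result->value = static_cast<float>(value);
    }
    void convert_to(Dense<double>* result) const override
    {
        result->value = static_cast<double>(value);
    }
};

// Hypothetical helper: accept any operand that can be converted to
// Dense<ValueType>, not only operands that already are Dense<ValueType>.
// This sketch always copies; a fast path for an exact type match would
// avoid the copy.
template <typename ValueType>
std::unique_ptr<Dense<ValueType>> as_dense(const LinOp* op)
{
    using target = Dense<ValueType>;
    if (auto conv = dynamic_cast<const ConvertibleTo<target>*>(op)) {
        std::unique_ptr<target> result(new target());
        conv->convert_to(result.get());
        return result;
    }
    throw std::invalid_argument(
        "operand cannot be converted to the required type");
}
```

With this, a kernel written for Dense<double> can be handed a Dense<float> operand, and the failure case is reduced to operands that offer no conversion at all.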


This would mean that the user's code would have to be compiled with one "greatest common divisor" compiler that can incorporate all the features needed to compile all of Ginkgo's code. The set of such compilers may become empty in the future, if we add support for models that are inherently incompatible (e.g. CUDA and SYCL or ROCm).

@gflegar gflegar added is:help-wanted Need ideas on how to solve this. mod:core This is related to the core module. labels Mar 1, 2019
@gflegar
Member Author

gflegar commented Mar 1, 2019

I would go for option 2 for now. Mainly because it does not exclude adding features of option 1 later, and there's a clear idea how it could be implemented.

@gflegar gflegar added is:idea Just a thought - if it's good, it could evolve into a proposal. is:proposal Maybe we should do something this way. mod:openmp This is related to the OpenMP module. and removed is:idea Just a thought - if it's good, it could evolve into a proposal. mod:openmp This is related to the OpenMP module. labels Mar 1, 2019
@upsj
Member

upsj commented Apr 20, 2021

This got implemented with #521, #677 and partially #717. Instead of the macro solution, we used a chain of interconvertible types float -> double -> float or std::complex<float> -> std::complex<double> -> std::complex<float> to represent all possible value conversion targets for a type.
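
For illustration, the chain idea can be sketched with a small type trait. The names `next_precision_impl`, `next_precision` and `Vector` are chosen for this example; the actual code in the referenced pull requests may be organized differently.

```cpp
#include <complex>
#include <type_traits>

// Maps each value type to the next one in its conversion chain.
template <typename T>
struct next_precision_impl;

template <>
struct next_precision_impl<float> {
    using type = double;
};

template <>
struct next_precision_impl<double> {
    using type = float;
};

// Complex types follow the chain of their underlying real type.
template <typename T>
struct next_precision_impl<std::complex<T>> {
    using type = std::complex<typename next_precision_impl<T>::type>;
};

template <typename T>
using next_precision = typename next_precision_impl<T>::type;

// Toy one-element container: it only declares the conversion to the next
// type in the chain instead of one conversion per compiled value type.
template <typename ValueType>
struct Vector {
    ValueType value{};

    void convert_to(Vector<next_precision<ValueType>>* result) const
    {
        result->value = static_cast<next_precision<ValueType>>(value);
    }
};

static_assert(std::is_same<next_precision<float>, double>::value,
              "float -> double");
static_assert(std::is_same<next_precision<next_precision<float>>, float>::value,
              "the chain closes: float -> double -> float");
static_assert(std::is_same<next_precision<std::complex<float>>,
                           std::complex<double>>::value,
              "complex<float> -> complex<double>");
```

Since following the chain from any type eventually visits every other type of the same kind, one conversion declaration per class is enough, and no macro-generated base class list is needed.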

@upsj upsj closed this as completed Apr 20, 2021