Mixed precision support #247
Labels
- is:help-wanted: need ideas on how to solve this.
- is:proposal: maybe we should do something this way.
- mod:core: related to the core module.
What we've been thinking about for a while now (without finding any satisfying answers) is how to incorporate mixed precision into Ginkgo. This issue provides a place to collect the problems mixed precision support would introduce and possible ways to solve them. Any contribution to this discussion (including from casual passersby) is highly appreciated. There are two general approaches we could take to implement this, discussed separately below.
Option 1: Separate type for each parameter of every kernel
Ginkgo already templates the type of low-level, performance-critical data in all kernels, methods and classes. In theory, we could extend this templating to include a separate type for each parameter. However, doing that would significantly increase the number of code variations that need to be compiled, since Ginkgo is not a header-only library (and it is not feasible to make it one†). Specifically, each kernel would need (#compiled types)^(#parameters − 1) times more instantiations than it currently uses.
Implementing this approach is also difficult in combination with Ginkgo's type erasure features, especially where polymorphism is involved.
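To illustrate the combinatorial blow-up, here is a minimal sketch with a hypothetical kernel (the names `axpy`, `InType` and `OutType` are illustrative, not Ginkgo's actual API), assuming only two compiled value types:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel templated on separate input and output value types
// (illustrative only; not Ginkgo's actual API).
template <typename InType, typename OutType>
void axpy(OutType alpha, const std::vector<InType>& x, std::vector<OutType>& y)
{
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += static_cast<OutType>(alpha * x[i]);
    }
}

// With just 2 compiled value types (float, double) and 2 independently
// typed parameters, 2^2 = 4 explicit instantiations are needed where a
// single-type design needs only 2:
template void axpy<float, float>(float, const std::vector<float>&, std::vector<float>&);
template void axpy<float, double>(double, const std::vector<float>&, std::vector<double>&);
template void axpy<double, float>(float, const std::vector<double>&, std::vector<float>&);
template void axpy<double, double>(double, const std::vector<double>&, std::vector<double>&);
```

With Ginkgo's real set of value types (including complex types) and kernels that take more than two typed parameters, the exponent grows accordingly.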
Option 2: Use conversions on precision boundaries
Another option would be to implement mixed precision by creating a copy of the object with a different value (or index) type whenever this is needed. In this mode, we only need to provide enhanced conversion methods which, in addition to converting between formats, also convert between types. All objects that are not of the same type as the owner of the method would first be temporarily converted to that type, and after the operation is done, the outputs would be converted back to the original type (similarly to how
temporary_clone
works). There is even a way to reduce the library size and compile times further (while sacrificing more performance) by doing the conversion in two stages: first, only the value array is converted to the resulting type; then, the format conversion is called with the same type as both input and output.
Of course, this second approach (temporarily) requires more memory and reduces performance compared to the first approach, but it is at least feasible, and can reasonably support use cases such as mixed precision iterative refinement (MPIR), where the conversion cost is negligible compared to the total cost of performing the operation in a different precision.
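For reference, the MPIR pattern mentioned above can be condensed into a toy 1x1 "linear system" a·x = b, where the cheap solve runs in float and the residual is accumulated in double (purely illustrative; a real MPIR solver uses e.g. a low precision factorization for the solve step):

```cpp
#include <cmath>

// Toy mixed precision iterative refinement for a * x = b:
// residual in double, correction solve in float.
double mpir_solve(double a, double b, int iters)
{
    float a_low = static_cast<float>(a);
    double x = 0.0;
    for (int i = 0; i < iters; ++i) {
        double r = b - a * x;                     // residual in high precision
        float c = static_cast<float>(r) / a_low;  // correction in low precision
        x += static_cast<double>(c);              // accumulate in high precision
    }
    return x;
}
```

Each iteration only needs the operands converted across the precision boundary once, which is why the conversion overhead is negligible relative to the work done in the lower precision.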
A possible optimization that would avoid too many copies is to implement copy-on-write for arrays (#248).
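A minimal copy-on-write array could look roughly like this (a sketch only, assuming a `std::shared_ptr`-backed buffer; this is not the design actually proposed in #248):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Minimal copy-on-write array sketch (illustrative; not Ginkgo's design).
template <typename T>
class cow_array {
public:
    explicit cow_array(std::vector<T> data)
        : data_(std::make_shared<std::vector<T>>(std::move(data)))
    {}

    // Reads share the underlying buffer, so copying a cow_array is cheap.
    const T& operator[](std::size_t i) const { return (*data_)[i]; }

    long owners() const { return data_.use_count(); }

    // Writes detach first if the buffer is shared with another array.
    T& at(std::size_t i)
    {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<T>>(*data_);
        }
        return (*data_)[i];
    }

private:
    std::shared_ptr<std::vector<T>> data_;
};
```

With such an array, the "convert, operate, convert back" scheme above would only pay for a deep copy when the converted data is actually modified.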
Possible implementation
Here is one way this approach could be implemented. It involves more macros than I would like to see in C++ code, but at least it effectively removes code duplication and allows supporting all type combinations that Ginkgo is compiled for. Basically, we would need an extended version of
GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE
that can take a delimiter as a parameter; existing single-type instantiation code could then be converted to enumerate all type combinations with it.
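To give a rough idea of the delimiter-taking macro, here is a self-contained sketch (the `SKETCH_*` names and the reduced type list are hypothetical; they only mimic the spirit of an extended GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE, not Ginkgo's actual macros):

```cpp
#include <cstddef>

// An element-wise conversion kernel with independent source and
// destination value types.
template <typename SrcType, typename DstType>
void convert_kernel(const SrcType* src, DstType* dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<DstType>(src[i]);
    }
}

// Hypothetical macro taking a delimiter, so it can be composed to
// enumerate type *pairs* (names are illustrative, not Ginkgo's macros).
#define SKETCH_SEMICOLON ;
#define SKETCH_FOR_EACH_VALUE_TYPE(_macro, _delim) \
    _macro(float) _delim _macro(double)

// Inner level: for a fixed source type, instantiate every destination type.
#define SKETCH_INSTANTIATE_FROM_FLOAT(_dst) \
    template void convert_kernel<float, _dst>(const float*, _dst*, std::size_t)
#define SKETCH_INSTANTIATE_FROM_DOUBLE(_dst) \
    template void convert_kernel<double, _dst>(const double*, _dst*, std::size_t)

// Outer level: four explicit instantiations from two one-line invocations.
SKETCH_FOR_EACH_VALUE_TYPE(SKETCH_INSTANTIATE_FROM_FLOAT, SKETCH_SEMICOLON);
SKETCH_FOR_EACH_VALUE_TYPE(SKETCH_INSTANTIATE_FROM_DOUBLE, SKETCH_SEMICOLON);
```

The delimiter parameter is what makes nesting possible: the inner expansion produces a list of instantiations, and the outer one decides how those list entries are joined (`;` for statements, `,` for macro argument lists, etc.).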
Of course, this would have to be accompanied by changing code that currently checks for a hard-coded type (e.g.
apply()
checks whether the input parameter is gko::matrix::Dense&lt;ValueType&gt;) to instead check whether the type in question can be converted to a correct type.

† This would mean that the user's code would have to be compiled with one "greatest common divisor" compiler that incorporates all the features needed to compile all of Ginkgo's code. That set may become empty in the future, if we add support for programming models that are inherently incompatible (e.g. CUDA and SYCL or ROCm).