Mixed precision support #247
Labels
- is:help-wanted: need ideas on how to solve this.
- is:proposal: maybe we should do something this way.
- mod:core: related to the core module.
What we've been thinking about for a while now (without finding any satisfying answers) is how to incorporate mixed precision into Ginkgo. This issue provides a place to collect the problems mixed precision support would introduce and possible ways to solve them. Any contribution to this discussion (including from casual passersby) is highly appreciated. There are two general approaches we could take to implement this, discussed separately below.
Option 1: Separate type for each parameter of every kernel
Ginkgo already templates the type of low-level, performance-critical data in all kernels, methods and classes. In theory, we could extend this templating to include a separate type for each parameter. However, doing that would significantly increase the number of code variations that need to be compiled, since Ginkgo is not a header-only library (and it is not feasible to make it one†). Specifically, each kernel would need (#compiled types)^(#parameters − 1) times more instantiations than it currently uses.
Implementing this approach is also difficult in combination with Ginkgo's type erasure features, especially where polymorphism is involved.
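To illustrate the combinatorial blow-up, here is a minimal sketch with a hypothetical kernel (the names `axpy`, `InType` and `OutType` are illustrative, not Ginkgo's actual API), assuming only two compiled value types:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel templated on separate input and output value types
// (illustrative only; not Ginkgo's actual API).
template <typename InType, typename OutType>
void axpy(OutType alpha, const std::vector<InType>& x, std::vector<OutType>& y)
{
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += static_cast<OutType>(alpha * x[i]);
    }
}

// With just 2 compiled value types (float, double) and 2 independently
// typed parameters, 2^2 = 4 explicit instantiations are needed where a
// single-type design needs only 2:
template void axpy<float, float>(float, const std::vector<float>&, std::vector<float>&);
template void axpy<float, double>(double, const std::vector<float>&, std::vector<double>&);
template void axpy<double, float>(float, const std::vector<double>&, std::vector<float>&);
template void axpy<double, double>(double, const std::vector<double>&, std::vector<double>&);
```

With Ginkgo's real set of value types (including complex types) and kernels that take more than two typed parameters, the exponent grows accordingly.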
Option 2: Use conversions on precision boundaries
Another option would be to implement mixed precision by creating a copy of the object with a different value (or index) type whenever this is needed. In this mode, we only need to provide enhanced conversion methods which, in addition to converting between formats, also convert between types. All objects that are not of the same type as the owner of the method would first be temporarily converted to that type, and after the operation is done, the outputs would be converted back to the original type (similarly to how
temporary_clone
works). There is even a way to reduce the library size and compile times further (while sacrificing more performance) by doing the conversion in two stages: first, only the value array is converted to the resulting type; then, the format conversion is called with the same type as both input and output.
Of course, this second approach (temporarily) requires more memory and reduces performance compared to the first approach, but it is at least feasible, and can reasonably support use cases such as mixed precision iterative refinement (MPIR), where the conversion cost is negligible compared to the total cost of performing the operation in a different precision.
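For reference, the MPIR pattern mentioned above can be condensed into a toy 1x1 "linear system" a·x = b, where the cheap solve runs in float and the residual is accumulated in double (purely illustrative; a real MPIR solver uses e.g. a low precision factorization for the solve step):

```cpp
#include <cmath>

// Toy mixed precision iterative refinement for a * x = b:
// residual in double, correction solve in float.
double mpir_solve(double a, double b, int iters)
{
    float a_low = static_cast<float>(a);
    double x = 0.0;
    for (int i = 0; i < iters; ++i) {
        double r = b - a * x;                     // residual in high precision
        float c = static_cast<float>(r) / a_low;  // correction in low precision
        x += static_cast<double>(c);              // accumulate in high precision
    }
    return x;
}
```

Each iteration only needs the operands converted across the precision boundary once, which is why the conversion overhead is negligible relative to the work done in the lower precision.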
A possible optimization that would avoid too many copies is to implement copy-on-write for arrays (#248).
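A minimal copy-on-write array could look roughly like this (a sketch only, assuming a `std::shared_ptr`-backed buffer; this is not the design actually proposed in #248):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Minimal copy-on-write array sketch (illustrative; not Ginkgo's design).
template <typename T>
class cow_array {
public:
    explicit cow_array(std::vector<T> data)
        : data_(std::make_shared<std::vector<T>>(std::move(data)))
    {}

    // Reads share the underlying buffer, so copying a cow_array is cheap.
    const T& operator[](std::size_t i) const { return (*data_)[i]; }

    long owners() const { return data_.use_count(); }

    // Writes detach first if the buffer is shared with another array.
    T& at(std::size_t i)
    {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<T>>(*data_);
        }
        return (*data_)[i];
    }

private:
    std::shared_ptr<std::vector<T>> data_;
};
```

With such an array, the "convert, operate, convert back" scheme above would only pay for a deep copy when the converted data is actually modified.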
Possible implementation
Here is one way this approach could be implemented. It involves more macros than I would like to see in C++ code, but at least it effectively removes code duplication and allows supporting all type combinations that Ginkgo is compiled for. Basically, we would need an extended version of
GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE
that can take a delimiter as a parameter; existing single-type instantiation code could then be converted to enumerate all type combinations with it.
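To give a rough idea of the delimiter-taking macro, here is a self-contained sketch (the `SKETCH_*` names and the reduced type list are hypothetical; they only mimic the spirit of an extended GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE, not Ginkgo's actual macros):

```cpp
#include <cstddef>

// An element-wise conversion kernel with independent source and
// destination value types.
template <typename SrcType, typename DstType>
void convert_kernel(const SrcType* src, DstType* dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<DstType>(src[i]);
    }
}

// Hypothetical macro taking a delimiter, so it can be composed to
// enumerate type *pairs* (names are illustrative, not Ginkgo's macros).
#define SKETCH_SEMICOLON ;
#define SKETCH_FOR_EACH_VALUE_TYPE(_macro, _delim) \
    _macro(float) _delim _macro(double)

// Inner level: for a fixed source type, instantiate every destination type.
#define SKETCH_INSTANTIATE_FROM_FLOAT(_dst) \
    template void convert_kernel<float, _dst>(const float*, _dst*, std::size_t)
#define SKETCH_INSTANTIATE_FROM_DOUBLE(_dst) \
    template void convert_kernel<double, _dst>(const double*, _dst*, std::size_t)

// Outer level: four explicit instantiations from two one-line invocations.
SKETCH_FOR_EACH_VALUE_TYPE(SKETCH_INSTANTIATE_FROM_FLOAT, SKETCH_SEMICOLON);
SKETCH_FOR_EACH_VALUE_TYPE(SKETCH_INSTANTIATE_FROM_DOUBLE, SKETCH_SEMICOLON);
```

The delimiter parameter is what makes nesting possible: the inner expansion produces a list of instantiations, and the outer one decides how those list entries are joined (`;` for statements, `,` for macro argument lists, etc.).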
Of course, this would have to be accompanied by changing code that currently checks for a hard-coded type (e.g.
apply()
checks whether the input parameter is gko::matrix::Dense&lt;ValueType&gt;) to instead check whether the type in question can be converted to a correct type.

† This would mean that the user's code would have to be compiled with one "greatest common divisor" compiler that incorporates all the features needed to compile all of Ginkgo's code. That set may become empty in the future, if we add support for programming models that are inherently incompatible (e.g. CUDA and SYCL or ROCm).