Specialize HIP cooperative groups for HCC and NVCC #443

upsj · 2020-01-09T09:55:04Z

This adds a lane mask to the thread_block_tile class and uses it to
implement CUDA and HCC-specific versions of register shuffles, ballot,
all and any.

Nice side-effect: All the deprecation warnings from NVCC should disappear.
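
To make the idea of a per-tile lane mask concrete, here is a minimal host-side sketch in plain C++ (not device code, and not the code from this PR). It assumes a 32-lane warp and tiles made of consecutive lanes whose size is a power of two; `tile_lane_mask`, `warp_size` and `lane_mask_type` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 32 lanes per warp (NVIDIA). A 64-lane AMD wavefront would
// need a 64-bit mask type instead.
constexpr unsigned warp_size = 32;
using lane_mask_type = std::uint32_t;

// Hypothetical helper: returns a bit mask with one bit set for every lane
// that belongs to the same tile as `lane`. Tiles consist of `tile_size`
// consecutive lanes; tile_size must be a power of two in [1, warp_size].
inline lane_mask_type tile_lane_mask(unsigned lane, unsigned tile_size)
{
    // Mask with the lowest `tile_size` bits set.
    const lane_mask_type base = ~lane_mask_type{} >> (warp_size - tile_size);
    // Index of the first lane of the tile that contains `lane`.
    const unsigned offset = lane / tile_size * tile_size;
    return base << offset;
}
```

A mask of this shape is what the `*_sync` warp intrinsics on CUDA expect as their first argument, which is presumably why the deprecation warnings go away once every shuffle and vote passes one.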

TODO:

- Figure out how to test this
- Remove grep for .sync from CI config
- ~~Check if this significantly impacts the performance of warp sorting on NVIDIA GPUs~~
  (We are actually now almost executing the same code for HIP and CUDA, and I could not observe any runtime issues there. In case there were any, we could just set the tiled partition size to config::warp_size. In the long run, it might be interesting to specialize this to independent subwarps, but not for now.)

yhmtsai · 2020-01-13T22:32:27Z

Could you also add the same test for CUDA,
to make sure the behavior matches CUDA's original implementation?

hip/components/cooperative_groups.hip.hpp

hip/test/components/cooperative_groups.hip.cpp

yhmtsai · 2020-01-14T09:54:37Z

hip/test/components/cooperative_groups.hip.cpp

+    auto i = int(group.thread_rank());
+    test_assert(s, group.shfl_up(i, 1) == max(0, i - 1));
+    test_assert(s, group.shfl_down(i, 1) == min(i + 1, config::warp_size - 1));
+    test_assert(s, group.shfl(i, 0) == 0);
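
The expected values in these assertions follow from the usual shuffle semantics: a lane whose source index falls outside the group keeps its own value. The following host-side C++ model is only a sketch of those semantics, not the device implementation; the `model_*` function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// values[i] models the register value held by lane i of one group.

// Lane i reads from lane i - delta; lanes with i < delta keep their value.
inline std::vector<int> model_shfl_up(const std::vector<int>& values, int delta)
{
    const int n = static_cast<int>(values.size());
    std::vector<int> result(values);
    for (int i = delta; i < n; ++i) {
        result[i] = values[i - delta];
    }
    return result;
}

// Lane i reads from lane i + delta; the last delta lanes keep their value.
inline std::vector<int> model_shfl_down(const std::vector<int>& values, int delta)
{
    const int n = static_cast<int>(values.size());
    std::vector<int> result(values);
    for (int i = 0; i + delta < n; ++i) {
        result[i] = values[i + delta];
    }
    return result;
}

// Every lane reads the value held by lane `source` (a broadcast).
inline std::vector<int> model_shfl(const std::vector<int>& values, int source)
{
    return std::vector<int>(values.size(), values[source]);
}
```

With each lane holding its own rank `i`, the model yields `max(0, i - 1)`, `min(i + 1, size - 1)` and `0` for the three calls, matching the assertions above.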


Could you add a complex-valued test for the extend_shuffle?
It needs a lot of threads to test. (In #439, the extended shuffle works on double with the official ones disabled.)
It would be nice to check whether it also works on complex double as expected.

upsj · 2020-01-14T10:14:15Z

> Could you also add the same test for CUDA,
> to make sure the behavior matches CUDA's original implementation?

I added the corresponding tests. Concerning the extended_shuffle tests: You are probably more familiar with the specifics, can you try to add them?

yhmtsai

LGTM. Could you add a description for __ballot?
Sorry for re-running one test: I wanted to see the log but clicked the wrong button.
I will add the shfl test in another PR so that this PR is not blocked.

upsj · 2020-01-15T15:18:00Z

> LGTM. Could you add a description for __ballot?

Do you mean documentation for cooperative_groups.hip.hpp or comments detailing what the tests do?

yhmtsai · 2020-01-15T15:22:34Z

I mean the documentation for __ballot in cooperative_groups.hip.hpp.
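
For readers unfamiliar with the vote operations being discussed, the sketch below models in host-side C++ how a tile-level ballot, any and all could be derived from a warp-wide ballot and a lane mask. This is an illustration under assumptions (32-lane warp, tiles of consecutive lanes), not the documented behavior of cooperative_groups.hip.hpp; all `model_*` names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: 32 lanes per warp; a 64-lane wavefront needs a 64-bit mask.
constexpr unsigned warp_size = 32;
using lane_mask_type = std::uint32_t;
using predicates = std::array<bool, warp_size>;

// Warp-wide ballot: bit i of the result is set iff lane i's predicate holds.
inline lane_mask_type model_warp_ballot(const predicates& pred)
{
    lane_mask_type result = 0;
    for (unsigned lane = 0; lane < warp_size; ++lane) {
        if (pred[lane]) {
            result |= lane_mask_type{1} << lane;
        }
    }
    return result;
}

// Tile-level ballot: keep only the bits of this tile's lanes and shift
// them down so that bit 0 corresponds to the tile's first lane.
inline lane_mask_type model_tile_ballot(const predicates& pred,
                                        lane_mask_type tile_mask,
                                        unsigned tile_offset)
{
    return (model_warp_ballot(pred) & tile_mask) >> tile_offset;
}

// True iff at least one lane of the tile has a true predicate.
inline bool model_tile_any(const predicates& pred, lane_mask_type tile_mask,
                           unsigned tile_offset)
{
    return model_tile_ballot(pred, tile_mask, tile_offset) != 0;
}

// True iff every lane of the tile has a true predicate.
inline bool model_tile_all(const predicates& pred, lane_mask_type tile_mask,
                           unsigned tile_offset)
{
    return model_tile_ballot(pred, tile_mask, tile_offset) ==
           (tile_mask >> tile_offset);
}
```

For example, with tiles of size 4 and only lanes 4 and 5 voting true, the tile starting at lane 4 (mask `0xF0`, offset 4) sees a ballot of `0b0011`, so any is true and all is false.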

thoasm

LGTM!

cuda/test/components/cooperative_groups.cu

sonarcloud · 2020-01-16T01:11:34Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities (and 0 Security Hotspots to review)
0 Code Smells

No Coverage information
0.0% Duplication

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.2.0. This release brings full HIP support to Ginkgo, new preconditioners (ParILUT, ISAI), conversion between double and float for all LinOps, and many more features and fixes.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 2.8+
+ Windows
  + MinGW and CygWin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or CygWin.

The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).

# Additions

Here are the main additions to the Ginkgo library. Other thematic additions are listed below.
+ Add full HIP support to Ginkgo [#344](#344), [#357](#357), [#384](#384), [#373](#373), [#391](#391), [#396](#396), [#395](#395), [#393](#393), [#404](#404), [#439](#439), [#443](#443), [#567](#567)
+ Add a new ISAI preconditioner [#489](#489), [#502](#502), [#512](#512), [#508](#508), [#520](#520)
+ Add support for ParILUT and ParICT factorization with ILU preconditioners [#400](#400)
+ Add a new BiCG solver [#438](#438)
+ Add a new permutation matrix format [#352](#352), [#469](#469)
+ Add CSR SpGEMM support [#386](#386), [#398](#398), [#418](#418), [#457](#457)
+ Add CSR SpGEAM support [#556](#556)
+ Make all solvers and preconditioners transposable [#535](#535)
+ Add CsrBuilder and CooBuilder for intrusive access to matrix arrays [#437](#437)
+ Add a standard-compliant allocator based on the Executors [#504](#504)
+ Support conversions for all LinOp between double and float [#521](#521)
+ Add a new boolean to the CUDA and HIP executors to control DeviceReset (default off) [#557](#557)
+ Add a relaxation factor to IR to represent Richardson Relaxation [#574](#574)
+ Add two new stopping criteria, for relative (to `norm(b)`) and absolute residual norm [#577](#577)

### Example additions
+ Templatize all examples to simplify changing the precision [#513](#513)
+ Add a new adaptive precision block-Jacobi example [#507](#507)
+ Add a new IR example [#522](#522)
+ Add a new Mixed Precision Iterative Refinement example [#525](#525)
+ Add a new example on iterative trisolves in ILU preconditioning [#526](#526), [#536](#536), [#550](#550)

### Compilation and library changes
+ Auto-detect compilation settings based on environment [#435](#435), [#537](#537)
+ Add SONAME to shared libraries [#524](#524)
+ Add clang-cuda support [#543](#543)

### Other additions
+ Add sorting, searching and merging kernels for GPUs [#403](#403), [#428](#428), [#417](#417), [#455](#455)
+ Add `gko::as` support for smart pointers [#493](#493)
+ Add setters and getters for criterion factories [#527](#527)
+ Add a new method to check whether a solver uses `x` as an initial guess [#531](#531)
+ Add contribution guidelines [#549](#549)

# Fixes

### Algorithms
+ Improve the classical CSR strategy's performance [#401](#401)
+ Improve the CSR automatical strategy [#407](#407), [#559](#559)
+ Memory, speed improvements to the ELL kernel [#411](#411)
+ Multiple improvements and fixes to ParILU [#419](#419), [#427](#427), [#429](#429), [#456](#456), [#544](#544)
+ Fix multiple issues with GMRES [#481](#481), [#523](#523), [#575](#575)
+ Optimize OpenMP matrix conversions [#505](#505)
+ Ensure the linearity of the ILU preconditioner [#506](#506)
+ Fix IR's use of the advanced apply [#522](#522)
+ Fix empty matrices conversions and add tests [#560](#560)

### Other core functionalities
+ Fix complex number support in our math header [#410](#410)
+ Fix CUDA compatibility of the main ginkgo header [#450](#450)
+ Fix isfinite issues [#465](#465)
+ Fix the Array::view memory leak and the array/view copy/move [#485](#485)
+ Fix typos preventing use of some interface functions [#496](#496)
+ Fix the `gko::dim` to abide to the C++ standard [#498](#498)
+ Simplify the executor copy interface [#516](#516)
+ Optimize intermediate storage for Composition [#540](#540)
+ Provide an initial guess for relevant Compositions [#561](#561)
+ Better management of nullptr as criterion [#562](#562)
+ Fix the norm calculations for complex support [#564](#564)

### CUDA and HIP specific
+ Use the return value of the atomic operations in our wrappers [#405](#405)
+ Improve the portability of warp lane masks [#422](#422)
+ Extract thread ID computation into a separate function [#464](#464)
+ Reorder kernel parameters for consistency [#474](#474)
+ Fix the use of `pragma unroll` in HIP [#492](#492)

### Other
+ Fix the Ginkgo CMake installation files [#414](#414), [#553](#553)
+ Fix the Windows compilation [#415](#415)
+ Always use demangled types in error messages [#434](#434), [#486](#486)
+ Add CUDA header dependency to appropriate tests [#452](#452)
+ Fix several sonarqube or compilation warnings [#453](#453), [#463](#463), [#532](#532), [#569](#569)
+ Add shuffle tests [#460](#460)
+ Fix MSVC C2398 error [#490](#490)
+ Fix missing interface tests in test install [#558](#558)

# Tools and ecosystem

### Benchmarks
+ Add better norm support in the benchmarks [#377](#377)
+ Add CUDA 10.1 generic SpMV support in benchmarks [#468](#468), [#473](#473)
+ Add sparse library ILU in benchmarks [#487](#487)
+ Add overhead benchmarking capacities [#501](#501)
+ Allow benchmarking from a matrix list file [#503](#503)
+ Fix benchmarking issue with JSON and non-finite numbers [#514](#514)
+ Fix benchmark logger crashers with OpenMP [#565](#565)

### CI related
+ Improvements to the CI setup with HIP compilation [#421](#421), [#466](#466)
+ Add MacOSX CI support [#470](#470), [#488](#488)
+ Add Windows CI support [#471](#471), [#488](#488), [#510](#510), [#566](#566)
+ Use sanitizers instead of valgrind [#476](#476)
+ Add automatic container generation and update facilities [#499](#499)
+ Fix the CI parallelism settings [#517](#517), [#538](#538), [#539](#539)
+ Make the codecov patch check informational [#519](#519)
+ Add support for LLVM sanitizers with improved thread sanitizer support [#578](#578)

### Test suite
+ Add an assertion for sparsity pattern equality [#416](#416)
+ Add core and reference multiprecision tests support [#448](#448)
+ Speed up GPU tests by avoiding device reset [#467](#467)
+ Change test matrix location string [#494](#494)

### Other
+ Add Ginkgo badges from our tools [#413](#413)
+ Update the `create_new_algorithm.sh` script [#420](#420)
+ Bump copyright and improve license management [#436](#436), [#433](#433)
+ Set clang-format minimum requirement [#441](#441), [#484](#484)
+ Update git-cmake-format [#446](#446), [#484](#484)
+ Disable the development tools by default [#442](#442)
+ Add a script for automatic header formatting [#447](#447)
+ Add GDB pretty printer for `gko::Array` [#509](#509)
+ Improve compilation speed [#533](#533)
+ Add editorconfig support [#546](#546)
+ Add a compile-time check for header self-sufficiency [#552](#552)

# Related PR: #583

upsj added the 1:ST:WIP and mod:hip labels Jan 9, 2020

upsj self-assigned this Jan 9, 2020

upsj force-pushed the hip_cooperative_groups branch 5 times, most recently from 8c87af2 to 1aea047 on January 10, 2020 13:37

upsj added the 1:ST:ready-for-review label and removed the 1:ST:WIP label Jan 10, 2020

upsj requested review from yhmtsai, tcojean and pratikvn January 10, 2020 13:37

upsj force-pushed the hip_cooperative_groups branch 2 times, most recently from f633487 to 1771533 on January 13, 2020 08:58

yhmtsai requested changes Jan 14, 2020

upsj force-pushed the hip_cooperative_groups branch from 0f95ce0 to 07f9257 on January 14, 2020 12:39

yhmtsai approved these changes Jan 15, 2020

thoasm approved these changes Jan 15, 2020

cuda/test/components/cooperative_groups.cu

upsj added the 1:ST:ready-to-merge label and removed the 1:ST:ready-for-review label Jan 15, 2020

upsj added 5 commits January 15, 2020 23:02

specialize HCC and CUDA cooperative groups

a456e6a

This adds a lane mask to the thread_block_tile class and uses it to implement CUDA and HCC-specific versions of register shuffles, ballot, all and any.

add tests for HIP cooperative_groups

cc320b3

remove unnecessary filtered pattern from CI config

cea5418

reduce overhead of warp-sized cooperative groups

9d5f92b

add corresponding cooperative_groups test for CUDA

1fc5293

upsj added 3 commits January 15, 2020 23:02

add subwarp and diverging tests

2afc254

add documentation

b17f7bd

add missing header

41c6e6f

upsj force-pushed the hip_cooperative_groups branch from 5a87f3e to 41c6e6f on January 15, 2020 22:02

upsj merged commit a2a3efd into develop Jan 16, 2020

upsj deleted the hip_cooperative_groups branch January 16, 2020 07:16

tcojean mentioned this pull request Jun 23, 2020

Release/1.2.0 #576

Merged
