Tpetra: RCP Thread-safety issues with parallel host backends without setting Teuchos_ENABLE_THREAD_SAFE=ON #11921
Comments
@trilinos/teuchos |
One would have to do some profiling to see the impact of turning on thread safety for RCP, especially in these low-level routines. Otherwise, I don't have a strong opinion one way or the other. |
Related to #11931. |
Even if On the Kokkos side, we can't guarantee that thread-unsafe code works. |
Would a configure-time check in Tpetra requiring Teuchos_ENABLE_THREAD_SAFE=ON if you're using a host-threaded backend work for people? @masterleinad @ndellingwood @bartlettroscoe If someone can think of a code-based way to ensure no RCP capture by kernels, that could work too. |
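One code-level pattern for keeping owning handles out of kernels can be sketched as follows: take a raw pointer from the handle before the parallel region and capture only that pointer. This is a minimal sketch, not Tpetra code. It uses `std::shared_ptr` and `std::thread` as stand-ins for `Teuchos::RCP` and a Kokkos host backend, and `RowOffsets` and `parallel_sum` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical payload type, standing in for data owned through an RCP.
struct RowOffsets {
  std::vector<int> data;
};

// Sums the payload with num_threads workers. The worker lambda captures a
// raw pointer taken before the parallel region, so copying the lambda never
// touches the owner's reference count.
long parallel_sum(const std::shared_ptr<RowOffsets>& owner, int num_threads) {
  assert(owner && num_threads > 0);
  const int* raw = owner->data.data();      // non-owning view of the data
  const std::size_t n = owner->data.size();
  std::vector<long> partial(num_threads, 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([raw, n, t, num_threads, &partial] {
      // Each worker reads a strided slice and writes only its own slot.
      for (std::size_t i = t; i < n; i += num_threads) {
        partial[t] += raw[i];
      }
    });
  }
  for (auto& w : workers) w.join();
  long total = 0;
  for (long p : partial) total += p;
  return total;
}
```

The owner has to outlive the parallel region, since the raw pointer does not keep the data alive. The pattern also relies on discipline rather than on the compiler, so it does not by itself guarantee that no owning handle is captured elsewhere.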
Sounds reasonable if it doesn't break users' configurations. |
We do need a decision on this: we do not promise not to make a copy of the functor per thread inside backends, and I think it would be a serious implementation constraint issue if we did. |
As per discussions with @crtrott the most reasonable path forward in the short term is: Have Teuchos check to see if a threaded Kokkos backend is enabled and require Teuchos_ENABLE_THREAD_SAFE=ON in that case. Longer term we should consider making Teuchos_ENABLE_THREAD_SAFE=ON the default and make the user turn it off manually if they're sure they don't need it. @bartlettroscoe is the short term plan OK? |
@csiefer2, that sounds reasonable to me. I just wish Trilinos had a comprehensive and robust performance test suite so we could see the performance impact this will have (before we hear it from users if there is an issue with some algorithms). But note that I suspect that a good bit of this performance critical software could be refactored to use semi-persisting associations (using |
@bartlettroscoe I can do a PR and mark you as a reviewer. Our current performance suite is really just designed to spot app-specific regressions, rather than absolutely everything. So, in theory, if this affects apps which are tested we should see performance differences there. You know, in theory ;) |
👍 |
The new test `TrilinosInstallTests_simpleBuildAgainstTrilinos_by_package_build_tree` merged from PR #11863 fails because the subdirs ${CMAKE_CURRENT_BINARY_DIR}/common and ${CMAKE_CURRENT_SOURCE_DIR}/common do not exist: this CMakeLists.txt file already sits in the kokkos-kernels/common/ subdir. I don't know why this error did not happen with PR testing for PR #11863, but removing them is clearly the right thing to do.
…s:develop' (ab899a0). * trilinos-develop: (22 commits) Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863) Teuchos: Fixing cmake logic Teuchos: Fixing catch() issues with C++ language drift TrilinosSS: include <omp.h> (Fix trilinos#11867) MueLu hierarchical: Fix build error Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host Stokhos: Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545) Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938) Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808) KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545) Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545) Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545) Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157) Export Kokkos_ENABLE_<OPTION> that are relevant Do not append to Kokkos_OPTIONS variables those in the do not export list Expand list of kokkos options not to export with cmake Tpetra: Don't use std::binary_function Tpetra: Fixing missing HIP tesT ...
@ndellingwood My code change to Teuchos merged. Does this meet your needs? |
@csiefer2 mostly, thanks for working on this! There were some changes to kokkos-kernels that need a matching PR to the kokkos-kernels repo (to avoid being clobbered by the next release); once that goes in we should be in good shape. I enabled |
…s:develop' (ab899a0). * trilinos-develop: (23 commits) Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863) Teuchos: Fixing cmake logic Teuchos: Fixing catch() issues with C++ language drift fastilu: Fix memory leak. TrilinosSS: include <omp.h> (Fix trilinos#11867) MueLu hierarchical: Fix build error Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host Stokhos: Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545) Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938) Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808) KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545) Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545) Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545) Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157) Export Kokkos_ENABLE_<OPTION> that are relevant Do not append to Kokkos_OPTIONS variables those in the do not export list Expand list of kokkos options not to export with cmake Tpetra: Don't use std::binary_function ...
A matching patch with the changes from 21e643d to kokkos-kernels is up: kokkos/kokkos-kernels#1854 |
@ndellingwood, sorry I did not follow up on that right after I made the change and thanks for catching this. (Not sure why this was not caught in the testing of PR #11863). |
@bartlettroscoe no worries, thanks for the quick fix with 21e643d 👍 |
kokkos/kokkos-kernels#1854 is merged |
@ndellingwood, can you run |
@bartlettroscoe I don't have the nightly build configured to post to CDash, it's on my list of TODOs. Is it helpful if I point you to the Jenkins build for now? https://jenkins-son.sandia.gov/job/KokkosEco_Trilinos_Weaver_Gcc830_OpenMP_opt/251/consoleFull |
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. |
This issue was closed due to inactivity for 395 days. |
Bug Report
@trilinos/tpetra
This issue is to relay info from kokkos/kokkos#6082 regarding some RCP thread-safety issues encountered when testing Tpetra with the Kokkos@develop branch and the OpenMP backend (a couple of different testing configurations are posted in that issue).
The problem seems to occur with kernels that capture RCPs, creating multiple copies within OpenMP parallel regions, which is not thread-safe without setting `Teuchos_ENABLE_THREAD_SAFE=ON`, e.g. Trilinos/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp, lines 5893 to 5969 in cd2c0a1.
A couple of potential options to resolve this could be to set `Teuchos_ENABLE_THREAD_SAFE=ON` as the default, or make sure no RCPs are passed to Kokkos' parallel_* kernels, or ?
Adding @masterleinad who helped triage this
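To illustrate why copying a reference-counted handle inside a parallel region needs a thread-safe counter, here is a minimal sketch using a toy handle. It is not `Teuchos::RCP`. `ToyHandle` and `copy_stress` are hypothetical names, and `std::thread` stands in for a host-threaded backend. With `Counter = std::atomic<int>` concurrent copies are well defined. With `Counter = int` the same multi-threaded loop would be a data race, which is presumably the situation a build without `Teuchos_ENABLE_THREAD_SAFE=ON` ends up in.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Toy reference-counted handle. The counter type is a template parameter so
// the atomic and non-atomic variants share one implementation.
template <class Counter>
class ToyHandle {
public:
  ToyHandle() : count_(new Counter(1)) {}
  ToyHandle(const ToyHandle& other) : count_(other.count_) { ++(*count_); }
  ToyHandle& operator=(const ToyHandle&) = delete;
  ~ToyHandle() {
    // The last handle to go away frees the shared counter.
    if (--(*count_) == 0) delete count_;
  }
  int use_count() const { return static_cast<int>(*count_); }

private:
  Counter* count_;
};

// Copies one shared handle copies_per_thread times on each of num_threads
// workers and returns the reference count left once all workers have joined.
template <class Counter>
int copy_stress(int num_threads, int copies_per_thread) {
  assert(num_threads > 0);
  ToyHandle<Counter> owner;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&owner, copies_per_thread] {
      for (int i = 0; i < copies_per_thread; ++i) {
        // Copy increments the count; scope exit decrements it again.
        ToyHandle<Counter> local(owner);
      }
    });
  }
  for (auto& w : workers) w.join();
  return owner.use_count();
}
```

Calling `copy_stress<std::atomic<int>>(4, 10000)` leaves the count at 1. The plain `int` variant is only safe with a single worker; with several workers the increments and decrements can be lost, so the count may drift and the counter may be freed too early or never.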