Tribits: PR#11808 breaks kokkos configure with TPL_ENABLE_HWLOC=ON #11938

Closed
maartenarnst opened this issue Jun 1, 2023 · 5 comments
Labels
type: bug The primary issue is a bug in Trilinos code or tests

Comments

@maartenarnst
Contributor

maartenarnst commented Jun 1, 2023

Bug Report

@bartlettroscoe

Description

We're compiling Trilinos with TPL_ENABLE_HWLOC=ON. Configure aborts in Kokkos on line

It seems the issue is due to a change in PR #11808:

It seems the issue can be solved by replacing KOKKOS_TPL_OPTION(HWLOC Off) on line 34 in kokkos_tpls.cmake with KOKKOS_TPL_OPTION(HWLOC Off TRIBITS HWLOC).
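
A minimal sketch of that one-line change, assuming the line reads exactly as quoted above. The rest of kokkos_tpls.cmake is not shown here, and the line number may differ between Kokkos versions.

```cmake
# kokkos_tpls.cmake, around line 34 (per the report above)

# Before: HWLOC is declared without a TriBITS TPL name.
#   KOKKOS_TPL_OPTION(HWLOC Off)

# After (proposed): also mark HWLOC as a TriBITS TPL, so that
# TPL_ENABLE_HWLOC=ON from the Trilinos configure is recognized.
KOKKOS_TPL_OPTION(HWLOC Off TRIBITS HWLOC)
```

Only the option declaration changes; the default value (Off) stays the same.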

maartenarnst added the "type: bug" label Jun 1, 2023
@bartlettroscoe
Member

CC: @trilinos/kokkos

> It seems the issue can be solved by replacing KOKKOS_TPL_OPTION(HWLOC Off) on line 34 in kokkos_tpls.cmake with KOKKOS_TPL_OPTION(HWLOC Off TRIBITS HWLOC).

@maartenarnst, I agree, that does look like the right solution. Not sure how removing subpackages triggered this issue but this is an easy fix (except for having to move the changes back to the native Kokkos repo as well).

I will add a commit to fix this as part of the topic branch in PR #11863.

@maartenarnst
Contributor Author

Hi @bartlettroscoe. Ok, thanks a lot for the quick response and for committing the fix!

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 1, 2023
It was reported by a Trilinos user in trilinos#11938 that after removing Kokkos
subpackages that their build of Trilinos failed and by marking 'HWLOC' as a
TriBITS TPL this fixed the problem.  HWLOC has always been a TriBITS TPL so it
is not quite clear how this worked before this.
bartlettroscoe added a commit to bartlettroscoe/kokkos that referenced this issue Jun 1, 2023
It was reported by a Trilinos user in trilinos/Trilinos#11938 that after
removing Kokkos subpackages that their build of Trilinos failed and by marking
'HWLOC' as a TriBITS TPL this fixed the problem.  HWLOC has always been a
TriBITS TPL so it is not quite clear how this worked before this.
@bartlettroscoe
Member

bartlettroscoe commented Jun 1, 2023

@maartenarnst, FYI, this should be addressed in Trilinos PR #11863 (commit 6bffd44) and Kokkos PR kokkos/kokkos#6176.

@maartenarnst
Contributor Author

Thanks a lot!

jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Jun 7, 2023
…s:develop' (ab899a0).

* trilinos-develop: (22 commits)
  Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863)
  Teuchos: Fixing cmake logic
  Teuchos: Fixing catch() issues with C++ language drift
  TrilinosSS: include <omp.h> (Fix trilinos#11867)
  MueLu hierarchical: Fix build error
  Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change
  Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host
  Stokhos:  Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP
  Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545)
  Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938)
  Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808)
  KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545)
  Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545)
  Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545)
  Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157)
  Export Kokkos_ENABLE_<OPTION> that are relevant
  Do not append to Kokkos_OPTIONS variables those in the do not export list
  Expand list of kokkos options not to export with cmake
  Tpetra: Don't use std::binary_function
  Tpetra: Fixing missing HIP tesT
  ...
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Jun 7, 2023
…s:develop' (ab899a0).

* trilinos-develop: (23 commits)
  Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863)
  Teuchos: Fixing cmake logic
  Teuchos: Fixing catch() issues with C++ language drift
  fastilu: Fix memory leak.
  TrilinosSS: include <omp.h> (Fix trilinos#11867)
  MueLu hierarchical: Fix build error
  Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change
  Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host
  Stokhos:  Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP
  Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545)
  Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938)
  Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808)
  KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545)
  Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545)
  Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545)
  Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157)
  Export Kokkos_ENABLE_<OPTION> that are relevant
  Do not append to Kokkos_OPTIONS variables those in the do not export list
  Expand list of kokkos options not to export with cmake
  Tpetra: Don't use std::binary_function
  ...
nliber pushed a commit to nliber/kokkos that referenced this issue Jun 22, 2023
It was reported by a Trilinos user in trilinos/Trilinos#11938 that after
removing Kokkos subpackages that their build of Trilinos failed and by marking
'HWLOC' as a TriBITS TPL this fixed the problem.  HWLOC has always been a
TriBITS TPL so it is not quite clear how this worked before this.
etphipp added a commit to sandialabs/GenTen that referenced this issue Jun 26, 2024
1a3ea28 Merge pull request #6231 from ndellingwood/master
3e85bd9 Fix windows symlink configure issue (#6241)
ea7b124 CHANGELOG fixup following merge
25592c5 Update master_history.txt
adde1e6 Merge branch 'release-candidate-4.1.00' for 4.1.00
9e84430 Merge pull request #6228 from masterleinad/cherry_pick_6223
dd81ecb Merge pull request #6223 from masterleinad/fix_simd_on_gpus
5c3e683 [4.1.00] Changelog for 4.1.00 (#6226)
cd96a74 Merge pull request #6219 from masterleinad/fix_sycl_makefile_4_1_00
23aadf4 Fix compiling SYCL with KOKKOS_IMPL_DO_NOT_USE_PRINTF_USAGE
afc1929 Update version to 4.1.00
6ca60c3 Improve OpenMP affinity warning to include MPI concerns (#6185)
e200ba1 [HIP] Improve heuristic deciding the number of blocks used in parallel_reduce (#6160)
43a797b Left align demangled stacktrace output. (#6191)
a406372 Fix global fence in Kokkos::resize(DynRankView) (#6184)
8661773 Merge pull request #6195 from fnrizzi/is_trait_v
98f9b4c add trait and test
e30f040 shortcut value for is_dynamic_view
789b62c Weed out verbose output from `dynamic_view` container unit test (#6173)
e2a7f08 Merge pull request #6171 from rgayatri23/openmptarget_nvhpc
8266abd Merge pull request #6183 from ldh4/simd_replace_unavailable_loadu_storeu_instr
ad966bd OpenMPTarget: include desul changes.
c72615a Merge remote-tracking branch 'upstream/develop' into openmptarget_nvhpc
7b0e378 Replace _mm512_loadu_epi64 and _mm512_storeu_epi64 with _mm512_loadu_si512 and _mm512_storeu_si512
18c5395 Merge pull request #5982 from masterleinad/cleanup_functor_analysis
6c134af Merge pull request #6172 from masterleinad/remove_desul_sycl_extended_namespace
0b7bed5 Allow passing a temporary std::vector to partition_space (#6167)
65ffe4c Also create symlinks for CMake configuration files to cmake_packages/Kokkos for TriBITS (#6163)
915c174 SIMD: make binary op tests to test against all data types (#5913)
62ba94c Merge pull request #6175 from dalg24/changelog_372
502dc03 Merge pull request #6176 from bartlettroscoe/tril-11938-tribits-hwloc
2bc7b96 Clean up FunctorAnalysis
9df5a01 Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos/Trilinos#11938)
1af1379 Cherry-pick v3.7.02 changelog into develop [ci skip]
bf34573 OpenMPTarget: Restore desul changes.
925aca1 OpenMPTarget: Replace kokkos macros in desul.
538d18d OpenMPTarget: update fixme comment.
e832781 Remove extended_namespace template paramter for SYCLMemoryOrder/Scope
c23cfb8 Update Makefile.kokkos
d1ecf9a OpenMPTarget: Add a fixme.
bbd9a78 OpenMPTarget: Changes for OpenMPTarget backend with nvhpc compiler.
ab6f756 Implement `HPX::in_parallel` (#6143)
e88537f Allow linking against build tree (#6078)
b3f9f78 sorting: add to binsort support for strided views and reorg tests (#6081)
2a5c949 Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (#6157)
2a382b4 Merge pull request #6126 from masterleinad/fix_uninitialized_value_in_combined_reducer
461310d Merge pull request #6156 from masterleinad/fix_cuda_lambda_trilinos
12e9645 KokkosTools: Don't call callbacks before backends are initialized (#6114)
f8a2a80 `BinSort`, `BinOp1D`, `BinOp3D`: mark default constructor as deleted (#6131)
d92158c Fix bogus warnings in nested CUDA parallel_reduce
31a5f21 Merge pull request #6136 from masterleinad/fix_nd_builtin_reductions_with_loc
5d81422 Merge pull request #6155 from dalg24/fixup_dual_view
85b014b Fix Kokkos_ENABLE_CUDA_LAMBDA for Trilinos
131503d Revert to `DualView<class,class=void,class=void,class=void>` when deprecated code 4 is enabled
382f0be Merge pull request #6150 from dalg24/drop_profiling_load_print_option
b2645f8 OpenMPTarget: Enable Cray compiler for the OpenMPTarget backend. (#5889)
6c0adb5 Merge pull request #6149 from dalg24/fixup_cuda_lambda
d74df9b [ci skip] Add nightly ci for spack (#6135)
8ede4a4 Merge pull request #6142 from dalg24/cleanup_exported_kokkos_options
d92988f Suppress bogus warning about CUDA_LAMBDA being ON
57226c9 Drop Kokkos_ENABLE_PROFILING_LOAD_PRINT option
87c7be9 Merge pull request #6047 from masterleinad/simplify_sycl_reductions
3f565bb Export Kokkos_ENABLE_<OPTION> that are relevant
3c0f9a1 Merge pull request #6148 from dalg24/drop_kokkos_enable_launch_compiler
6b18c2a Drop Kokkos_ENABLE_LAUNCH_COMPILER option
c935774 Do not append to Kokkos_OPTIONS variables those in the do not export list
2bcfa51 Expand list of kokkos options not to export with cmake
8f4fb72 Merge pull request #6137 from masterleinad/fix_sycl_bit_cast
3329989 Merge pull request #6123 from e10harvey/floating_point_wrapper
ee43d2a Add guards for Cuda
c67ddea Try running for other execution spaces
bf9c242 Allow deprecated declarations in SYCL+Cuda CI
e8dba15 Improve indentation of comments
99161e0 Disable tests for OpenMPTarget
cbc7e88 Fix bit_cast for SYCL again
f8ed850 Disable tests failing with NVHPC
4197fa8 Merge pull request #6120 from uliegecsm/kokkos-dual-view-template-types
02fb8d4 core/src: Move floating_point_wrapper to private header
b86d73a sorting an empty view should exit early and not fail (#6130)
1767bfe dual view: update template types (#6085)
df5681d Don't restrict index type in builtin reducers
766f00d Merge pull request #6133 from msimberg/hpx-post-apply-compat
336473d Merge pull request #6132 from msimberg/hpx-version-requirement-1.8.0
d13cc09 Conditionally use hpx::post instead of hpx::apply based on HPX version
12b0c80 Increase minimum required HPX version to 1.8.0
8a541b5 Move half traits to private header and add half/bhalf infinity trait (#6055)
3f602b6 Merge pull request #6129 from masterleinad/remove_unused_attach_texture_object
6422681 Merge pull request #6121 from masterleinad/use_sycl_bit_cast
0018848 Cuda: Remove unused attach_texture_object
e94b5dd Kokkos_BitManipulation: KOKKOS_COMPILER_GCC->KOKKOS_COMPILER_GNU (#6119)
7009a28 Merge pull request #6122 from masterleinad/ambiguous_bit_cast
6b2459c Fix nightlies -- workaround compiler bug in GCC 9.1 and 9.2 (#6118)
5f45c30 Qualify calls possibly ambiguous calls to bit_cast
1bc1a51 Import sycl::bit_cast into the Kokkos namespace
c62a42e Allow templated functors in parallel_for, parallel_reduce and parallel_scan (#5976)
fb0c1b8 Merge pull request #6106 from crtrott/fix-nvhpc-compilation
f15b5ab Merge pull request #6116 from rbberger/hpcbind_slurm_bugfix
a85923d Merge pull request #6110 from dalg24/fixup_cuda_lambda
531b01d Fix macro guards in test for NVC++ as the CUDA compiler
aa7ab5f hpcbind: check for correct Slurm variable
b26ee87 Merge pull request #6113 from fnrizzi/use_assert_eq_for_std_algo_tests
6ede773 Merge pull request #6064 from masterleinad/sycl_improve_parallel_scan_new
41d9d06 Reintroduce test skip for nvhpc < 23.3
ce0b78f Merge pull request #6111 from dalg24/drop_unused_cmake_macros
81ce338 use ASSERT_EQ in all std algorithms tests
ef5d447 Fixup cmake style
b82161b Drop unused cmake macros
417a6ee Work around NVHPC 23.x not dealing with __isGlobal
0954a1b Drop CUDA_LAMBDA guards in Cuda headers
cfbaf28 Reorganize ZeroMemset (#6087)
798efc5 Always pass -extended-lambda option to NVCC and force Kokkos_ENABLE_CUDA_LAMBDA ON
1c0e3bf Update the OpenACC parallel_reduce() constructs with Range/MDRange/Team (#6072)
cf82edc Merge pull request #6108 from dalg24/drop_algorithms_and_containers_config_files
d7c06c4 Revert "Merge pull request #5964 from PhilMiller/cuda-lambda-default"
7ef7d02 Drop pointless Kokkos{Algorithms,Containers}_config.h files
5fa72b5 Kokkos: Remove TriBITS Kokkos subpackages (trilinos/Trilinos#11545) (#6104)
60b982a Work around NVHPC 23.x issues
ea134de Work around NVHPC issue with enum types
edf63b3 Merge pull request #6101 from dalg24/bit_cast
e247508 Added multiple reducers support for team-level parallel reduce (#5727)
8dc8f49 Fix typo and remove accidentally committed assertions
26ae798 change impl of `is_sorted_until` to use reduce (#6097)
7533cb4 Disable tests that fail at runtime with NVHPC (likely not liking the class declaration within the body of the functor)
d6944df Merge pull request #6008 from uliegecsm/cuda-uvm-space-instance-fence
5c2d948 view(uvm): fence if need in allocation (#6005)
432988b Clang-format glitch
eff2716 Use Kokkos::bit_cast in SIMD instead of rolling its own
e8a44e5 Add runtime tests for bit_cast
ddf55c1 Add the Experimental:: builtin variant (just defer to regular bit_cast)
71ee48f Add compile time tests for the constraints on the bit_cast function template
ab41ef8 Add implementation of bit_cast in <Kokkos_BitManipulation.hpp>
945281a Merge pull request #5964 from PhilMiller/cuda-lambda-default
a45cc1e fix ternary op in subset of std algorithms not working with nvhpc (#6095)
7a166d2 Enable OpenMP in CUDA-11.0-NVCC-RDC to test DEPRECATED_CODE_3=ON (#5978)
4b6d971 OpenMPTarget: Update hierarchical parallelism. (#6043)
d251954 Work around nvcc issue for view_mapping and add FIXME_NVCC comment
5b1f341 Merge pull request #6098 from ndellingwood/update-changelog-4.0.01
e8067d4 [ci skip] Fixup changelog
c28472a Update changelog
4407f7b Remove various test exclusions based on KOKKOS_ENABLE_CUDA_LAMBDA
7e32999 Always expect KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA to be set
51d7c72 Don't fail to define broader 'lambdas are available' macro
4470284 Fix definitions and docs to remove CUDA Lambda option
ddded0e Implement CMake messages per team decision
ca9fd21 Change Makefile.kokkos too
a906356 Tentative arguments switch for nvcc 12+
4846d47 Unconditionally enable CUDA extended lambda support
62d2b6c Merge pull request #6080 from ndellingwood/master
e5490e1 Add support for Darwin 32-bit and PPC (#5916)
56ef02c Disable failed bit manipulation tests when compiled by NVHPC  (#6088)
bdaa12c Compiling with auto deduction of workgroup sizes
3cc9915 Improve SYCL parallel_scan
d30b04d Merge pull request #6065 from masterleinad/fix_join_value_wrapper_for_neutral_element
de5c017 Update OpenACC FunctorAdapter (#6077)
55bbd9f Converted a shared_ptr to a host view in UnorderedMap (#6073)
7793406 Merge pull request #6086 from masterleinad/fix_sycl_execution_space
d5fa56e Fix up SYCL execution space instance creation for Intel GPUs
0ab1f11 Update master_history.txt
5893754 Update version to 4.0.1
15776f9 Merge pull request #6046 from ajpowelsnl/CHANGELOG-4.0.0/team_thread_sort
b3bb4a6 Update changelog (#6058)
24c62bf Merge pull request #6074 from masterleinad/fix_sycl_cuda
220495f Merge pull request #5906 from masterleinad/define_kokkos_compiler_intel_llvm
4b27b7d Fix Kokkos_SIMD with AVX2 on 64-bit architectures (#6075)
b72984a Merge branch 'release-candidate-4.0.01' for 4.0.01
2e51c67 Explicitly cast to CombinedFunctorReducerType
a92e091 Only pass one wrapper object in SYCL reductions
c083089 Merge pull request #6057 from cz4rs/changelog-4.0.01
06dbc15 Fix PerfTests by limiting GramSchmidt
9444634 perf_test is still not working
0c681ed SYCL: Use in-order queue for SYCL+Cuda
c09dd1c Merge pull request #6059 from Rombur/fix_ci_host
b21b1e4 Merge pull request #6068 from crtrott/fix-makefile-4.0.01
2ac576a Update changelog
6d2e899 Fix typo in Makefile.kokkos
57413bb Merge pull request #6063 from stanmoore1/makefile_typo
db802ac Fix join for ValueWrapperForNoNeutralElement
5680563 Fix bug in Makefile.kokkos
4feae9e Reduce size of ScatterView test when using OpenMP
9004274 Merge pull request #6056 from masterleinad/partially_reverse_5504
3eaf13e fix based on comments
318d84c Update changelog to 4.0.01 [ci skip]
079268c Partially reverse #5504
0d96f88 OpenMPTarget: Changes to Makefile.kokkos (#6053)
7645d6c Merge pull request #6052 from masterleinad/fix_unordered_map_shared_space
d26f88c Don't create a shared state for size() in UnorderedMap's deep_copy
3b1afb5 Remove libnuma (#6048)
bb845f2 Merge pull request #6016 from masterleinad/use_wextra
72687d8 Merge pull request #6049 from dalg24/build_md
d0f5777 Remove (outdated) license information [ci skip]
83873a6 Remove Kokkos Keyword Listing section from BUILD.md and refer to the wiki instead
bb7ae99 CHANGELOG.md: add threads sort
48b34de Desul atomics: let relocatable device code mode be part of the configuration (#5991)
0702062 Merge pull request #5504 from masterleinad/sycl_remove_enqueue_barrier_memcpy_workaround
8352a11 Merge pull request #5855 from dalg24/num_threads_and_device_id
b24dcb4 Merge pull request #5990 from jczhang07/patch-1
3f6a854 Merge pull request #6041 from ldh4/remove_unused_thread_vector_range_ctors
d80f580 Define KOKKOS_COMPILER_INTEL_LLVM
140cbd7 Define at most one KOKKOS_COMPILER* macro
b6d1dba Merge pull request #6038 from masterleinad/pgi_compiler
38c6476 Merge pull request #6036 from masterleinad/cherry_pick_trilinos
59124ce Merge pull request #6037 from masterleinad/cherry_pick_6036
0126dcb Remove unused constructors for ThreadVectorRangeBoundairesStruct that are not taking in TeamMemberType as an argument.
29826df Try removing _kokkos_pgi_compiler_bug_workaround
39c35a8 KOKKOS_COMPILER_PGI -> KOKKOS_COMPILER_NVHPC
c9a9ee0 Cherry-pick TriBITS update from Trilinos
7aabd2d Cherry-pick TriBITS update from Trilinos
b57c17b Add -Wextra
9b644e0 Fix OMPT size compare warnings
be65fe4 Fix enum warnings
715a6ff Merge pull request #6030 from masterleinad/fix_missing_field_initializers
0ce3895 Fix -Wmissing-field-initializers warning
5e57438 Relax scratch space limits for HIP reductions (#6029)
ef1ea93 Add -Wdeprecated-copy warning and fix OMPT scan bug related to assignment operators (#6026)
9b06259 #6027: replace remaining instances of ALL_t with Kokkos::ALL_t (#6028)
8b5881f Merge pull request #6022 from crtrott/4001-cp-support-ada
e86c8ea Merge pull request #6021 from dalg24/rc_4_0_01_support_for_amd_gpu_gfx1100
fdb089b Add UnorderedMapInsertOps for coo2crs (#5877)
0476985 Add half_t and bhalf_t limits (#5778)
e6b8548 Merge pull request #6018 from dalg24/rc_4_0_01_bug_desul_atomics_numeric_limits_max
82b3905 Merge pull request #6023 from dalg24/rc_4_0_01_fix_changelog
eb93bbd Merge pull request #6019 from dalg24/rc_4_0_01_warning_hip_std_memcpy
a556f49 Merge pull request #6020 from crtrott/4001-cp-nvcc12-cpp20
ffa4f03 Fixup 4.0 change log (#6015) [ci skip]
89bdbaa Fixup 4.0 change log (#6015)
f36c9ae Add KOKKOS_ARCH_ADA89 to print_configuration
3639121 Do not define KOKKOS_ARCH_AMPERE with Ada (compute capability 8.9)
991901b add support to compile Kokkos for Ada generation (sm_89) consumer GPUs (RTX40x0)
6050076 Merge pull request #5986 from masterleinad/cherry_pick_5981
e275a77 Add support for AMDGPU target NAVI31 / RX 7900 XT(X): gfx1100
8207b2e Allow c++20 in nvcc_wrapper for nvcc 12 and above
eec1a53 Allow that C++20 is passed to nvcc
1b28263 Merge pull request #6000 from Rombur/fix_memcpy
3cbd2ec Desul atomics: fix bug in `desul::Impl::numeric_limits_max<uint64_t>` value
981d9c3 Merge pull request #6017 from masterleinad/fix_sycl_device_copyable
6f16f41 Fix namespace for is_device_copyable
8270db3 Merge pull request #6003 from masterleinad/fix_team_scratch_1_queues_sycl_cuda
54da8a2 Merge pull request #6000 from Rombur/fix_memcpy
8c3d97e Merge pull request #6013 from masterleinad/cherry_pick_6012
9b6a80f Merge pull request #6012 from aprokop/fix_version
7c7ae9a desul: Move lock_array_copied from global scope (#5999)
a7a2d71 SYCL: Make is_device_copyable future-proof (#6009)
4e0d9c7 CMake: update package compatibility mode when building within Trilinos
33b905b CMake: update package compatibility mode when building within Trilinos
79b824e Merge pull request #6010 from masterleinad/fix_sycl_decorated_local_pointers
904fb32 Fix warning in some user code when using std::memcpy
dc876ea Merge pull request #6011 from ldh4/release-candidate-4.0.01
69bd7bd Merge pull request #5995 from masterleinad/cleanup_ompy
bd69243 Merge pull request #5996 from dalg24/desul_atomics_nvcc_warning
86b70c1 Merge pull request #6001 from dalg24/desul_atomics_warning_numeric_limits_max
a6f27bf Pass local_accessor directly instead
8400cbf simd: Fixed an incorrectly returning size for uint64_t in avx2 (#6004)
b0cc5a0 simd: Fixed an incorrectly returning size for uint64_t in avx2 (#6004)
3fc7789 Merge pull request #5948 from dalg24/kokkos_arch_nvidia_gpu_macro
b097f74 Drive-by fix typos "fix {to -> too} many"
f011970 Move Cuda/Kokkos_Cuda_NvidiaGpuArchitectures.hpp -> impl/Kokkos_NvidiaGpuArchitectures.hpp
a798ac7 Explain acquire_team_scratch_space
c5d2c3d m_team_scratch_pool -> m_team_scratch_event
33a5d60 Fix team_scratch_1_queues for SYCL+Cuda
19a43a6 Fix warning with NVC++
106a4a3 Fixup NVIDIA GPU arch must be defined potentially for other backends as well
762e3ce Desul atomics: Fix NVCC warning integer conversion resulted in a change of sign
48640d7 Fix compiling OpenMPTarget for AMD GPUs
d5244e1 Cleanup OpenMPTaget ParallelReduce
65aa95e Merge pull request #5965 from dalg24/desul_numeric_limits_max
4fde4b0 Support --compiler-options in nvcc_wrapper
be14872 Remove workaround for submit_barrier not being enqueued properly
9480cb5 Merge pull request #5962 from masterleinad/host_iterate_tile_combined_functor_reducer
70f6d34 Fix sycl.large_team_scratch_size
b04b46a Merge pull request #5984 from uliegecsm/kokkos-graph-hip
fc3f7fc Merge pull request #5892 from aprokop/use_std_sort_within_a_bin
65bf47c Merge pull request #5983 from masterleinad/fix_unordered_map_m_size
bb5ef8f graph(hip): enable test
0eeb3a4 Merge pull request #5971 from masterleinad/fix_reducer_check_serial_hpx
3c629be Merge pull request #5774 from tcclevenger/refactor_scan_policy_tests
3cb200c Add another test case
6a8e923 Use (non-mutable) std::shared_ptr instead
74e2fe9 UnorderedMap: Ensure size() working in case of copies
260886d Merge pull request #5981 from masterleinad/fix_sycl_large_team_scratch_size
42991f1 Bit manipulation: implement `byteswap` (#5967)
22cc433 Add to HIP tests in Makefile
bb8a96b Fix sycl.large_team_scratch_size
ee75763 #5641: Fix HIP & CUDA MDRange reduce for sizeof(value_type) < sizeof(int) (#5745)
9786d57 Merge pull request #5977 from j8asic/patch-1
43b0245 Print Kokkos version at configuration time (#5979)
067f74a Allow c++20 in nvcc_wrapper for nvcc 12 and above
b000df5 Allow that C++20 is passed to nvcc
82bd4e6 Merge pull request #5963 from masterleinad/fix_partition_master_test
05f644a Merge pull request #5966 from dalg24/cuda_bhalf_conversions_ampere_plus
9f5f762 Merge pull request #5973 from cz4rs/benchmark-add-git-info
63966c1 Merge pull request #5970 from mhalk/feature/add_support_gfx1100
00a24a4 Merge pull request #5972 from aprokop/rename_scoped_profile
3707be7 Merge pull request #5954 from masterleinad/pass_functor_analysis_to_parallel_reduce_ompt
9fe93d4 Merge pull request #5867 from akohlmey/add_cuda_ada_support
b10f35e Improve macro name KOKKOS_IMPL_{ARCH_NVIDIA_GPU_AMPERE_PLUS -> NVIDIA_GPU_ARCH_SUPPORT_BHALF}
b4de0ac Rename KOKKOS_{ -> IMPL_}ARCH_NVIDIA_GPU
72d39a7 Rename ScopedProfileRegion -> ScopedRegion
9798993 [ci skip] Add a comment
488ff10 Bring back git info to benchmarks output
651ba78 Merge pull request #5968 from kokkos/PhilMiller-patch-1
85ab1bc Add support for AMDGPU target NAVI31 / RX 7900 XT(X): gfx1100
42abe36 Convert OpenMPTarget ParallelScan
6e29e92 Convert OpenMPTarget ParallelReduce
f670cae Let KOKKOS_ARCH_NVIDIA_GPU provide the Compute Capability
a7ac045 Drop native from performance benchmark build
e0eacdd Drop native from macOS build
f46889d Drop native from HPX builds
0e302f6 Drop Kokkos_ARCH_NATIVE=ON because it breaks with ccache
1d26ca8 Make CUDA bhalf conversion code more forward compatible
4f18b19 Desul atomics: fix bug max uint64_t value
0b2a956 Merge pull request #5959 from aprokop/scope_guard
2b035de Use CombinedReducer in HostIterateTile
2e667d8 Fix partition_master test
9b18550 Address review comments
0f7b7eb Merge pull request #5953 from masterleinad/pass_functor_analysis_to_parallel_reduce_sycl
787f940 Merge pull request #5894 from masterleinad/pass_functor_analysis_to_parallel_reduce_threads
7fa5a75 Merge pull request #5910 from masterleinad/fix_scan_serial_cuda
d7896e6 Add ParallelScanRangePolicy test
bab74b0 Merge pull request #5947 from dalg24/desul_hip_rdc
543e971 Merge pull request #5958 from dalg24/fixup_openmptarget_concurrency
62fa442 Add [[nodiscard]] qualifiers
73de258 Add ScopedProfileRegion
fb0b94c Fix OpenMPTarget::concurrency()
3c77f6f Also convert SYCL ParallelScan
90836d2 Convert SYCL ParallelReduce
a75aa23 Merge pull request #5949 from masterleinad/pass_functor_analysis_to_parallel_reduce_openacc
b2ec19d Merge pull request #5952 from dalg24/unused_work_range
51fbd42 Drop unused ParallelX::WorkRange member types
5b1a0e3 Merge pull request #5950 from dalg24/4.0-changelog
952b841 Fix Kokkos_Threads_Parallel_MDRange.hpp
6d24bc0 Update changelog to 4.0.0
4dcb294 Use KOKKOS_ARCH_NVIDIA_GPU macro in SYCL, OpenACC, and OpenMPTarget backends where appropriate
7227127 Convert OpenACC ParallelReduce
f967fa9 Provide another constructor in Test16_ParallelScan
5d3bcb1 Define KOKKOS_ARCH_NVIDIA_GPU macro when targeting an NVIDIA GPU architecture
fc4a9ce Merge pull request #5942 from dalg24/print_config_disabled_atomics
65a6f9a Add comments testing for non-device-callable destructors
7b598eb Fix reducer result check for Threads ParallelReduce
9a33347 Use local "reducer" variable
1bfd0cc Convert Threads ParallelReduce implementations
79f8144 Fix reducer result check for Serial+HPX ParallelReduce
659baf6 Drop DESUL_HIP_RDC compile definition
554032e Desul atomics: prefer __CLANG_RDC__ macro
7e4665d Merge pull request #5944 from dalg24/drop_kokkos_enable_rfo_prefetch_macro
ee2ddae Drop KOKKOS_ENABLE_RFO_PREFETCH macro
40c40a7 Convert OpenMP ParallelReduce (#5893)
5c5ac72 Tell when Kokkos atomics are disabled in print_configuration
d9fc6cb Merge pull request #5940 from dalg24/drop_kokkos_enable_atomics_macros
4bf2c5c RangePolicyRequire was not using require
e98766b Merge pull request #5936 from dalg24/drop_kokkos_arch_turing_macro
e69b796 Remove mention of the KOKKOS_ENABLE_*_ATOMICS macros in <Kokkos_Macros.hpp> header
6d10edc Drop KOKKOS_ENABLE_CUDA_ASM* macros
aafe20c Drop `KOKKOS_ENABLE_*_ATOMICS` macros when printing configuration
08dc180 Merge pull request #5923 from dalg24/drop_kokkos_memory_order
d303e40 Merge pull request #5935 from PhilMiller/intel-macro-cleanup
1528cd4 Merge pull request #5932 from shaomeng/improve_vector
32868fa Merge pull request #5931 from tcclevenger/cleanup_unit_test_cmake
537f62e Do not define KOKKOS_ARCH_TURING macro with generated GNU makefiles
6dd4800 Add KOKKOS_ARCH_ADA89 to print_configuration
0e99902 Do not define KOKKOS_ARCH_AMPERE with Ada (compute capability 8.9)
61620e8 Revert "Revert "Fix intel hang""
f419b73 Add missing <atomic> header include
b132b9b add cbegin() and cend() to Kokkos::Vector
2bbe1df Cleanup unit_test/CMakeLists.txt
5f8d0e3 Update clang-format CI build (#5930)
8b80bd0 Merge pull request #5925 from dalg24/kokkos_hip_architectures
310812b Remove extra double quote in CUDA and HIP allocation error messages (#5926)
b9e423e Export Kokkos_HIP_ARCHITECTURES variable with CMake
569a609 Export `Kokkos_CUDA_ARCHITECTURES` variable with CMake (#5919)
1abf653 Drop Kokkos memory oder classes
3bcf389 Use directly memory order from desul in Impl:: atomic funtion templates
2a7629d Prefer non Impl:: atomic_{load,store} in AtomicDataElement since using relaxed memory order
416d7b7 New OpenACC backend implementation for  parallel_scan with  a range policy (#5876)
1cf8907 Use std::sort for sorting within a bin when possible
db890c9 Add test case
f93e48a Don't call the functor's destructor on the device for Serial and Cuda
d177f61 Merge pull request #5918 from PhilMiller/intel-macro-cleanup
12708a1 Use insertion sort for sort within a bin in BinSort (#5890)
90286ca Merge pull request #5911 from masterleinad/pass_functor_analysis_to_parallel_reduce_hip
33e5ef6 Revert "Fix intel hang"
54e4396 containers: Remove workaround for Intel older than the required 19.0.5 and GCC < 5
6a3b1d6 algorithms: Remove workaround for Intel older than the required 19.0.5
1d08f6f Merge pull request #5915 from dalg24/drop_host_lock_arrays
4f871be Convert HIP ParallelScan
70a0af5 Convert HIP ParallelReduce
cba99e8 Remove misplaced and commented host lock array code in OpenMPTarget backend
63879db Drop host lock array
b4655f9 Drop (unused) HBW lock array
c5fe10e Merge pull request #5817 from dalg24/drop_kokkos_lock_arrays
3c06ffe Merge pull request #5907 from dalg24/bit_rotate
fcdedf7 Do not bother with sycl::rotate
771c956 Merge pull request #5895 from masterleinad/pass_functor_analysis_to_parallel_reduce_hpx
bc1138f Merge pull request #5884 from rbberger/amd_rocm_hpcbind
a2181fc Merge pull request #5901 from etiennemlb/fix/cmake-deduplication-issue
0691619 Convert HPX ParallelReduce
ba19572 Use CombinedFunctorReducerType in ParallelReduce (#5874)
22ee14e Implement `rot{l,r}` function templates
4ec9fb6 Add AMD ROCm support to hpcbind
0f51821 Merge pull request #5905 from crtrott/fix_msvc_cuda
8921317 Silence unused parameter warning
66e1437 Apply clang-format
c74aa41 Split math function test further, to work around compilation issue with MSVC/CUDA
43ec33e Work around a bug in MSVC/CUDA in a function.
4075009 Work around a failing CTAD occurance on MSVC/CUDA
47844ce Fix more rank style changes in MSVC/CUDA build
5b9f300 Fix another error with MSVC where we need to use rank()
e53f224 Cleanup prefer {traits:: -> }rank[_dynamic]
fb3d754 Merge pull request #4577 from dalg24/bit_manip
e146fc9 Merge pull request #5870 from dalg24/view_rank_member_function
75a3e80 Disable uchar test to work around broken sycl::ctz on NVIDIA GPUs
b166d77 Add `Experimental::*_builtin` counterpart to the bit manipulation template functions
c256a98 Merge pull request #5881 from msimberg/update-hpx-print-configuration
dff272f Fix CMake deduplication issue when linking with hip::device
c4a5ad0 Update HPX::print_configuration
b3a8182 Backport function templates from <bit> standard library header
03aae9a Merge branch 'develop' into view_rank_member_function
3be7ae2 Add compile-only test for View::rank[_dynamic]
948c6c6 Merge pull request #5620 from cz4rs/core-perf-tests-benchmark-conversion
af89aa7 Merge pull request #5878 from masterleinad/aligned_subview
25ff05b Fix warning pointless comparison of unsigned integer with zero
3d2dc6a Merge pull request #5887 from msimberg/nvhpc-version-macro-more-digits
2969679 Fix MSVC CI build
d3eac2b Cleanup prefer {traits:: -> }rank[_dynamic]
60ba1e1 Add one more digit for KOKKOS_COMPILER_NVHPC version components
4ca0340 Add comment in test
c4b81ec Try fixing Cuda 11 CI
86bbae3 Deprecate subview overload taking a template argument for MemoryTraits
0b94343 MemoryTraits::value -> MemoryTraits::impl_value
6e36acf Add comment in test
c43e45e Remove Aligned memory trait when creating subviews
4286774 Fix warning comparison of integers of different signs
a7daa59 Fix printing extents and rank in error message when copying views
10fae1f Fixup update Kokkos::rank(View) free function and drop outdated comment
314b966 Add View::rank[_dynamic] static constexpr data members
2e53f1c Add Impl::integral_constant
15989dd Merge pull request #5882 from dalg24/deprecate_view_rank_uppercase_r
e348b69 Deprecate View::Rank
05416c9 View::{R -> r}ank in perf tests
8487a96 View::{R -> r}ank in unit tests
2840e8d View::{R -> r}ank in algorithms and containers
9fb2bbc Prefer View::{R -> r}ank
2b532d1 Fix cache configuration in CI (#5871)
d39885a Merge pull request #5873 from masterleinad/fix_version_macro_develop
b6cdada Also test the KOKKOS_VERSION_{LESS,GREATER,EQUAL}
3175011 Add compile-only test to make sure version macros are defined
1d228fa Fix version macros
2caf641 add support to compile Kokkos for Ada generation (sm_89) consumer GPUs (RTX40x0)
2272d3b Merge pull request #5865 from msimberg/hpx-concurrency-non-static-member-function
b6c49a9 Make HPX::concurrency() a non-static member function
b9d405a Fix unused function warning (SYCL)
d25b94b Remove unused variable
204b085 Remove obsolete warning pragmas
b4bd01d Use double quotes instead of <angled> include
eb18f1d Port Atomic tests
6ab2791 Clean up perf_test CMakeLists
1b9a67f Port Mempool performance test
aa20b2b Avoid multiple `main()` definitions
ab55654 Disable unsupported benchmarks in OpenMPTarget
e3324b3 Port ExecSpacePartitionig tests
a45165d Merge pull request #5861 from msimberg/hpx-header-to-subdir
4da9dd9 Move Kokkos_HPX.hpp header into HPX subdirectory
cd8e67f Merge pull request #5857 from dalg24/rm_unsused_files
cd107dd Merge pull request #5856 from dalg24/destruct_delete
36bc91e Port GramSchmidt tests
62b8421 Remove duplicated helper
90b71cb Use correct license headers
b6c619a Add missing tests to Atomic minmax benchmark
e250ce3 Move command line helpers implementation into a header
7dd33f8 Remove ported benchmarks from Makefile
e0b5846 Measure only allocation time
5534a8f Remove redundant include
076d931 Use named constants
924600b Reduce repetition in ViewFill benchmarks
b1a3135 Reduce repetition in ViewResize benchmarks
25876cf Port Custom Reduction tests
5635e13 Use common helper for reporting results
372d03e Fix units - Fill
4b8e0e1 Port Atomic MinMax tests
063fe9a Port HexGrad tests
7c9f640 Port ViewAllocate tests
66e53a9 Remove redundant include
9126797 Clean-up Benchmark_Context and hide implementation details
1b2d07a Port ViewResize tests
5235c89 Port ViewFill performance tests
bbde3b1 Remove pointless dummy source file in core
3448260 Remove unused impl/CMakeLists.txt
8b19e2d Drop (unused) Impl::destruct_delete utility
d8d9c58 Check Kokkos::num_threads and device_id in tests
a4af6f7 Add Kokkos::num_threads() and Kokkos::device_id()
2aa2576 Dispatch Kokkos::sort(Kokkos::View) to SYCL oneDPL (#5229)
6f12ca2 Merge pull request #5852 from rgayatri23/OpenMPTarget_intel_pvc_edits
e7aeb9b Merge pull request #5816 from dalg24/tpetra_atomics_max_abs
fa54c97 Merge pull request #5850 from crtrott/no-deprecated-3-in-makefile
446532e Update core/unit_test/TestNumericTraits.hpp
86a4427 Drop (deprecated) KokkosCore_UnitTest_DefaultDeviceTypeInit_* from the makefile
cfb7b2f Merge pull request #5854 from dalg24/house_keeping
14f9425 OpenMPTarget: Replace KOKKOS_ARCH_INTEL with KOKKOS_COMPILER_INTEL to protect declare target on Intel GPUs.
34a21cb Merge pull request #5847 from dalg24/fixup_omp_thread_pool_size
387de48 Move { -> Threads/}Kokkos_Threads.hpp
be83e9a Move { -> Serial/}Kokkos_Serial.hpp
7436256 Move { -> Cuda/}Kokkos_Cuda[Space].hpp
1d8dd90 OpenMPTarget: Enable declare target for all Intel GPUs.
8b2bf33 Merge pull request #5849 from dalg24/hpx_asyn_dispatch_warning
f2ec98d Fix clang+cuda compiler warning about cudaDeviceSynchronize (#5846)
4c878a0 OpenMPTarget: Adding declare target for constexpr variables.
568bc2c Don't enable deprecated code 3 in Makefile builds anymore
c005e60 Pass *this to in_parallel in OpenMP::impl_thread_pool_size()
f68098b Fix CMake warning when HPX is not enabled
cba11a1 Merge pull request #5841 from dalg24/desul_atomics_source_files
5bb7e0a Fixup deprecated code 3 code path OpenMP::impl_thread_pool_size
5ea96bc Update HPX backend to use HPX's sender/receiver functionality (#5628)
97ad51b Fix unused parameter warning in SYCL lock array and add comment
879d607 Make OpenMP::concurrency and impl_thread_pool_size non-static (#5836)
46185fe Merge pull request #5840 from dalg24/nvhpc_arch_native
153aa59 Merge pull request #5838 from dalg24/typo_deprecared
41166e1 Merge pull request #5833 from masterleinad/sycl_device_global_static_only
43ccea6 Desul atomics: Drop `DESUL_HAVE_{GPU_LIKE,FORWARD}_PROGRESS` macros
1d19328 Desul atomics: SYCL lock arrays out of sync
37bcd41 Desul atomics: cleanup macro guards in CUDA/HIP lock guard files
23e2d85 Desul atomics: conditionally append the CUDA/HIP/SYCL source files
93487cf Fix flag passed to NVHPC when `Kokkos_ARCH_NATIVE` is `ON`
ccbfb00 Set native flags according to CMAKE_SYSTEM_PROCESSOR (#5831)
b8603a7 Fixup typo `#ifdef KOKKOS_ENABLE_DEPRECA{R -> T}ED_CODE_3`
c10edf3 Skip Tpetra reproducer with NVHPC compiler
f9f1808 Merge pull request #5834 from masterleinad/fix_unprefixed_macros_kokkos_host_mdpsan
a62aa40 Refactor OpenMPTarget backend (#5726)
f3d9efb Fix unprefixed macros on KokkosExp_Host_IterateTile.hpp
dac21c7 Add non-standard `rsqrt` math function (#5644)
073ce8b Try using oneAPI 2023.0.0 in SYCL+Cuda CI (#5813)
b477f99 Merge pull request #5832 from PhilMiller/fix-crs-define
d41a6df HIP: Drop obsolete macro definition
87535d8 ViewLayoutTiled: Be scrupulous about macro naming and undefining
f4c8f8d OpenMPTarget: Be scrupulous about macro naming and undefining
ae585b7 CUDA: Fix up comment
fbceafd CUDA: Convert simple value macro to constexpr
71e0eca CRS: Use Kokkos device function macros rather than duplicating code when compiling for GPU targets
ba4ebc4 Restrict KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED feature macro detection to static libraries
52586ef Merge pull request #5825 from dalg24/device_ptr_to_lock_array_in_constant_memory
0130a3f Initial OpenACC parallel_reduce implementation for Team policy (#5610)
59067d4 Use raw literal string to avoid having to escape characters in git commit message (#5823)
333157f Merge pull request #5742 from rgayatri23/OpenMP_regression_fix
8103d82 SIMD backend of ARM NEON (#5775)
fb7d9f2 SYCL: Pass Xsycl-target-backend* only to the linker (#5705)
04e3437 Further update to CUDA occupancy calculation (#5739)
a564953 Desul atomics: let pointer to the device lock arrays (HIP and CUDA) be in constant memory without RDC as well
22380c7 Merge pull request #5819 from dalg24/deprecate_kokkos_active_execution_memory_space_macros
92895ff Merge pull request #5818 from masterleinad/fix_all_t_deprecations
e8381d8 Add TODO comment to replace fully-qualified name when possible
ecd23e4 Spell out Kokkos::ALL_t to avoid deprecation warnings
789dfa7 Merge pull request #5821 from masterleinad/fix_sycl_ci_device_global
9d7257a Fixup turns out Tpetra "abs max" operation does not preserve the sign
2e1a559 Merge pull request #5820 from crtrott/fix-intel-ice-dev
eabd0e4 Disable global device variables in SYCL+Cuda CI
cd8eb9c Remove Cuda and HIP lock arrays altogether
f78d87a Unwire initializing/finalizing Kokkos lock arrays
bd86fe9 Change `#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_{4 -> 3}`
2b5c31a Intel ICE Sacado: turn off support for nested OpenMP with ICPC
6701772 Intel ICE Sacado: use new HostIterateTile API in OpenMP
6688cad Intel ICE Sacado: use new HostIterateTile API in HPX
b98e824 Intel ICE Sacado: use new HostIterateTile API in Threads
80c770d Intel ICE Sacado: use new HostIterateTile API in Serial
6935f70 Intel ICE Sacado: rewrite HostIterateTile
a6a0237 Deprecate `KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_*` macros
60f80e5 Merge pull request #5613 from masterleinad/sycl_extended_atomics
2f07a04 Fix initial value (identity element) for max abs
258bac6 Add unit test capturing Tpetra custom atomics use case
7cccd74 Merge pull request #5707 from masterleinad/sycl_update_2023
d63af25 Merge pull request #5814 from dalg24/scratch_locks
5d93865 Break lock array dependence of Cuda and HIP teams impl
5d87aa9 Merge pull request #5811 from dalg24/rm_desul_atomic_helper
8b80616 Merge pull request #5810 from masterleinad/move_sycl_headers
61d8569 Update Dockerfile used for SYCL+Cuda CI
05d008d Address deprecations in oneAPI 2023.0.0
b5b0504 Update minimal compiler requirements for SYCL
0180ff5 Update architecture flags for SYCL+Cuda
b0be8e6 Disable tests failing with SYCL+Cuda after update to oneAPI 2023.0.0
3369267 Merge pull request #5800 from masterleinad/improve_comment_test_team
7a13414 Merge pull request #5767 from masterleinad/fix_scratch_again
84a336a Merge pull request #5807 from dalg24/all_t
adb3141 Drop desul_* helper functions in tasking
94d9c9e Merge pull request #5804 from dalg24/purge_legacy_atomics
ddefe61 Issue warnings when using Kokkos::Impl::ALL_t
236e892 Fixup GH Actions compiler warnings (#5780)
6d90db3 Move all SYCL headers into SYCL directory
05f6a9a Per review dropped superfluous const-qualifiers
4519e4c Drop anonymous namespace around definitions of ALL, WithoutInitializing, and AllowPadding
e91f7e8 Guard using-declaration in Impl:: namespace with #ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
5304a40 Stay off Kokkos::Impl::ALL_t
7869915 Move Kokkos::{Impl:: -> }::ALL_t definition and add using-declaration in Impl:: namespace for backward compatibility
1668cf4 Merge pull request #5802 from ibaned/avx512-mask-fix
aeab5bd Merge pull request #5805 from dalg24/fixup_rocm54_force_global_launch_launch
d745c31 Fixup deleted wrong branch in HIP locks
796e964 Drop `KOKKOS_ENABLE_IMPL_DESUL_ATOMICS` macro define altogether
7f5ea60 Update diff_files (might be worth revisiting logic)
52953c8 Remove a whole bunch of Kokkos leagacy atomics headers
44140f7 Get rid of #ifdef KOKKOS_ENABLE_IMPL_DESUL_ATOMICS in unit tests
c3fe1d6 Purge macro guards for desul atomics being enabled or not
c54547e Fixup ROCm 5.4 ImplForceGlobalLaunch{Launch -> }_t typo in unit tests
153b4c1 remove const_cast with some code duplication
f253bc4 Print KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED in print_configuration
4c03c8d KOKKOS_SYCL_DEVICE_GLOBAL_SUPPORTED->KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED
05cb3f5 Purge logic around desul atomics being enabled at configuration time
cb67caf Warn at configuration time if attempting to disable desul atomics and force using it (#5801)
5212d90 Fix a bug in AVX512 simd_mask::operator[]
aa0f81e Replace HIP_LOCK_ARRAYS macros by functions (#5770)
a75c613 Merge pull request #5796 from Rombur/force_global_launch
53cc297 Improve comments in TestTeam.hpp
e4b3c82 SYCL: Add support for arbitrary size atomics
5b3b6e7 Rename ImplForceGlobal to ImplForceGlobalLaunch
de37fc2 Merge pull request #5784 from masterleinad/drop_KOKKOS_IMPL_WORKAROUND_INTEL_LLVM_DEFAULT_FLOATING_POINT_MODEL
d50bdd0 Merge pull request #5797 from cz4rs/container-options
cf6d43d Merge pull request #5786 from dalg24/cleanup_rm_eliminate_warning_for_lock_array
478f087 Fix typo
b99fb31 Use GTEST_SKIP to skip test
c9929fc Merge pull request #5795 from dalg24/reduction_identity_char
0f8b7ca Skip test and add comment explaining why
2902035 Fix tests when using ROCm 5.3
d7aa278 Remove obsolete container configuration
13c4de2 Merge pull request #5793 from dalg24/fixup_jenkins_gnu_generated_makefile
cf67ab4 Force GlobalMemory launch for some Bessel tests when using ROCm 5.4
f9d9505 Add parameter to force using GlobaLMemory launch mechanism using HIP
2e6c238 Drop KOKKOS_IMPL_WORKAROUND_INTEL_LLVM_DEFAULT_FLOATING_POINT_MODEL
e8c08e2 Fix sycl.scratch_align test
de26b23 Add missing ReductionIdentity<char> specialization
2bde8fc Merge pull request #5792 from masterleinad/improve_assert_macros
24ef794 Fixup warning in Jenkins CI build with GNU generated makefile
d2a73f9 Merge pull request #5791 from dalg24/dead_omp_test_source_file
73b4ca8 Prefer ASSERT_EQ over ASSERT_TRUE with ==
aa7865e Remove unused OpenMPTarget test source file
7475b89 Remove dead OpenMP test source file
c304818 Merge pull request #5755 from Rombur/hip-fix-global-launch
9f09e2b Drop unused Kokkos::Impl::eliminate_warning_for_lock_array CUDA/HIP functions
7f08b95 Desul atomics cleanup remove unused Impl::eliminate_warning_for_lock_array()
6e73a35 Merge pull request #5785 from masterleinad/replace_sprintf
0e2fda8 Merge pull request #5642 from cz4rs/enable-flang
20b609a sprintf -> snprintf
b5bd709 Merge pull request #5779 from cz4rs/upgrade-github-actions
4bd3e85 Upgrade GitHub actions
7652228 Use `flang-new` for Fedora builds
48e0874 Merge pull request #5777 from junghans/patch-5
619ed2d Fix build on Fedora rawhise
910d43e OpenMP: Adding an ifdef around chunksize for static schedule for GCC compiler.
728e3d3 Merge pull request #5762 from masterleinad/fix_scratch_space_for_sycl
0db3bd8 Fix a typo
4829fb2 Add a mutex to protect scratchFunctor
8f4f31d Merge pull request #5764 from dalg24/desul_atomics_config
ba0ad25 Merge pull request #5765 from ldh4/hpx_team_reduce_sfinae
7a3bfe0 Fix macro typo used in the OpenACC backend parallel_reduce(MDRange). (#5766)
97287f6 Remove unnecessary header
9f24f55 Merge pull request #5763 from masterleinad/fix_openmp_with_deprecated_code_3
20abee9 Let increment be of type uintptr_t fixing warning
1758196 Generate <desul/atomics/Config.hpp> file from the generated Makefiles
51aa904 Desul atomics configure library based what the user enabled
45acff3 Fix reviewers' comments
a9c997c Fix ScratchSpace pointer comparison for SYCL
aad8792 Merge pull request #5757 from dalg24/desul_atomics_drop_cuda_arch_macro_guards
02941a0 Merge pull request #5760 from dalg24/desul_atomics_gnu_and_msvc
7f883bc Merge pull request #5756 from dalg24/desul_atomics_sycl_macro
e5e8742 Added missing enable_ifs to hpx team parallel_reduce
33d7fce Fix compiling with OpenMP and Kokkos_ENABLE_DEPRECATED_CODE_3
1f68ab4 Desul atomics cleanup enable GCC or MSVC atomics
cd0b631 Encapsulate staging inside scratch_functor
c6d7662 Merge pull request #5759 from dalg24/cmake_package_version_compatibility
49b00de CMake: change package COMPATIBILITY mode {SameMajorVersion -> AnyNewerVersion}
0986a3a Desul atomics: drop unnecessary macro guard that checks for__CUDA_ARCH__ in PTX assembly code
0e3848f Desul atomics: drop unnecessary macro guard that checks for__CUDA_ARCH__ in compare exchange
46aae0f Desul atomics fixup detect use of SYCL
989d996 Merge pull request #5751 from masterleinad/update_kokkos_version_develop
296de12 Return host functor instead of device one
487deee Apply clang-format
cde661d Update Kokkos version on develop
b475233 Merge pull request #5722 from dalg24/openacc_parallel_reduce_mdrange
cf4358e Add more comments
d0d6404 Merge pull request #5747 from dalg24/fixup_omp_makefile
00ab763 Fixup forgot to add new OpenMP source file in Makefile
a84c7a5 Merge pull request #5741 from ndellingwood/update-testallsandia
57504c4 Merge pull request #5698 from masterleinad/static_assert_reducer
761ffda Fix HIP Global Launch with HSA_XNACK=1
dafb577 Merge pull request #5738 from Rombur/refactor_openmp
459e881 Merge pull request #5740 from seyonglee/openacc_cmake_make_bugfix
cf04bb5 [ci skip] update test_all_sandia
74a7988 Minor bug fixes on CMake and Make configurations for the OpenACC backend.
fb47be7 Merge pull request #5730 from tkordenbrock/tkordenbrock/fix-DynamicView-deep_copy-dp-sp
fbfa01e Move OpenMP UniqueToken to its own file
2f7e94a Move OpenMP functions out of Kokkos_OpenMP_Instance.hpp
f92270b Move part of Kokkos_OpenMP_Instance.cpp into Kokkos_OpenMP.cpp
48e8692 Move Kokkos_OpenMP.hpp to OpenMP/Kokkos_OpenMP.hpp
5d136cc Static asserts for reducers
d2e574c Apply clang-format
77d57d2 Merge pull request #5731 from dalg24/cleanup_cuda_blocksize_deduction
3697d45 Merge pull request #5735 from crtrott/remove-kokkos-cxx-standard-from-buildmd-develop
e0ebaa5 Merge pull request #5733 from ndellingwood/fix-intel19-werror
edfb1e3 Fix -Werror with intel/19
6aa7bf6 Remove KOKKOS_CXX_STANDARD mentioning from BUILD.md
67dff62 fix broken DynamicView test case #4
1f4468b fix src/dst Properties in deep_copy(DynamicView,View)
d4bd012 Revert "Drop pre CUDA 11 macro guards in occupancy calculation"
1fd8589 Drop now unsused `get_shmem_per_sm_prefer_l1` function
d34c751 Drop pre CUDA 11 macro guards in occupancy calculation
4954ce2 Merge pull request #5689 from cz4rs/performance-results-visualization
a23580e Temporarily disable unsupported reduction tests in core/unit_test/incremental/Test14_MDRangeReduce.hpp for the OpenACC backend.
7e651ca Group similar options together
ef7fd60 Configure `ccache` for benchmark builds
1134a1f Simplify Kokkos configuration
64d9b44 Use maximum available level of build parallelism
9fd7187 Use correct GitHub access token
d179453 Use correct branch for destination repo
9fbd78a Configure `ccache` correctly
9018621 Initial implementation of MDRange parallel_reduce
c6fae3f Move definitions of `OpenACCIterate{Left,Right}` and `OpenACCMDRange{Begin,End,Tile}`
604dc86 Remove commented out code
327aac5 Add comment for PerformanceTest_* executables
3a1769b Build on pull request
176ae8b Use double quotes instead of <angled> include
67a92d3 Do not build tests and examples
92906bf Remove security options
2e09341 Use separate .yml file for benchmarking
07b01ef Use correct header guards

git-subtree-dir: tpls/kokkos
git-subtree-split: 1a3ea28