Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flat sparse packs #877

Merged
merged 20 commits into from
Jun 1, 2023
Merged

Add flat sparse packs #877

merged 20 commits into from
Jun 1, 2023

Conversation

Yurlungur
Copy link
Collaborator

@Yurlungur Yurlungur commented May 22, 2023

PR Summary

In riot I recently encountered an interesting use-case, which I needed to support. It's actually familiar. In the boundary communication kernels, we fuse the loop over meshblocks, over faces, and over variables. So the outermost index has a range that spans ALL variables over ALL boundary buffers and ALL meshblocks in a given MeshData object. We do this so that when we run with hierarchical parallelism, there's enough "chunks" of work in the outermost index space to saturate the device... i.e., we need at least as many elements in the outer range as we have warps on the GPU.

I found myself needing the same capability for loops over variables in the interior of a block. I.e., I needed a hierarchical structure where the outer loop is over meshblocks and variables TOGETHER and the inner loop is over the 3D index space of the meshblock. Because of sparse, this outer loop needs to be a SINGLE loop. The number of variables on each meshblock might vary.

Here I implement this capability in @lroberts36 's SparsePacks, which was the easiest place. When built with the sparse flag in the generic factory function, or with the GetFlat factory function, the sparse pack builds with this shape.

Note that I added a blurb describing this functionality, but the documentation for the SparsePack is very sparse, and I didn't fill it in.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@Yurlungur Yurlungur added enhancement New feature or request Kokkos Kokkos infrastructure/feature related performance labels May 22, 2023
@Yurlungur Yurlungur self-assigned this May 22, 2023
@Yurlungur
Copy link
Collaborator Author

P.S., thanks @lroberts36 for your help with this

Copy link
Collaborator

@brryan brryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, for my own understanding, how is this intended to be used downstream with sparse variables? Is it just an if check for each variable on each meshblock and the work is skipped if that variable isnt allocated on that meshblock?

src/interface/sparse_pack.hpp Show resolved Hide resolved
src/interface/sparse_pack.hpp Outdated Show resolved Hide resolved
src/interface/sparse_pack_base.hpp Outdated Show resolved Hide resolved
@Yurlungur
Copy link
Collaborator Author

how is this intended to be used downstream with sparse variables? Is it just an if check for each variable on each meshblock and the work is skipped if that variable isnt allocated on that meshblock?

It only packs allocated variables. So if for example, I have a total of 20 variables, but 3 variables are allocated on each of 5 blocks (where the set of variables allocated are different on each block), you get an outer index of length 15.

@Yurlungur Yurlungur enabled auto-merge May 22, 2023 21:08
@Yurlungur Yurlungur disabled auto-merge May 22, 2023 21:08
@lroberts36
Copy link
Collaborator

the documentation for the SparsePack is very sparse

I see what you did there ;)

@Yurlungur
Copy link
Collaborator Author

@lroberts36 you should probably be the other person to review this---seeing as you wrote sparse packs.

Copy link
Collaborator

@lroberts36 lroberts36 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM except that I think we should make sure the pack bounds arrays are meaningful even for flat packs.

@@ -200,27 +252,32 @@ class SparsePack : public SparsePackBase {
KOKKOS_INLINE_FUNCTION int GetLowerBound(const int b) const { return 0; }

KOKKOS_INLINE_FUNCTION int GetUpperBound(const int b) const {
PARTHENON_DEBUG_REQUIRE(!flat_, "Bounds not valid for flat packs");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they not valid if b==0? I thought they were.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after reading below I see they are not valid. I think we should make it so that they are valid, and maybe add a default const int b=0 here so that GetLowerBound() and GetUpperBound() work for sparse packs (or maybe make them separate functions that have a check if flat_==true). I think this is clearer than exposing the pack size through GetMaxNumberOfVars().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to just call them GetFlatLowerBound() and GetFlatUpperBound().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to make them valid and just re-use the API. I'm not sure if we should come up with new or more generic names, I just chose not to for now. We can revisit that later maybe.

src/interface/sparse_pack_base.cpp Show resolved Hide resolved
Comment on lines 136 to 138
if (!desc.flat) {
pack.bounds_h_(0, b, i) = idx;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we still set this with the actual block index, i.e. pack.bounds_h_(0, block, i) = idx? I am not sure what the use case would be, but it would allow all of the Get*Bound stuff to work returning the flat index.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually desired behaviour though? I wonder if GetBounds should return 0, and nvars always in the flat case? I could add an if statement to that effect maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now updated GetLowerBound returns 0 and GetUpperBound returns total numbers of vars and blocks (i.e., pack size) for flat packs.

src/interface/sparse_pack_base.cpp Outdated Show resolved Hide resolved
src/interface/sparse_pack_base.cpp Outdated Show resolved Hide resolved
src/interface/sparse_pack_base.cpp Outdated Show resolved Hide resolved
src/interface/sparse_pack_base.cpp Show resolved Hide resolved
Copy link
Collaborator

@pgrete pgrete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
I don't have a strong opinion with respect to the index handling.
In the spirit of our ParArrayNDs that transparently handle 0 I could see the appeal to also allow using the standard () operator and bounds being used with a PARTHENON_DEBUG_REQUIRE(!flat || b == 0,...).

doc/sphinx/src/interface/sparse.rst Outdated Show resolved Hide resolved
src/interface/sparse_pack.hpp Show resolved Hide resolved
Yurlungur and others added 2 commits May 25, 2023 10:10
@Yurlungur
Copy link
Collaborator Author

After incorporating @lroberts36 's suggestions, I broke a test, so I'm going to need a little time to debug. We should not wait for this PR for the next release. @pgrete .

@Yurlungur
Copy link
Collaborator Author

@lroberts36 this is ready for re-review. I believe I've incorporated all your comments.

@lroberts36 lroberts36 self-requested a review May 30, 2023 17:25
Copy link
Collaborator

@lroberts36 lroberts36 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we were on a different page about how to keep the (block, variable) indexing working for flat packs. I included some possible changes that are along the lines of what I was thinking. I think we probably don't want to do it because of the extra H2D copy, but another array could be added that maps from flat idx to block idx (along with functions for querying the pack for which block an flat idx corresponds to) so that e.g. the correct coordinates could be accessed from a flat pack.

Comment on lines 200 to 252
KOKKOS_INLINE_FUNCTION int GetLowerBound(const int b) const { return 0; }
KOKKOS_INLINE_FUNCTION int GetLowerBound(const int b = 0) const { return 0; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should change for flat packs. GetLowerBound(b) with b > 0 should return bounds_(0, b, 0) and GetUpperBound(b) should return bounds_(1, b, nvar_).

Comment on lines 254 to 255
KOKKOS_INLINE_FUNCTION int GetUpperBound(const int b = 0) const {
return bounds_(1, !flat_ * b, !flat_ * nvar_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there should be any check for if the pack is flat here. This is why I think that there actually needs to be a new function that returns the total upper bound

KOKKOS_INLINE_FUNCTION int GetUpperBound() const { 
  assert(flat_); 
  return bounds_(1, GetNBlocks() - 1, nvar_); 
}

KOKKOS_INLINE_FUNCTION int GetUpperBoundHost(const int b) const {
return bounds_h_(1, b, nvar_);
KOKKOS_INLINE_FUNCTION int GetUpperBoundHost(const int b = 0) const {
return bounds_h_(1, !flat_ * b, !flat_ * nvar_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also no flat check here.

@@ -179,13 +187,12 @@ SparsePackBase SparsePackBase::Build(T *pmd, const PackDescriptor &desc) {
}
}
// Record the maximum for easy access
pack.bounds_h_(1, b, nvar) = idx - 1;
pack.bounds_h_(1, !desc.flat * b, !desc.flat * nvar) = idx - 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pack.bounds_h_(1, !desc.flat * b, !desc.flat * nvar) = idx - 1;
pack.bounds_h_(1, block, nvar) = idx - 1;

Comment on lines 179 to 181
if (!desc.flat) {
pack.bounds_h_(1, b, i) = idx - 1;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!desc.flat) {
pack.bounds_h_(1, b, i) = idx - 1;
}
pack.bounds_h_(1, b, i) = idx - 1;

Comment on lines +135 to +142
if (!desc.flat) {
idx = 0;
b = block;
// JMM: This line could be unified with the coords_h line below,
// but it would imply unnecessary copies in the case of non-flat
// packs.
coords_h(b) = pmbd->GetBlockPointer()->coords_device;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!desc.flat) {
idx = 0;
b = block;
// JMM: This line could be unified with the coords_h line below,
// but it would imply unnecessary copies in the case of non-flat
// packs.
coords_h(b) = pmbd->GetBlockPointer()->coords_device;
}
if (!desc.flat) {
idx = 0;
b = block;
}
coords_h(block) = pmbd->GetBlockPointer()->coords_device;

Comment on lines 145 to 147
if (!desc.flat) {
pack.bounds_h_(0, b, i) = idx;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!desc.flat) {
pack.bounds_h_(0, b, i) = idx;
}
pack.bounds_h_(0, block, i) = idx;

Comment on lines 168 to 170
if (desc.flat) {
coords_h(idx) = pmbd->GetBlockPointer()->coords_device;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (desc.flat) {
coords_h(idx) = pmbd->GetBlockPointer()->coords_device;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is potentially confusing that for regular backs GetCoords takes the block number as an argument while for flat packs it takes the variable flat index.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to be able to make this call, as I think there's a clear use-case: i.e., you need coords for your stencil op. However, I would be willing to give GetCoords two different names?

// Size is nvar + 1 to store the maximum idx for easy access
pack.bounds_ = bounds_t("bounds", 2, nblocks, nvar + 1);
pack.bounds_ = bounds_t("bounds", 2, desc.flat ? 1 : nblocks, !desc.flat * nvar + 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pack.bounds_ = bounds_t("bounds", 2, desc.flat ? 1 : nblocks, !desc.flat * nvar + 1);
pack.bounds_ = bounds_t("bounds", 2, nblocks, nvar + 1);

pack.bounds_h_ = Kokkos::create_mirror_view(pack.bounds_);

pack.coords_ = coords_t("coords", nblocks);
pack.coords_ = coords_t("coords", desc.flat ? max_size : nblocks);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pack.coords_ = coords_t("coords", desc.flat ? max_size : nblocks);
pack.coords_ = coords_t("coords", nblocks);

@Yurlungur
Copy link
Collaborator Author

@lroberts36 I don't think I understand what you want. Can we talk over MM or something?

@lroberts36
Copy link
Collaborator

yep, I am on MM

@Yurlungur
Copy link
Collaborator Author

@lroberts36 take a look now. I've re-enabled original functionality for GetLowerBounds and GetUpperBounds as we discussed. Also added GetSize as a transparent way to get total size of the array.

Copy link
Collaborator

@lroberts36 lroberts36 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple more comments/suggestions below, but this LGTM and I approve

Comment on lines 168 to 170
if (desc.flat) {
coords_h(idx) = pmbd->GetBlockPointer()->coords_device;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is potentially confusing that for regular backs GetCoords takes the block number as an argument while for flat packs it takes the variable flat index.

Comment on lines 140 to 144
KOKKOS_INLINE_FUNCTION
Real &operator()(int idx, const int k, const int j, const int i) const {
PARTHENON_DEBUG_REQUIRE(flat_, "Accessor only valid for flat packs");
return pack_(0, 0, idx)(k, j, i);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for providing operator() in sparse_pack_base.hpp? I was trying to confine the interface to just SparsePack.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a mistake. I'll remove it.

@Yurlungur
Copy link
Collaborator Author

@pgrete the cuda regression test appears to be failing, without a useful error message, but it works on my machine. I'll dig in a little bit, but I wonder if something is wrong with the CI machine?

@Yurlungur
Copy link
Collaborator Author

Weird problem went away. Pulling the trigger.

@Yurlungur Yurlungur enabled auto-merge May 31, 2023 19:08
@Yurlungur Yurlungur disabled auto-merge May 31, 2023 22:31
@Yurlungur Yurlungur enabled auto-merge May 31, 2023 22:31
@lroberts36 lroberts36 disabled auto-merge June 1, 2023 00:44
@lroberts36 lroberts36 enabled auto-merge (squash) June 1, 2023 00:44
@lroberts36 lroberts36 merged commit 74c1d05 into develop Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Kokkos Kokkos infrastructure/feature related performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants