Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for CUDA codegen #1442

Merged
merged 26 commits into from
Dec 18, 2023
Merged

Fix for CUDA codegen #1442

merged 26 commits into from
Dec 18, 2023

Conversation

edopao
Copy link
Collaborator

@edopao edopao commented Nov 22, 2023

This PR addresses #1388: fix python codegen and SharedToGlobal1D template to generate correct code for write without reduction.

Argument to std::ifloor should be double, otherwise invalid result
on gpu target.
Use new template for dace::SharedToGlobal1D
After uplift to dace v0.15, one SDFG which was working before started
to show compilation errors. The latest DaCe is moving a data access
to an inter-state edge. For the data-access, the symbols that define
array strides are needed for code generation. The SDFG was validated,
before and after the simplify pass, but it did not compile for CPU.
When skipping the simplify pass, the compilation did work.
The problem has been narrowed down to the scalar-to-symbol promotion,
which is moving a data access to an inter-state edge. Then, the
method _used_symbols_internal needs to be update to account for data
containers, including symbolic shape and strides.
This commit contains a unit test to reproduce the issue and verify
the proposed fix.
@edopao edopao linked an issue Nov 22, 2023 that may be closed by this pull request
@edopao edopao marked this pull request as ready for review November 24, 2023 15:07
@edopao edopao added bug Something isn't working codegen labels Nov 30, 2023
@edopao edopao requested a review from tbennun November 30, 2023 15:26
dace/codegen/targets/cuda.py Outdated Show resolved Hide resolved
dace/codegen/targets/cuda.py Show resolved Hide resolved
dace/codegen/targets/cuda.py Outdated Show resolved Hide resolved
dace/runtime/include/dace/math.h Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@edopao edopao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you the review, I will address your comments in a new commit.

dace/codegen/targets/cuda.py Outdated Show resolved Hide resolved
dace/codegen/targets/cuda.py Show resolved Hide resolved
dace/codegen/targets/cuda.py Outdated Show resolved Hide resolved
dace/runtime/include/dace/math.h Outdated Show resolved Hide resolved
@edopao edopao requested a review from tbennun December 12, 2023 07:16
@edopao
Copy link
Collaborator Author

edopao commented Dec 12, 2023

@tbennun Test added, please re-review.

Copy link
Collaborator

@tbennun tbennun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Only minor comments remain

@@ -1132,10 +1132,22 @@ def _emit_copy(self, state_id, src_node, src_storage, dst_node, dst_storage, dst
func=funcname,
type=dst_node.desc(sdfg).dtype.ctype,
bdims=', '.join(_topy(self._block_dims)),
is_async='true' if state_dfg.out_degree(dst_node) > 0 else 'true',
is_async='true' if state_dfg.out_degree(dst_node) > 0 else 'false',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be the other way around (if there is a dependent read after it in the same state, sync).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I did not pay enough attention to is_async before. Besides correcting the value of this argument, I have also moved the synchronization point in the template function after the thread-level copy (see my last commit on copy.cuh).

dace/codegen/targets/cuda.py Outdated Show resolved Hide resolved
dace/codegen/targets/cuda.py Outdated Show resolved Hide resolved
@edopao
Copy link
Collaborator Author

edopao commented Dec 13, 2023

@tbennun Thank you for the review. As I commented above, I have done one additional change related to is_async flag.

@edopao edopao requested a review from tbennun December 13, 2023 06:45
@edopao
Copy link
Collaborator Author

edopao commented Dec 18, 2023

Gentle reminder for @tbennun: please check my last comment and whether you can approve this PR.

@tbennun tbennun added this pull request to the merge queue Dec 18, 2023
Merged via the queue into spcl:master with commit 7c06755 Dec 18, 2023
11 checks passed
@edopao edopao deleted the bug-gpu-codegen branch December 19, 2023 06:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working codegen
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CUDA compile error on SharedToGlobal1D
2 participants