-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid transpose in symbolic Cholesky benchmark #1312
Conversation
void symbolic_cholesky(const matrix::Csr<ValueType, IndexType>*, | ||
std::unique_ptr<matrix::Csr<ValueType, IndexType>>&, | ||
std::unique_ptr<elimination_forest<IndexType>>&); | ||
void symbolic_cholesky( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps it makes sense to always just return the pattern of L
? Then you would have to add the symmetrization to the places where you call this, but IMO it would be clearer what the function does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have this in quickly, so I can always benchmark off develop
, clean-up comes later :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I don't think that is the purpose of develop
. Why not use a branch that has everything you need in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I want things to be reproducible, if I mix performance tuning with benchmark changes, I could easily mess up things.
And in general, all uses of the symbolic factorization in Ginkgo need the symmetric version ATM (maybe with some changes in the future for Cholesky, but that's not yet clear). This is not a public function, so it makes sense to choose the most useful semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe still need to keep one version including transpose time?
@yhmtsai agreed, I added it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @MarcelKoch.
develop should not have a short change and revert afterwards.
At least, if we add the performance test without transpose, we should keep it for a while.
Maybe not complete related to this, but I would like to bring #895
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of renaming an old functionality instead of renaming the new one, as this makes it difficult to re-generate old benchmark results.
Edit: Since the benchmark was added 2 days ago in #1302, I am fine with renaming this functionality.
@yhmtsai To clarify: I want the code I am benchmarking on to be as close as possible to |
Found the MSVC issue, we were reading uninitialized data in the prefix_sum kernel, I fixed it and added tests |
Note: This PR changes the Ginkgo ABI:
For details check the full ABI diff under Artifacts here |
Kudos, SonarCloud Quality Gate passed! |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #1312 +/- ##
===========================================
- Coverage 91.27% 91.26% -0.01%
===========================================
Files 577 577
Lines 48721 48739 +18
===========================================
+ Hits 44468 44484 +16
- Misses 4253 4255 +2
... and 1 file with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
The transposition is pretty slow on OpenMP, so to have a fair performance baseline, I want to skip all the operations needed to symmetrize the matrix.