Use nano-gemm instead of matrixmultiply #292

Open: wants to merge 2 commits into master

@cschwan cschwan commented Jun 6, 2024

@cschwan cschwan self-assigned this Jun 6, 2024
@cschwan cschwan linked an issue Jun 6, 2024 that may be closed by this pull request

cschwan commented Jun 30, 2024

Using the 22 GB flavour-basis EKO ATLAS_1JET_8TEV_R06 for a strong coupling of 0.119 at the Z-boson mass, the old code takes

real    160m21.133s
user    159m53.132s
sys     0m27.700s

to evolve the corresponding grid. With nano-gemm it only takes

real    121m36.539s
user    121m9.532s
sys     0m26.784s

That's a wonderful reduction in runtime of roughly 24%!

BTW: the old runtime (with OpenBLAS even) was

real    301m45.642s
user    300m24.360s
sys     0m54.963s

and the difference comes from the fact that for that run I used an evolution-basis EKO, which is much bigger (45 GB) due to the rotation.
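
The quoted speed-up can be checked directly from the two `real` timings above (160m21.133s for the old code, 121m36.539s with nano-gemm). The following is only a minimal sketch of that arithmetic; the helper names `parse_time` and `reduction_percent` are invented for this illustration and are not part of PineAPPL.

```rust
/// Parse a duration as printed by the shell's `time`, e.g. "160m21.133s",
/// into seconds. Returns `None` if the string does not have that shape.
/// (Hypothetical helper, for illustration only.)
fn parse_time(s: &str) -> Option<f64> {
    let (minutes, rest) = s.trim().split_once('m')?;
    let seconds = rest.strip_suffix('s')?;
    Some(minutes.parse::<f64>().ok()? * 60.0 + seconds.parse::<f64>().ok()?)
}

/// Relative runtime reduction in percent when going from `old` to `new`.
fn reduction_percent(old: f64, new: f64) -> f64 {
    (1.0 - new / old) * 100.0
}

fn main() {
    // `real` timings copied from the comment above
    let old = parse_time("160m21.133s").expect("valid time string");
    let new = parse_time("121m36.539s").expect("valid time string");

    println!("old: {old:.3} s, new: {new:.3} s");
    println!("reduction: {:.1}%", reduction_percent(old, new));
}
```

With these inputs the reduction comes out at about 24.2%.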

mert-kurttutan commented Jul 3, 2024

Hi,
Your problem seems interesting and practical. Could you please share the steps to reproduce your results (along with the specs of the hardware you used), if possible?
I might be able to contribute.

Edit0: Your benchmark seems to take too long for experimenting. I would appreciate it if you could provide steps for a smaller version of your benchmark test.
Thanks

cschwan commented Jul 4, 2024

Hi @mert-kurttutan,

The linear algebra routines are used in an operation that we call 'evolution', and some faster-running evolutions are used in our integration tests. See this file: pineappl_cli/tests/evolve.rs. These tests run the binary that we call the 'PineAPPL CLI', and for these cases we always run

pineappl evolve <INPUT> <EKO> <OUTPUT> <CONV_FUNS>

The integration tests simply verify the output.
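
As a rough sketch of what such a test does, the command line above can be assembled and run with `std::process::Command`. This is not how the tests in pineappl_cli/tests/evolve.rs are necessarily written; the helper `evolve_command` and all four argument values below are placeholders invented for this illustration.

```rust
use std::process::Command;

/// Build the `pineappl evolve <INPUT> <EKO> <OUTPUT> <CONV_FUNS>` invocation.
/// (Hypothetical helper; the four arguments are supplied by the caller.)
fn evolve_command(input: &str, eko: &str, output: &str, conv_funs: &str) -> Command {
    let mut cmd = Command::new("pineappl");
    cmd.args(["evolve", input, eko, output, conv_funs]);
    cmd
}

fn main() {
    // Placeholder paths and convolution-function name, not real test data
    let mut cmd = evolve_command(
        "test-data/INPUT_GRID",
        "test-data/EKO_FILE",
        "OUTPUT_GRID",
        "CONV_FUNS_NAME",
    );

    // Run the binary and show what it printed; an integration test would
    // compare this output against a stored reference instead.
    match cmd.output() {
        Ok(out) => {
            println!("exit status: {}", out.status);
            print!("{}", String::from_utf8_lossy(&out.stdout));
        }
        Err(err) => eprintln!("could not run pineappl (is it installed?): {err}"),
    }
}
```

If the `pineappl` binary is not on the `PATH`, the sketch only reports the error and exits normally.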

For examples of how the arguments are used, have a look at the tests. To run them successfully, you'll need the test data that is passed to the CLI, which you can download here. It's probably easier to copy and run the wget calls from maintainer/generate-coverage.sh. The files must be placed into a folder test-data at the top level of the repository.

The installation is probably a bit tricky; please read https://nnpdf.github.io/pineappl/docs/installation.html#cli-pineappl-for-your-shell for instructions (you will need the evolution feature; all other optional features are not needed). However, before compiling the Rust code, you'll need to install LHAPDF 6.5.4; without this C++ library, nothing will compile or run, unfortunately. The installation instructions for this library are here.

If you have suggestions on how to improve these documents, we'd be happy to take your comments into account (best in a separate Issue). While writing this, I realized that our installation documents never mention LHAPDF, probably because practically everyone in our community has it installed. This has to be improved.

cschwan commented Jul 22, 2024

Starting from commit 9ca3022, the new baseline is:

real    4m54.425s
user    4m20.387s
sys     0m32.138s

Apparently we did a lot of linear algebra with zeros 😞.
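
To illustrate why zeros matter so much, here is a small sketch of a matrix product that skips zero entries of the left operand and counts the multiply-adds it actually performs. It shows the general idea only and makes no claim about what commit 9ca3022 changed; the function `matmul_skip_zeros` and the example matrices are invented for this illustration.

```rust
/// Multiply an `n x m` matrix `a` by an `m x p` matrix `b` (both row-major),
/// skipping every zero entry of `a`. Returns the product together with the
/// number of scalar multiply-adds that were performed.
/// (Hypothetical helper, for illustration only.)
fn matmul_skip_zeros(a: &[f64], b: &[f64], n: usize, m: usize, p: usize) -> (Vec<f64>, usize) {
    assert_eq!(a.len(), n * m);
    assert_eq!(b.len(), m * p);

    let mut c = vec![0.0; n * p];
    let mut ops = 0;

    for i in 0..n {
        for k in 0..m {
            let aik = a[i * m + k];
            // A zero entry contributes nothing to row `i` of the result
            // (assuming `b` holds only finite values), so skip the inner loop.
            if aik == 0.0 {
                continue;
            }
            for j in 0..p {
                c[i * p + j] += aik * b[k * p + j];
                ops += 1;
            }
        }
    }

    (c, ops)
}

fn main() {
    // 2 x 3 matrix in which four of the six entries are zero
    let a = [1.0, 0.0, 0.0,
             0.0, 0.0, 2.0];
    // dense 3 x 2 matrix
    let b = [1.0, 2.0,
             3.0, 4.0,
             5.0, 6.0];

    let (c, ops) = matmul_skip_zeros(&a, &b, 2, 3, 2);
    println!("product: {c:?}");
    println!("{ops} multiply-adds instead of {}", 2 * 3 * 2);
}
```

In this toy case only 4 of the 12 multiply-adds of the dense product remain; the sparser the operand, the larger the saving.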

Successfully merging this pull request may close these issues.

Investigate nano-gemm crate to improve speed of linear algebra
2 participants