Implementation of Parallelization to analysis.gnm #4700

Merged (9 commits) on Sep 9, 2024

Conversation

@talagayev (Member) commented Sep 2, 2024

Fixes #4672

Changes made in this Pull Request:

  • Parallelization of the class GNMAnalysis in MDAnalysis.analysis.gnm (usage sketch below)
  • Addition of parallelization tests (including fixtures in analysis/conftest.py)
  • Update of CHANGELOG
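
For context, a minimal usage sketch of what the parallelized class enables (not taken from this PR's diff); it assumes the backend/n_workers keywords of the parallel AnalysisBase.run() interface and uses example data files from MDAnalysisTests:

    import MDAnalysis as mda
    from MDAnalysis.analysis.gnm import GNMAnalysis
    from MDAnalysisTests.datafiles import GRO, XTC  # example test files

    if __name__ == "__main__":
        # the guard matters when the multiprocessing backend uses the
        # "spawn" start method (e.g. on Windows or macOS)
        u = mda.Universe(GRO, XTC)
        gnm = GNMAnalysis(u, select="protein and name CA", cutoff=7.0)
        # run the per-frame analysis in parallel instead of serially
        gnm.run(backend="multiprocessing", n_workers=2)
        print(gnm.results.eigenvalues[:3])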

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4700.org.readthedocs.build/en/4700/

Added GNM client into tests
Added Change to Changelog
added name to changelog
@pep8speaks commented Sep 2, 2024

Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 250:80: W291 trailing whitespace
Line 251:78: W291 trailing whitespace
Line 260:1: W293 blank line contains whitespace

Comment last updated at 2024-09-09 17:42:45 UTC

github-actions bot commented Sep 2, 2024

Linter Bot Results:

Hi @talagayev! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location    Outcome
main package     ⚠️ Possible failure
testsuite        ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/10778417270/job/29889687548


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

codecov bot commented Sep 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.84%. Comparing base (7618e05) to head (b84fc1a).
Report is 1 commit behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4700      +/-   ##
===========================================
- Coverage    93.87%   93.84%   -0.04%     
===========================================
  Files          173      185      +12     
  Lines        21422    22494    +1072     
  Branches      3979     3980       +1     
===========================================
+ Hits         20110    21109     +999     
- Misses         858      931      +73     
  Partials       454      454              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@talagayev (Member, Author) commented:

Overall, all tests pass with the exception of test_gnm_SVD_fail, where the parallelization leads to an error for this line:
with pytest.warns(UserWarning, match=msg):

with the error being:
E Failed: DID NOT WARN. No warnings of type (<class 'UserWarning'>,) were emitted.
E Emitted warnings: []
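
For illustration only (not MDAnalysis code), a minimal sketch of why the warning goes missing: a warning emitted inside a multiprocessing worker stays in the child process and is never propagated back to the parent interpreter where pytest.warns is listening.

    import multiprocessing
    import warnings

    def _worker():
        # this warning is issued in the child process only
        # (placeholder message standing in for the GNM SVD warning)
        warnings.warn("SVD failed", UserWarning)

    if __name__ == "__main__":
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            p = multiprocessing.Process(target=_worker)
            p.start()
            p.join()
        # the parent never receives the child's warning, which mirrors
        # the "DID NOT WARN" failure above
        print(len(caught))  # -> 0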

@hmacdope (Member) commented Sep 8, 2024

@orbeckst would you be able to review here?

@orbeckst self-assigned this Sep 8, 2024
@orbeckst (Member) left a comment

Azure has timeout errors in multiprocessing

=================================== FAILURES ===================================
_ Test_Backends.test_all_backends_give_correct_results[square-iterable1-answer1] _
...
self = <multiprocessing.popen_fork.Popen object at 0x7efe2d29ba30>, flag = 0

    def poll(self, flag=os.WNOHANG):
        if self.returncode is None:
            try:
>               pid, sts = os.waitpid(self.pid, flag)
E               Failed: Timeout >200.0s

I am going to assume that this is related to #4209 so I don't think that this PR can do anything about it.

All other tests pass for me. I didn't see a failure for test_gnm_SVD_fail.

Changes

For actual changes:

  • add a versionchanged entry to the GNMAnalysis docstring to state that parallelization is now enabled (see the sketch after this list)
  • change the entry in CHANGELOG
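
A minimal sketch of what such a versionchanged note could look like, assuming the project's Sphinx docstring conventions; the version number 2.8.0 and the wording are placeholders, not the final text of this PR:

    class GNMAnalysis(AnalysisBase):
        """Basic tool for GNM analysis.

        ...

        .. versionchanged:: 2.8.0
           Enabled parallel execution with the ``multiprocessing`` and
           ``dask`` backends; see :meth:`get_supported_backends`.
        """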

Comment on test duration increase

Note: The GNM tests are some of the slowest ones that are running:

============================= slowest 50 durations =============================
9.90s call     MDAnalysisTests/analysis/test_gnm.py::test_gnm[client_GNMAnalysis0]
8.92s call     MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-None-50-2-25]
8.81s call     MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-20-50-2-15]
8.79s call     MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-20-50-None-30]
8.67s call     MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-None-None-2-49]
8.57s call     MDAnalysisTests/analysis/test_gnm.py::test_gnm[client_GNMAnalysis1]
8.24s call     MDAnalysisTests/coordinates/test_pdb.py::test_conect_bonds_all
8.08s call     MDAnalysisTests/analysis/test_contacts.py::TestContacts::test_n_initial_contacts[datafiles1-41814]
8.06s call     MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis[client_GNMAnalysis1]
7.67s call     MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis_weights_None[client_GNMAnalysis1]
7.36s call     MDAnalysisTests/analysis/test_encore.py::TestEncore::test_ces_error_estimation_ensemble_bootstrap
7.11s call     MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis[client_GNMAnalysis0]
6.99s call     MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis_weights_None[client_GNMAnalysis0]
6.93s call     MDAnalysisTests/utils/test_pickleio.py::test_fileio_pickle
6.14s call     MDAnalysisTests/coordinates/test_gro.py::TestGROLargeWriter::test_writer_large
5.70s call     MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis[client_GNMAnalysis2]
5.64s call     MDAnalysisTests/analysis/test_gnm.py::test_gnm_run_step[client_GNMAnalysis1]

Previously (from PR #4682), GNM was slow but not the slowest.

============================= slowest 50 durations =============================
9.49s call     MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-20-50-None-30]
9.02s call     MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-20-50-2-15]
8.90s call     MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-None-None-2-49]
8.63s call     MDAnalysisTests/coordinates/test_pdb.py::test_conect_bonds_all
8.58s call     MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-None-50-2-25]
8.11s call     MDAnalysisTests/analysis/test_contacts.py::TestContacts::test_n_initial_contacts[datafiles1-41814]
7.41s call     MDAnalysisTests/analysis/test_encore.py::TestEncore::test_ces_error_estimation_ensemble_bootstrap
6.97s call     MDAnalysisTests/utils/test_pickleio.py::test_fileio_pickle
6.26s call     MDAnalysisTests/coordinates/test_gro.py::TestGROLargeWriter::test_writer_large
5.79s call     MDAnalysisTests/analysis/test_rms.py::TestRMSD::test_weights_mass_is_mass_weighted[client_RMSD1]
5.73s call     MDAnalysisTests/analysis/test_rms.py::TestRMSD::test_rmsd_atomgroup_selections[client_RMSD1]
5.60s call     MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis_weights_None
5.44s call     MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis

It looks as if the parallel tests take longer than the serial ones, possibly due to setup overhead and competition among the parallel runners.

Review comment on package/CHANGELOG (outdated, resolved)
@talagayev (Member, Author) commented:

All other tests pass for me. I didn't see a failure for test_gnm_SVD_fail.

Yes, for this PR I didn't add the client_GNMAnalysis fixture to the test_gnm_SVD_fail test, since I was not sure whether it is required; without it, all the remaining tests pass, so I was unsure whether it should be added given that it would cause an error 😿

@talagayev (Member, Author) commented:

As for the other classes, since I also tried adding parallelization to them: is there a recommended way to decide when a class is unparallelizable? I am quite hesitant to mark them as unparallelizable, since I might be overlooking a way to implement parallelization with some adjustments.

moved the parallelization below Issue MDAnalysis#4158
Addition of versionchanged with the mention of the support for multiprocessing and dask backends
@orbeckst (Member) left a comment

I am ok with not testing parallelization in test_gnm_SVD_fail.

Can you please update the versionchanged (see my suggestion)?

Otherwise looking ready to me.

Review comment on package/MDAnalysis/analysis/gnm.py (outdated, resolved)
Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
@talagayev (Member, Author) commented:

I am ok with not testing parallelization in test_gnm_SVD_fail.

Can you please update the versionchanged (see my suggestion)?

Otherwise looking ready to me.

Nice, then we can leave it like this for the tests :)

Yes, I adjusted it; I was just copy-pasting from the RMSD class versionchanged :)

@orbeckst (Member) commented Sep 9, 2024

As for the other classes, since I also tried adding parallelization to them: is there a recommended way to decide when a class is unparallelizable? I am quite hesitant to mark them as unparallelizable, since I might be overlooking a way to implement parallelization with some adjustments.

If you're unsure, start a discussion on the issue. There are plenty of people around who can look at an algorithm and give an opinion. Perhaps sometimes we come up with an idea of how to do it. For example, RMSF is in principle parallelizable with split-apply-combine, one just has to do some extra work (see https://www.mdanalysis.org/pmda/api/rmsf.html ), but at the moment only the serial backend is enabled

    @classmethod
    def get_supported_backends(cls):
        return ('serial',)
because someone has to do the extra work.
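
For contrast, a plausible sketch of what declaring full backend support for a class such as GNMAnalysis could look like; apart from get_supported_backends (quoted above), the parallelizable flag, the ResultsGroup aggregator, and their exact names are assumptions based on the parallel AnalysisBase framework, not a quote of this PR's diff:

    from MDAnalysis.analysis.base import AnalysisBase
    from MDAnalysis.analysis.results import ResultsGroup

    class GNMAnalysis(AnalysisBase):
        # skeleton only: __init__, _single_frame, etc. omitted
        _analysis_algorithm_is_parallelizable = True  # assumed flag name

        @classmethod
        def get_supported_backends(cls):
            return ('serial', 'multiprocessing', 'dask')

        def _get_aggregator(self):
            # per-frame results computed by each worker are concatenated
            # back into frame order after the parallel run
            return ResultsGroup(lookup={
                'times': ResultsGroup.flatten_sequence,
                'eigenvalues': ResultsGroup.flatten_sequence,
                'eigenvectors': ResultsGroup.flatten_sequence,
            })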

@talagayev (Member, Author) commented:

As for the other classes, since I also tried adding parallelization to them: is there a recommended way to decide when a class is unparallelizable? I am quite hesitant to mark them as unparallelizable, since I might be overlooking a way to implement parallelization with some adjustments.

If you're unsure, start a discussion on the issue. There are plenty of people around who can look at an algorithm and give an opinion. Perhaps sometimes we come up with an idea of how to do it. For example, RMSF is in principle parallelizable with split-apply-combine, one just has to do some extra work (see https://www.mdanalysis.org/pmda/api/rmsf.html ), but at the moment only the serial backend is enabled

    @classmethod
    def get_supported_backends(cls):
        return ('serial',)

because someone has to do the extra work.

Ah, I see, that sounds good. I was confident about the implementation for this class and for analysis.bat, but from testing a parallelization implementation for analysis.align it looks like analysis.align is unparallelizable, as it raises the following error:
ValueError: Can not parallelize class <class 'MDAnalysis.analysis.align.AlignTraj'>

so it hits the following check in base.py:

    if not self.parallelizable and backend_class is not BackendSerial:
        raise ValueError(f"Can not parallelize class {self.__class__}")

Thus I would tend to mark it as unparallelizable and create a PR for it :)
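
For completeness, a plausible sketch of what explicitly marking such a class as serial-only could look like, using the same get_supported_backends hook shown above for RMSF; the flag name is an assumption and this is illustrative, not the code of the proposed follow-up PR:

    from MDAnalysis.analysis.base import AnalysisBase

    class AlignTraj(AnalysisBase):
        # skeleton only: frames are written to a single output trajectory
        # in order, so the per-frame work cannot be split across workers
        _analysis_algorithm_is_parallelizable = False  # assumed flag name

        @classmethod
        def get_supported_backends(cls):
            return ('serial',)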

@orbeckst (Member) left a comment

Looks ready to me.

The Azure failure is a multiprocessing timeout failure and likely not due to this PR.

A concern is the increased run time of the tests, but that's a general issue with the parallel tests, I think.

@orbeckst merged commit b3208b3 into MDAnalysis:develop on Sep 9, 2024
20 of 23 checks passed
@orbeckst (Member) commented Sep 9, 2024

Thanks @talagayev !

@talagayev (Member, Author) commented:

Thanks @talagayev !

Happy to help :)

Linked issue: MDAnalysis.analysis.gnm: Implement parallelization or mark as unparallelizable