Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation for reportengine #1804

Merged
merged 6 commits into from
Sep 8, 2023
Merged

Documentation for reportengine #1804

merged 6 commits into from
Sep 8, 2023

Conversation

comane
Copy link
Member

@comane comane commented Aug 31, 2023

Documentation / Tutorial for the execution of validphys scripts with --parallel flag.

@scarlehoff
Copy link
Member

Many thanks for this. Reading through the docs you added I have some questions:

Note that in the above example we also fixed the number of threads at disposal by each worker to be 1, this is important in order to avoid possible racing conditions triggered by the matplotlib library.

What happens if I run --parallel without giving a scheduler? Can we have a set of sensible defaults in validphys so that if one runs just with --parallel the resulting action won't take all available memory / incur in race conditions ?

@comane
Copy link
Member Author

comane commented Sep 6, 2023

What happens if I run --parallel without giving a scheduler? Can we have a set of sensible defaults in validphys so that if one runs just with --parallel the resulting action won't take all available memory / incur in race conditions ?

At present, if one runs just with --parallel no race conditions should take place as the number of threads will be fixed to one per worker: client = Client(threads_per_worker=1).

Regarding the second point, namely the usage of memory when running without a scheduler, I do not have an immediate solution, but I could address this problem in the future.

Copy link
Member

@scarlehoff scarlehoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

doc/sphinx/source/tutorials/reportengine_parallel.md Outdated Show resolved Hide resolved
doc/sphinx/source/tutorials/reportengine_parallel.md Outdated Show resolved Hide resolved
@scarlehoff
Copy link
Member

I think this could be merged (the tests should pass in master).

One thing though is that maybe we do want to remove the possibility of running --parallel without an scheduler. Or fix the default numbers to something small to avoid memory blow ups.

comane and others added 2 commits September 8, 2023 08:50
Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
@comane
Copy link
Member Author

comane commented Sep 8, 2023

One thing though is that maybe we do want to remove the possibility of running --parallel without an scheduler. Or fix the default numbers to something small to avoid memory blow ups.

Yes I agree with you that this would need to be fixed in reportengine at some point.

Regarding the documentation would you prefer not having the last section on 'Using dask without a Scheduler'?
Note that in the documentation it is specified that --parallel without a scheduler should not be used for tasks computationally more expensive than plot_pdfs

@scarlehoff
Copy link
Member

Nono, as it is right now better to have it in the docs (with the caveat, as you did).
I was more thinking aloud, because people will use it for heavy stuff and vp will crash on them.

this would need to be fixed in reportengine at some point.

Can't defaults be set from vp using the dask.config thing? (I don't know much about dask so don't really know).

@comane
Copy link
Member Author

comane commented Sep 8, 2023

Ok, I see. Yes I guess that could be possible. Thanks for the suggestion.

I will definitely have a look at that in reportengine (possibly in the coming days).

@scarlehoff scarlehoff merged commit a6e1ddf into master Sep 8, 2023
0 of 4 checks passed
@scarlehoff scarlehoff deleted the reportengine_dask_doc branch September 8, 2023 09:03
@Zaharid
Copy link
Contributor

Zaharid commented Sep 11, 2023

I think we definitively want to make the case where the scheduler starts inside of the process more usable. Setting some sensible parameters will work for that well enough.

More long term, we like to get rid of the lru_cache for large objects such as fktables and instead make the vp providers (and therefore dask nodes).

@giacomomagni
Copy link
Contributor

Hi @comane,
I was trying to follow your tutorial to speed up reports, however it this feature implemented somewhere?

usage: vp-comparefits [--title TITLE] [--author AUTHOR] [--keywords KEYWORDS [KEYWORDS ...]] [--thcovmat_if_present] [--no-thcovmat_if_present] [--current_fit_label [CURRENT_FIT_LABEL]]
                      [--reference_fit_label [REFERENCE_FIT_LABEL]] [-i] [-c] [-l] [-o OUTPUT] [-q | -d] [--style STYLE] [--formats FORMATS [FORMATS ...]] [-x EXTRA_PROVIDERS [EXTRA_PROVIDERS ...]]
                      [--dry] [--parallel | --no-parallel] [--folder-prefix] [-h [HELP]] [--cout | --no-cout] [--net | --no-net] [--upload]
                      [current_fit] [reference_fit]
vp-comparefits: error: unrecognized arguments: --scheduler tcp://145.107.4.126:8786

@scarlehoff
Copy link
Member

Are you using the latest version of vp (and reportengine) ?

@giacomomagni
Copy link
Contributor

I have validiphys from master (in develop mode):

validphys.__package__  validphys.__path__     
>>> validphys.__path__
['/data/theorie/gmagni/NNPDF/nnpdf/validphys2/src/validphys']
>>> validphys.__version__
'4.0.6.1212+g53330a638-dev'
>>> reportengine.__version__
'0.31'

@comane
Copy link
Member Author

comane commented Oct 13, 2023

Hi @giacomomagni,
have you tried getting reportengine in develop mode too?
(I haven't tried the parallel --scheduler version by installing directly validphys)

The dev version I have is:

>>> reportengine.__version__
'0.30dev'

@giacomomagni
Copy link
Contributor

giacomomagni commented Oct 13, 2023

I see, thanks for the reply @comane.

Would it be possible to have a new reportengine release implementing this and update the vp dependency ?

If this feature it's still under development in reportengine no problem (maybe for the next time just wait to merge this PR..)

@giacomomagni
Copy link
Contributor

Do I need a specific branch of reportengine?
The version you suggested doesn't seem enough to get it working...

(nnpdf) [gmagni@stbc-i1 reports]$ python
>>> import reportengine
>>> reportengine.__version__
'0.30dev'
>>> quit()
(nnpdf) [gmagni@stbc-i1 reports]$ vp-comparefits --scheduler
usage: vp-comparefits [--title TITLE] [--author AUTHOR] [--keywords KEYWORDS [KEYWORDS ...]] [--thcovmat_if_present] [--no-thcovmat_if_present] [--current_fit_label [CURRENT_FIT_LABEL]] [--reference_fit_label [REFERENCE_FIT_LABEL]] [-i]
                      [-c] [-l] [-o OUTPUT] [-q | -d] [--style STYLE] [--formats FORMATS [FORMATS ...]] [-x EXTRA_PROVIDERS [EXTRA_PROVIDERS ...]] [--parallel | --no-parrallel] [-h [HELP]] [--cout | --no-cout] [--net | --no-net] [--upload]
                      [current_fit] [reference_fit]
vp-comparefits: error: unrecognized arguments: --scheduler

@scarlehoff
Copy link
Member

Strange:

>>> import reportengine
>>> reportengine.__version__
'0.30dev'

git log -r -1
commit 79eec2e33c5aab3998fd58e9c3cf84bd2c2696e7 (HEAD -> master, origin/master, origin/HEAD)

which is current master. Validphys is also in current master.

@comane
Copy link
Member Author

comane commented Oct 16, 2023

Hi @giacomomagni,
no you do not need a specific branch, the --parallel version has been merged into master.

What is your git log -r -1, it should be the same as the one @scarlehoff just printed.

@scarlehoff I assume that for you the --scheduler option is working?

@scarlehoff
Copy link
Member

Yes, for me it does work.

@giacomomagni
Copy link
Contributor

giacomomagni commented Oct 16, 2023

okay, yes I really needed git clone. Apparently the 0.30dev version you get from pip is not the same.
Thamks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants