Add pyiron based example #73

Merged
merged 9 commits into from Mar 14, 2024
Conversation

@jan-janssen (Contributor) commented Mar 12, 2024

This pull request implements the example workflow using the pyiron_base workflow manager, which is developed as part of the pyiron project. The workflow in the workflow.py file can be executed using:

python workflow.py

In addition, the pull request adds the corresponding GitHub Action and a short README that explains the usage of the example.
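
For illustration, a minimal sketch of what a single step defined with pyiron_base's wrap_executable (the method used in workflow.py, visible in the traceback further down) could look like. The job name, keyword arguments, and file names here are illustrative assumptions, not the actual contents of workflow.py:

    from pyiron_base import Project

    pr = Project("workflow")

    # First step: mesh the unit square with gmsh inside its own conda
    # environment. The keyword names and values below are assumptions
    # for illustration; the exact signature may differ.
    gmsh = pr.wrap_executable(
        job_name="gmsh",
        executable_str="gmsh -2 unit_square.geo -o square.msh",
        conda_environment_name="preprocessing",     # assumed env name
        input_file_lst=["source/unit_square.geo"],  # assumed input file
    )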

@jan-janssen reopened this Mar 12, 2024

@joergfunger (Member) left a comment

Thanks a lot for the merge request. The tests are running through (there is a problem with cwl, but that is unrelated to your merge request). There is one main question related to the connection of the workflow steps, in addition to some minor details.

  1. What is not clear to me is how the dependencies are defined. The inputs are specified for each workflow step, but not the outputs, so I am not sure how the execution graph of the workflow is built. This is also relevant in the example, where a hardcoded number is used instead of the output of a previous step (num_dofs).
  2. The current implementation seems to always execute the complete Python script with all steps, whereas most workflow tools build a graph and then execute this graph. This would also be important when building a provenance graph of an execution/run to follow all processing steps and the information that was transferred.
  3. This also relates to the question of avoiding the recomputation of already computed steps, e.g. when working only on the postprocessing. It seems that the complete workflow is re-executed here. Not all tools provide an option to avoid this, but if pyiron does, it would be nice to showcase that here. (In a realistic scenario we would often define the function in a workflow step, e.g. the postprocessing script, as an additional input and thus automatically trigger a recomputation of only that step and the subsequent ones, since that input file has changed.)
  4. That finally leads to a last question, which is not directly related to the merge request but a general pyiron question. We often assume that a workflow step is completely defined by its inputs and the function call, i.e. there are no hidden static variables that are set in one workflow step and then reused in subsequent steps. With file-based interfaces this is relatively easy to ensure, since there are no internal variables attached to a workflow step. How would that be ensured in a system where complete objects/jobs are stored in memory? Related to that: is there, in your opinion, a difference between a workflow system and a complete programming language?

@pdiercks (Collaborator) left a comment

Hi @jan-janssen,

thank you very much for the PR!

I added my review (mostly questions to clarify). I agree with @joergfunger that not explicitly defining the outputs/targets of each job is not ideal (and probably something that should be changed in a future version).

I was not able to run the pyiron workflow on my machine.
The error seems to be related to mpi4py and conda_subprocess.

pyiron_base ❯ python workflow.py 
The job gmsh was saved and received the ID: 1
The job meshio was saved and received the ID: 2
The job poisson was saved and received the ID: 3
2024-03-12 11:54:53,752 - pyiron_log - WARNING - Job aborted
2024-03-12 11:54:53,752 - pyiron_log - WARNING - ERROR: could not import mpi4py!
Traceback (most recent call last):
  File "/home/pdiercks/repos/nfdi4ing/pyiron_example/exemplary_workflow/pyiron/workflow/poisson_hdf5/poisson/poisson.py", line 6, in <module>
    import dolfin as df
  File "/home/pdiercks/mambaforge/envs/processing/lib/python3.9/site-packages/dolfin/__init__.py", line 142, in <module>
    from .fem.assembling import (assemble, assemble_system, assemble_multimesh,
  File "/home/pdiercks/mambaforge/envs/processing/lib/python3.9/site-packages/dolfin/fem/assembling.py", line 34, in <module>
    from dolfin.fem.form import Form
  File "/home/pdiercks/mambaforge/envs/processing/lib/python3.9/site-packages/dolfin/fem/form.py", line 12, in <module>
    from dolfin.jit.jit import dolfin_pc, ffc_jit
  File "/home/pdiercks/mambaforge/envs/processing/lib/python3.9/site-packages/dolfin/jit/jit.py", line 121, in <module>
    def compile_class(cpp_data, mpi_comm=MPI.comm_world):
RuntimeError: Error when importing mpi4py

Traceback (most recent call last):
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 665, in execute_job_with_external_executable
    out = conda_subprocess.run(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/conda_subprocess/interface.py", line 175, in run
    raise CalledProcessError(
subprocess.CalledProcessError: Command '['/bin/bash', '/tmp/tmprzp6zlv0']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pdiercks/repos/nfdi4ing/pyiron_example/exemplary_workflow/pyiron/workflow.py", line 51, in <module>
    poisson = pr.wrap_executable(
              ^^^^^^^^^^^^^^^^^^^
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/project/generic.py", line 436, in wrap_executable
    job.run()
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/utils/deprecate.py", line 171, in decorated
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 763, in run
    self._run_if_new(debug=debug)
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 1281, in _run_if_new
    run_job_with_status_initialized(job=self, debug=debug)
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 92, in run_job_with_status_initialized
    job.run()
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/utils/deprecate.py", line 171, in decorated
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 765, in run
    self._run_if_created()
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 1292, in _run_if_created
    return run_job_with_status_created(job=self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 115, in run_job_with_status_created
    job.run_static()
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 799, in run_static
    execute_job_with_external_executable(job=self)
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 610, in wrapper
    func(job)
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 676, in execute_job_with_external_executable
    out, job_crashed = handle_failed_job(job=job, error=e)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 732, in handle_failed_job
    raise RuntimeError("Job aborted")
RuntimeError: Job aborted

The conda environments were created successfully. Here, I would also suggest using a local prefix for the installation (e.g. conda env create --prefix ./env) instead of a named environment in the global conda installation (--name processing).

Best,
Philipp

jan-janssen and others added 2 commits March 12, 2024 07:44
Co-authored-by: Philipp Diercks <55233917+pdiercks@users.noreply.github.com>
@jan-janssen (Contributor, Author)

> I was not able to run the pyiron workflow on my machine. The error seems to be related to mpi4py and conda_subprocess.

I was not able to reproduce your error. Is it possible that you already had an environment named processing?

@jan-janssen (Contributor, Author)

@pdiercks and @joergfunger, thank you both for the careful review. I updated the pull request and tried to address the majority of the points individually. For the remaining points, here is a brief summary:

  • The workflow graph is currently not created explicitly. We are working on this functionality as part of the pyiron_workflow module.
  • The conda environments in pyiron are stored in the central conda installation. This is an error-prone solution, so I plan to add the functionality to store conda environments in the project directory: More flexible conda environments pyiron/pyiron_base#1371
  • The output of the shell tasks is not stored inside the pyiron object unless the output is redirected to a file.

I think the last two points can be addressed rather quickly (days to weeks), while the first point most likely still takes more time (weeks to months). So I guess my question is: what is the minimal amount of changes needed to merge this pull request?

@joergfunger (Member)

The changes are well addressed and in principle we could merge. Just a couple of remarks:

  1. The output is not yet specified. In the current setup, it is quite difficult for a user to know what files are produced by a job and thus how to use it correctly. In the future, this might also be relevant for knowing which files have to be collected when the job is executed on an external machine.
  2. The question of updates is not yet fully addressed. In case pyiron offers the option to only recompute things that have changed (similar to a makefile), it would be great to include that option.
  3. I had a challenge with an old version of mamba (I still had 1.1 installed, which did not provide the -y option) that was used to build the conda environments in the workflow. Maybe adding that as a dependency would help.

@pdiercks (Collaborator)

> > I was not able to run the pyiron workflow on my machine. The error seems to be related to mpi4py and conda_subprocess.
>
> I was not able to reproduce your error. Is it possible that you already had an environment named processing?

I will investigate, but it's probably a problem with my local setup rather than the workflow. Also, the actions / tests were passing before, so no need for changes on your end.

@pdiercks (Collaborator)

> @pdiercks and @joergfunger, thank you both for the careful review. I updated the pull request and tried to address the majority of the points individually. For the remaining points, here is a brief summary:
>
> * The workflow graph is currently not created explicitly. We are working on this functionality as part of the [pyiron_workflow](https://github.com/pyiron/pyiron_workflow) module.
> * The conda environments in pyiron are stored in the central conda installation. This is an error-prone solution, so I plan to add the functionality to store conda environments in the project directory: [More flexible conda environments pyiron/pyiron_base#1371](https://github.com/pyiron/pyiron_base/issues/1371)
> * The output of the shell tasks is not stored inside the pyiron object unless the output is redirected to a file.
>
> I think the last two points can be addressed rather quickly (days to weeks), while the first point most likely still takes more time (weeks to months). So I guess my question is: what is the minimal amount of changes needed to merge this pull request?

I would also be okay to merge. Thanks @jan-janssen for incorporating the changes and your explanations!

@jan-janssen (Contributor, Author)

> The changes are well addressed and in principle we could merge. Just a couple of remarks:
>
> 1. The output is not yet specified. In the current setup, it is quite difficult for a user to know what files are produced by a job and thus how to use it correctly. In the future, this might also be relevant for knowing which files have to be collected when the job is executed on an external machine.

As pyiron copies all files to the working directory of each job step, we basically define the files on the input level rather than the output level. For a remote setup this means we copy the working directory, including all the required files for a given task, via SSH to a remote computer and potentially even submit it to the scheduler there. Once the job receives run time, the task is executed, and afterwards the results are compressed into one tar archive. This tar archive is then copied back to the local computer, where the output can be reused in the following job steps. A rough sketch of that transfer pattern follows.
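
The snippet below is a hypothetical illustration of the transfer pattern just described, not pyiron's actual remote implementation. The host name, directory layout, and submission script are made up:

    import subprocess

    # Hypothetical illustration only; pyiron's internal remote mechanism
    # may differ. "user@hpc", "jobs/", and "run_queue.sh" are made up.
    remote = "user@hpc"
    workdir = "poisson"

    # Copy the prepared working directory to the remote machine.
    subprocess.run(["scp", "-r", workdir, f"{remote}:jobs/"], check=True)
    # Submit the job to the scheduler there.
    subprocess.run(["ssh", remote, f"cd jobs/{workdir} && sbatch run_queue.sh"], check=True)
    # ... after the job has finished on the remote side ...
    # Compress the results into one tar archive and fetch it back.
    subprocess.run(["ssh", remote, f"tar -czf {workdir}.tar.gz -C jobs {workdir}"], check=True)
    subprocess.run(["scp", f"{remote}:{workdir}.tar.gz", "."], check=True)
    subprocess.run(["tar", "-xzf", f"{workdir}.tar.gz"], check=True)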

> 2. The question of updates is not yet fully addressed. In case pyiron offers the option to only recompute things that have changed (similar to a makefile), it would be great to include that option.

Yes, in pyiron each job receives a unique ID, so if a job was already calculated successfully, it is simply reloaded and no additional execution is necessary. In case a calculation failed, the user has to manually remove the failed calculation before the workflow can be executed again. We added this blocker to remind users to look at their failed calculation and check whether it was just a random error or whether a modification of the input is necessary.
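
As a minimal sketch of that reload-or-remove pattern, assuming the pyiron_base Project methods load and remove_job; the job name and the status string are taken from the log output in this thread:

    from pyiron_base import Project

    pr = Project("workflow")

    # Reload a previously executed job instead of recomputing it; remove
    # it first if it aborted, so the workflow can be re-run. Whether
    # load() returns None or raises for a missing job may differ between
    # pyiron_base versions.
    job = pr.load("gmsh")
    if job is not None and job.status == "aborted":
        # Failed jobs block re-execution until removed manually.
        pr.remove_job("gmsh")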

> 3. I had a challenge with an old version of mamba (I still had 1.1 installed, which did not provide the -y option) that was used to build the conda environments in the workflow. Maybe adding that as a dependency would help.

Based on this feedback, we made mamba an optional dependency and rely on conda as the default option. The conda version is tracked via the conda_subprocess package. pyiron/pyiron_base#1370
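
For reference, conda_subprocess mirrors the subprocess-style run interface while executing the command inside a given conda environment (its run function and CalledProcessError behaviour are visible in the traceback above). A minimal sketch; the prefix_name keyword and env name are assumptions:

    import conda_subprocess

    # Run a command inside the "preprocessing" conda environment.
    # prefix_name and check are assumptions based on the package's
    # subprocess-style API; check its documentation for the exact
    # signature.
    conda_subprocess.run(
        ["gmsh", "--version"],
        prefix_name="preprocessing",
        check=True,
    )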

@joergfunger (Member) left a comment

Thanks, nice work. If you update pyiron, it would be great to integrate the updated versions here as well.

@joergfunger merged commit 652d7c3 into BAMresearch:main Mar 14, 2024
9 of 10 checks passed