[patch] restore forbidding duplicate imports #1453

samwaseda · 2024-05-29T08:30:46Z

This PR partially solves this problem, as it forbids to import a job whose name already exists in the project, but I make it a draft because it does not clarify why the problem is occurring in the first place.

samwaseda · 2024-05-29T08:55:34Z

So I guess for now this is the most pragmatic solution. I still do not perfectly understand what's happening, but it sounds reasonable to me to forbid importing jobs with an existing job name. I put the error in the importing function and not deeper in base, because I thought maybe it is not fundamentally forbidden that there are jobs with the same name in pyiron? Maybe @jan-janssen can make a comment? One way or other, I think this is more like a temporary fix, and we need an overhaul in this PR. What do you think?

liamhuber · 2024-05-29T16:29:47Z

because I thought maybe it is not fundamentally forbidden that there are jobs with the same name in pyiron?

I actually wasn't sure what the behaviour would be here, so I checked. The executive summary is:

Project doesn't care at all about names until something has been run
Project won't run a job whose name is already a run job in its table
Creating a new job with an existing job name loads the existing job
The job column of the project table is not unique, but the combination of project and job is

Demo:

from pyiron_atomistics import Project
pr = Project("test")
pr.remove_jobs(silently=True)
j1 = pr.create.job.Lammps("lmp")
j1.structure = pr.create.structure.bulk("Al")

j2 = pr.create.job.Lammps("lmp")
j2.structure = pr.create.structure.bulk("Cu")

print(j1.structure.get_chemical_formula(), j2.structure.get_chemical_formula())
# Al Cu
print(j1.name, j2.name)
# lmp lmp
print(pr.job_table().job)
# Series([], Name: job, dtype: object)

j1.run()
# Runs as expected

j2.run()
# Prints a warning and doesn't run `job exists already and therefore was not created!`

j3 = pr.create.job.Lammps("lmp")
# Silently (really, I set `warnings.simplefilter("always")`) reloads the job

print(j1.structure.get_chemical_formula(), j2.structure.get_chemical_formula(), j3.structure.get_chemical_formula())
# Al Cu Al
print(pr.job_table().job)
# 0    lmp; Name: job, dtype: object

Only the project+job column gives a unique name, as it's fine to create and run a job in a subproject:

sub_pr = pr.create_group("sub_pr")
k1 = sub_pr.create.job.Lammps("lmp")
k1.structure = sub_pr.create.structure.bulk("Ni")
k1.run()
print(pr.job_table().job)
# 0    lmp; 1    lmp; Name: job, dtype: object

IMO there's nothing faulty with any of this, although it is annoying to find out "late" that your job name is non-unique and won't run -- would be nicer to learn that at creation time. This is impossible in the current setup though, as project is not truly aware of a job until after it has run.

I've attacked this general topic in pyiron_workflow using a Semantic class, where the most relevant bit is probably SemanticParent.add_child. This fundamentally differs as it's all happening at the object level in-memory, rather than referring to the database as (I think) Project is doing for these checks. I'm not suggesting we need to pull anything from there in here, it's just an FYI for attacking the topic from another angle.

liamhuber

I am completely behind the idea of having a check here, but I don't think this does what you want and it needs tests to prove it's doing the desired thing.

pyiron_base/project/archiving/import_archive.py

tests/archiving/test_import.py

samwaseda · 2024-05-29T17:15:51Z

because I thought maybe it is not fundamentally forbidden that there are jobs with the same name in pyiron?

I actually wasn't sure what the behaviour would be here, so I checked. The executive summary is:

Project doesn't care at all about names until something has been run

Project won't run a job whose name is already a run job in its table

Creating a new job with an existing job name loads the existing job

The job column of the project table is not unique, but the combination of project and job is

Yep so far that was more or less what I had expected. What I don't know is whether it can happen that there are jobs with the same name that have run. I super vaguely remember that this can happen when there are two child jobs which share the same name. In that case it's the same project and the same job name but with different parents, but as I said I don't clearly remember it.

liamhuber · 2024-05-29T17:27:00Z

I super vaguely remember that this can happen when there are two child jobs which share the same name. In that case it's the same project and the same job name but with different parents, but as I said I don't clearly remember it.

Oh yeah, I wasn't thinking about parent jobs. I don't intend to dig in and make an example for that though 😝

pmrv

Agree with @liamhuber's diagnosis. Probably you want to join job and project columns in the given df first, before making this check.

samwaseda · 2024-06-19T20:15:53Z

I guess now I found the workaround and hopefully the tests will pass this time...

samwaseda · 2024-06-19T20:22:50Z

In the end, it boiled down to pr.remove(enable=True), so I removed all the other lines that I had added. Since there'll be an overhaul in this PR, I don't think the error that I had added will be particularly helpful.

jan-janssen

Looks good to me. @samwaseda Can you merge this?

samwaseda · 2024-06-20T12:02:50Z

It's blocked because @liamhuber's approval is missing

The request was addressed.

restore forbidding duplicate imports

f0b12d5

samwaseda mentioned this pull request May 29, 2024

[minor] refactor super minor things #1449

Merged

Base automatically changed from fix_archiving_problem to main May 29, 2024 08:33

samwaseda requested review from jan-janssen, liamhuber and pmrv May 29, 2024 08:55

liamhuber previously requested changes May 29, 2024

View reviewed changes

pyiron_base/project/archiving/import_archive.py Outdated Show resolved Hide resolved

tests/archiving/test_import.py Outdated Show resolved Hide resolved

pmrv requested changes Jun 4, 2024

View reviewed changes

jan-janssen marked this pull request as draft June 12, 2024 05:41

add remove

6bc57ad

samwaseda marked this pull request as ready for review June 19, 2024 20:14

remove the value error

d8afc30

samwaseda requested review from liamhuber, pmrv and niklassiemer June 20, 2024 06:29

samwaseda changed the title ~~restore forbidding duplicate imports~~ [patch] restore forbidding duplicate imports Jun 20, 2024

pmrv approved these changes Jun 20, 2024

View reviewed changes

jan-janssen approved these changes Jun 20, 2024

View reviewed changes

jan-janssen merged commit 5003ad0 into main Jun 20, 2024
24 checks passed

jan-janssen deleted the forbid_importing_duplicate_jobs branch June 20, 2024 12:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[patch] restore forbidding duplicate imports #1453

[patch] restore forbidding duplicate imports #1453

samwaseda commented May 29, 2024

samwaseda commented May 29, 2024

liamhuber commented May 29, 2024

liamhuber left a comment

samwaseda commented May 29, 2024

liamhuber commented May 29, 2024

pmrv left a comment

samwaseda commented Jun 19, 2024

samwaseda commented Jun 19, 2024

jan-janssen left a comment

samwaseda commented Jun 20, 2024

[patch] restore forbidding duplicate imports #1453

[patch] restore forbidding duplicate imports #1453

Conversation

samwaseda commented May 29, 2024

samwaseda commented May 29, 2024

liamhuber commented May 29, 2024

liamhuber left a comment

Choose a reason for hiding this comment

samwaseda commented May 29, 2024

liamhuber commented May 29, 2024

pmrv left a comment

Choose a reason for hiding this comment

samwaseda commented Jun 19, 2024

samwaseda commented Jun 19, 2024

jan-janssen left a comment

Choose a reason for hiding this comment

samwaseda commented Jun 20, 2024