-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] First pass at porting QChem into atomate2, waiting on concurrent PR approvals #161
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #161 +/- ##
==========================================
- Coverage 76.10% 75.34% -0.77%
==========================================
Files 87 94 +7
Lines 7151 7551 +400
Branches 1057 1146 +89
==========================================
+ Hits 5442 5689 +247
- Misses 1386 1509 +123
- Partials 323 353 +30
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rdguha1995,
This looks great! Thanks very much.
My main comment is on the input sets. I think it is a very bad idea wrapping around the old style ones in pymatgen. I've written more detail in the review comments. It will definitely be worthwhile to refactor the old dict set into atomate2. I'm happy to help if you run into trouble but I think it should be quick.
On a separate note, you should install the pre-commit so that your commits are automatically formatted in line with the atomate2 style. To do that you can run the following in the root atomate2 folder:
pip install pre-commit
pre-commit install
Then, any time you make changes the formatting tools will be applied. If changes are made, you will have to add them to the commit for the changes to be included. To run pre-commit on all files you can use:
pre-commit run --all
Happy to chat on zoom if you have any questions.
src/atomate2/qchem/flows/core.py
Outdated
""" | ||
|
||
name: str = "frequency flattening structure optimization" | ||
max_iterations=10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For dataclasses you need to specify the type. E.g., this should be
max_iterations=10 | |
max_iterations: int = 10 |
Otherwise the dataclass won't work properly.
Please double check all your dataclasses.
src/atomate2/qchem/flows/core.py
Outdated
opt_maker: BaseQChemMaker = field(default_factory=OptMaker) | ||
freq_maker: BaseQChemMaker = field(default_factory=FreqMaker) | ||
|
||
def make(self, molecule: Molecule, prev_qchem_dir: str or Path or None = None, mode=mode, scale=scale): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't specify types like that in type hinting. You need to use the |
character. You also can't use mode
or scale
since they are dataclass variables. Instead you should remove these parameters and use self.mode
in the function.
def make(self, molecule: Molecule, prev_qchem_dir: str or Path or None = None, mode=mode, scale=scale): | |
def make(self, molecule: Molecule, prev_qchem_dir: str | Path | None = None): |
src/atomate2/qchem/jobs/base.py
Outdated
# "normalmode_eigenvecs", | ||
# ] | ||
|
||
_DATA_OBJECTS = [QCInput] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DATA_OBJECTS controls what objects will get put into the jobflow "data" store which is used for large objects. I don't think QCInput should be in here as:
- QCInputs won't be getting stored after a calculations.
- They aren't large!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This field would be for large data files, for example if we write out a CUBE file of the electron density or something like that.
src/atomate2/qchem/sets/base.py
Outdated
scf_algorithm = self._get_scf_algorithm(scf_algorithm_updates) | ||
|
||
|
||
return QChemDictSet( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not do this. It will make it very challenging to diagnose why certain things are getting written in the input sets. You will have to first check QChemInputGenerator
, then check QChemDictSet
and then finally check QCInput
.
We should only have the generator QChemInputGenerator
and the input set QCInput
. The long term goal is to use only the modern input sets and remove the old style input sets. This won't be possible if the new input sets rely on the old ones.
So please can you refactor QChemDictSet
into QChemInputGenerator
. It does not look like it will take that long as the QChem inputs are a lot simpler than the VASP input sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@utf I completely understand the sentiment and agree that refactoring QChemDictSet
into a modern InputSet
is the long term goal. When I looked into doing that before though, it did not seem straightforward. So I had suggested to Rishabh that as long as the legacy QChemDictSet
behaves the same way as a modern InputSet
(e.g., has a write_input
method with the correct call signature), a sensible incremental solution would be to use an InputGenerator
to create a legacy InputSet
.
In general, I feel that the QChem InputSet
do a lot less overrriding of input kwargs than the VASP ones so I don't think there's as high a risk of this becoming a confusing mess. @rdguha1995 it would be good to see if you can skip over the DictSet
layer and directly create QCInput
here, which would reduce the risk of a kwarg getting overridden somewhere. But I'm not sure how well that will work because I think the existing QCInput
is somewhere in between the functionality of a modern InputSet
and InputFile
.
Aside from the risk of confusing overrides, I don't believe taking this approach adds any extra dependencies or complexifies the code (but please correct me if I'm wrong). If that's the case, I'd propose that we use this paradigm to get to beta quality atomate2 support for Q-Chem and tackle the InputSet
refactor a bit down the line, just to minimize the number of moving parts.
I've started a discussion about this on emmet: materialsproject/emmet#517 For this PR, I think it makes sense to use the existing task documents from emmet, but you will need to add new functions to initialise the task documents from the raw QChem files (e.g., see what I did in the VASP module in atomate2). When you add these functions in emmet, please can you mention me in the PR so I can review there too. |
@utf , thanks for starting this thread on emmet. I will try to make a separate PR in emmet and assign you as a reviewer. In the meantime, would it be helpful if I am also tagged in materialsproject/emmet#517 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work on this @rdguha1995! This will be a great service to use Molecular DFT folks. See some high-level review comments below. I can review in greater detail as this gets more refined.
src/atomate2/qchem/jobs/base.py
Outdated
# "normalmode_eigenvecs", | ||
# ] | ||
|
||
_DATA_OBJECTS = [QCInput] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This field would be for large data files, for example if we write out a CUBE file of the electron density or something like that.
src/atomate2/qchem/sets/core.py
Outdated
@dataclass | ||
class SinglePointSetGenerator(QChemInputGenerator): | ||
"""Class to generate QChem Single Point input sets.""" | ||
|
||
def get_basis_set_updates( | ||
self, | ||
molecule: Molecule, | ||
prev_basis: str = None, | ||
prev_scf: str = None, | ||
new_geom_opt: dict = None, | ||
nbo_params: dict = None, | ||
overwrite_inputs: dict = None, | ||
) -> dict: | ||
""" | ||
Get updates to the basis set for a single point calculation. | ||
""" | ||
return {"basis_set": "def2-tzvppd", "scf_algorithm": "diis", "job_type": "sp"} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each of the QChemInputGenerator
is implemented with a get_basis_set_updates
method that returns a pre-populated dict. Wouldnt it be simpler to just do this directly, e.g.
class SinglePointSetGenerator(QChemInputGenerator):
"""Class to generate QChem Single Point input sets."""
return QChemInputGenerator("basis_set"="def2-tzvppd",
"scf_algorithm"="diis",
"job_type": "sp")
'''
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the simplest form, I had not even envisioned having any inheritance in the InputSetGenerator
. E.g. the only function of the Generator would be to define specific sets of kwargs that get passed to InputSet
class OptSetGenerator(InputGenerator):
def __init__(basis_set, scf_algorithm, dft_rung, etc.)
....
def get_input_set(Molecule):
return `QChemInputSet(molecule, basis_set, scf_algorithm, dft_rung).
There may be some technical advantages to have the intermediate class QChemInputGenerator
, especially with regard to inheriting previous calc directories. But if we can possibly do without it, I'd advocate for the approach above. Keep things simple!
@utf I know VASP makes use of the dual layer model and I think that's what @rdguha1995 is emulating here. Do you have any additional thoughts on when/why that's advantageous?
src/atomate2/qchem/sets/base.py
Outdated
|
||
basis_set_updates = self.get_basis_set_updates( | ||
molecule, | ||
prev_basis=prev_basis, | ||
prev_scf=prev_scf, | ||
new_geom_opt=new_geom_opt, | ||
overwrite_inputs=overwrite_inputs, | ||
nbo_params=nbo_params, | ||
) | ||
|
||
scf_algorithm_updates = self.get_scf_algorithm_updates( | ||
molecule, | ||
prev_basis=prev_basis, | ||
prev_scf=prev_scf, | ||
new_geom_opt=new_geom_opt, | ||
overwrite_inputs=overwrite_inputs, | ||
nbo_params=nbo_params, | ||
) | ||
|
||
basis_set = self._basis_set(basis_set_updates) | ||
scf_algorithm = self._get_scf_algorithm(scf_algorithm_updates) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I envisioned implementing an InputGenerator
is that any of the core calculation settings, such as basis_set and scf_algorithm, are set as class attributes on instantiation, e.g.
QChemInputGenerator(basis_set = 'def2-svpd')
etc.
Basically, any argument that needs to be passed to InputSet
_other thanthe strutcure. Then_ the
get_input_setpasses the class attrs, plus the
Molecule`, to the input set init method.
I could be missing something, but for me, having all these extra methods makes the code harder to understand, when really their only purpose is to set some kwargs in the InputSet constructor.
src/atomate2/qchem/sets/base.py
Outdated
scf_algorithm = self._get_scf_algorithm(scf_algorithm_updates) | ||
|
||
|
||
return QChemDictSet( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@utf I completely understand the sentiment and agree that refactoring QChemDictSet
into a modern InputSet
is the long term goal. When I looked into doing that before though, it did not seem straightforward. So I had suggested to Rishabh that as long as the legacy QChemDictSet
behaves the same way as a modern InputSet
(e.g., has a write_input
method with the correct call signature), a sensible incremental solution would be to use an InputGenerator
to create a legacy InputSet
.
In general, I feel that the QChem InputSet
do a lot less overrriding of input kwargs than the VASP ones so I don't think there's as high a risk of this becoming a confusing mess. @rdguha1995 it would be good to see if you can skip over the DictSet
layer and directly create QCInput
here, which would reduce the risk of a kwarg getting overridden somewhere. But I'm not sure how well that will work because I think the existing QCInput
is somewhere in between the functionality of a modern InputSet
and InputFile
.
Aside from the risk of confusing overrides, I don't believe taking this approach adds any extra dependencies or complexifies the code (but please correct me if I'm wrong). If that's the case, I'd propose that we use this paradigm to get to beta quality atomate2 support for Q-Chem and tackle the InputSet
refactor a bit down the line, just to minimize the number of moving parts.
Hi @rkingsbury, I think we should chat properly about the input sets as I'm very reluctant to merge something that is designed to be a temporary fix - in my experience it will never get fixed and we'll end up with a code base that is a collection of confusing and mixed standards. The new approach to input sets in atomate2 actually makes things a lot more transparent from a user point of view if implemented correctly. The current input sets tackle two issues that got very messy with the old input set style:
This also has the benefit that the updates that define an input set are defined in a single function (not a mixture of class variables, init methods, from_prev, and other functions). If you scroll through the VASP input sets or even the QChem input sets in this PR, it is very easy to discern the settings that define the calculation. If you look through the current input sets in pymatgen then it is much less obvious (and indeed there are sometimes inconsistencies between the different parts of the input set). That said, I don't think the current implementation of the base input set and override functions in this PR is necessarily right. It looks like we can to simplify things a lot. Especially if we refactor the old dict set into the new generator. At our next meeting, we can discuss this in more detail to work out what the requirements of the input sets are, how they will be used, and the simplest way to implement them. I'm confident that this will result in a better user experience from the outset. And I also don't think it will take that much engineering effort. |
Hey @utf , and @rkingsbury . This has been a long time coming and I am sorry for not being more prompt on this. I made a bunch of changes to this PR in October to refactor the I think this PR is a step closer to the infrastructure @utf envisions. There definitely might be more work to do but it would be great if a review can be performed on this PR. @utf I think that might be under your purview. Thanks team for all the help. I have a feeling that if we give this one solid push, we would be much closer to incorporating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay with this @rdguha1995 @rkingsbury.
This is looking great and exactly along the lines of atomate2!
I added some minor comments. The main remaining things are:
- Finishing the inputset implementation.
- Finishing the schema implementation.
- Adding some tests and wider documentation.
Happy to help discuss these points further (especially 3) if that would be helpful.
src/atomate2/qchem/jobs/base.py
Outdated
|
||
from atomate2.qchem.files import copy_qchem_outputs, write_qchem_input_set | ||
from atomate2.qchem.run import run_qchem, should_stop_children | ||
from atomate2.qchem.schemas.task import TaskDocument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this already in emmet? if so you can now use that one provided any atomate2 changes are moved across.
src/atomate2/qchem/schemas/task.py
Outdated
warnings: Dict[str, bool] = Field( | ||
None, description="Any warnings related to this task document" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to add a from_directory
function to this task document/the one in emmet.
QChem job type to run. Valid options are "opt" for optimization, | ||
"sp" for single point, "freq" for frequency calculation, or "force" for | ||
force evaluation. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments.
- Can you remove the spaces after each parameter.
- The way you specify the type in numpy style docstrings is like
job_type : str
rather than brackets. - The description should be on the next line down and indented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
src/atomate2/qchem/sets/base.py
Outdated
custom_smd (str): List of parameters to define a custom solvent in SMD. (Default: None) | ||
Must be given as a string of seven comma separated values in the following order: | ||
"dielectric, refractive index, acidity, basicity, surface tension, aromaticity, | ||
electronegative halogenicity" | ||
Refer to the QChem manual for further details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g.
custom_smd (str): List of parameters to define a custom solvent in SMD. (Default: None) | |
Must be given as a string of seven comma separated values in the following order: | |
"dielectric, refractive index, acidity, basicity, surface tension, aromaticity, | |
electronegative halogenicity" | |
Refer to the QChem manual for further details. | |
custom_smd : str | |
List of parameters to define a custom solvent in SMD. (Default: None) | |
Must be given as a string of seven comma separated values in the following order: | |
"dielectric, refractive index, acidity, basicity, surface tension, aromaticity, | |
electronegative halogenicity" | |
Refer to the QChem manual for further details. |
src/atomate2/qchem/sets/base.py
Outdated
# ) | ||
|
||
# basis_set = self._basis_set(basis_set_updates) | ||
# scf_algorithm = self._get_scf_algorithm(scf_algorithm_updates) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder that from previous and get_input_set_updates
are not implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussions with @rkingsbury referenced in core.py
, this whole paradigm has been moved out for now. @utf , would like your opinion on this.
…e parameter descriptions
Happy New Year @utf ! Just wanted to bring you up to speed with this. I finished integrating tests and other remaining components for this PR over the break. It is pretty much ready to merge now. I think some tests relating to the ML mace workflows are failing which is preventing the tests to complete. Let me know if I can do anything else from my end. Thanks for all the help. I will keep thinking about more functionalities to add to this in the future (i.e. possibilities of some flows.). But I think this can be a good starting point for qchem users on atomate2. Let me know what you think Also tagging @Zhuoying on this! |
@@ -185,7 +185,34 @@ class Atomate2Settings(BaseSettings): | |||
None, description="Additional settings applied to AMSET settings file." | |||
) | |||
|
|||
model_config = SettingsConfigDict(env_prefix="atomate2_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have been removed?
src/atomate2/settings.py
Outdated
"parsing QChem directories useful for storing duplicate of FW.json", | ||
) | ||
|
||
class Config: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was the old pydantic style config, and can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rdguha1995, thanks for these changes. It seems like the testing and linting are failing unrelated to the mace tests.
tests/conftest.py
Outdated
@@ -44,7 +46,8 @@ def clean_dir(debug_mode): | |||
os.chdir(new_path) | |||
yield | |||
if debug_mode: | |||
print(f"Tests ran in {new_path}") # noqa: T201 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert this change?
Thanks for your painstaking work on this @rdguha1995 ! Your timing is perfect - one of my students is just about in a position to "graduate" from manual Q-Chem calculations to atomate/atomate2 and I'll encourage him to test this out. I see that there is a code stub for Frequency Flattening, but if I understand correctly, that |
Thanks @utf for all the pointers. I resolved all the standing issues with my current code. The only thing left now is increasing my test coverage. I think adding a few tests for @rkingsbury thanks! It was a whirlwind, but a learnt a lot here. Others trying this out would be ideal. I almost have a tunnel vision on this code now, so fresh perspectives would be great. I am sure there are things we can improve upon. In terms of the flows, yes I have tabled that code for now, primarily because of the dynamic nature of the flows in |
This is a HUGE service to the MP ecosystem 🙏🏻 . I know it's been a slog. Have you been in touch with Sam about your work on this? I'm sure he'll be interested in testing as soon as it's ready, too. Re: flows - all makes sense - I don't know either, but I think dynamic flows should be relatively straightforward to implement due to the prevalence of generators in |
Since @utf has reviewed its latest version, which has passed most of the tests, I would like to merge it as it is. |
Sorry @Zhuoying, I reverted the merge. A lot has changed since I last looked at this PR in detail. My last comments were just high level things. @rdguha1995 Please can you submit a new PR and I will have a detailed look over in the next few days. |
@utf Sure, no problem! |
Summary
This is a draft attempt at porting Qchem into Atomate2 :
I think this is a good start, but there are surely some missing pieces we need to figure out.