Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] First pass at porting QChem into atomate2, waiting on concurrent PR approvals #161

Merged
merged 67 commits into from
Jan 24, 2024

Conversation

rdguha1995
Copy link
Contributor

Summary

This is a draft attempt at porting Qchem into Atomate2 :

  • I have closely tried to mimic the same layout and structure as VASP.
  • We have 4 folders of interest - sets, schemas, jobs and flows
  • Sets
  • base.py in sets creates a QChemInputGenerator which is basically wrapping around the QChemDictSet from pymatgen
  • core.py in sets creates the input sets for different QChem jobs after inheriting the base generator.
  • Jobs
  • base.py in jobs creates a BaseQChemMaker for running any type of Qchem-Atomate2 job. Note that the base.py in jobs also inherits files.py and run.py
  • core.py inherits the BaseQChemMaker and creates jobs for the different Qchem-Atomate2 calculations
  • Flows (WIP)
  • This still requires work. From what I have understood, flows are used when you want to combine two non-io firetasks. Drawing on that parallel, the two current Qchem-Atomate fireworks which fits the bill is FFOpts and FFTS. I have created a skeletal flow for that but I am not sure whether we even need to have flows as a whole in Qchem-Atomate2. Would defer to @rkingsbury .
  • Schemas (TODO)
  • Still need to figure out how we tackle the Qchem TaskDocument. Should we just inherit it from emmet? Again @rkingsbury would be the best person to answer that question.

I think this is a good start, but there are surely some missing pieces we need to figure out.

@codecov
Copy link

codecov bot commented Aug 29, 2022

Codecov Report

Attention: 153 lines in your changes are missing coverage. Please review.

Comparison is base (f8d6901) 76.10% compared to head (569c03a) 75.34%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #161      +/-   ##
==========================================
- Coverage   76.10%   75.34%   -0.77%     
==========================================
  Files          87       94       +7     
  Lines        7151     7551     +400     
  Branches     1057     1146      +89     
==========================================
+ Hits         5442     5689     +247     
- Misses       1386     1509     +123     
- Partials      323      353      +30     
Files Coverage Δ
src/atomate2/qchem/sets/core.py 100.00% <100.00%> (ø)
src/atomate2/settings.py 100.00% <100.00%> (ø)
src/atomate2/qchem/jobs/core.py 94.11% <94.11%> (ø)
src/atomate2/qchem/jobs/base.py 81.39% <81.39%> (ø)
src/atomate2/qchem/files.py 78.04% <78.04%> (ø)
src/atomate2/qchem/drones.py 55.17% <55.17%> (ø)
src/atomate2/qchem/run.py 48.71% <48.71%> (ø)
src/atomate2/qchem/sets/base.py 43.57% <43.57%> (ø)

Copy link
Member

@utf utf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rdguha1995,

This looks great! Thanks very much.

My main comment is on the input sets. I think it is a very bad idea wrapping around the old style ones in pymatgen. I've written more detail in the review comments. It will definitely be worthwhile to refactor the old dict set into atomate2. I'm happy to help if you run into trouble but I think it should be quick.

On a separate note, you should install the pre-commit so that your commits are automatically formatted in line with the atomate2 style. To do that you can run the following in the root atomate2 folder:

pip install pre-commit
pre-commit install

Then, any time you make changes the formatting tools will be applied. If changes are made, you will have to add them to the commit for the changes to be included. To run pre-commit on all files you can use:

pre-commit run --all

Happy to chat on zoom if you have any questions.

"""

name: str = "frequency flattening structure optimization"
max_iterations=10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For dataclasses you need to specify the type. E.g., this should be

Suggested change
max_iterations=10
max_iterations: int = 10

Otherwise the dataclass won't work properly.

Please double check all your dataclasses.

opt_maker: BaseQChemMaker = field(default_factory=OptMaker)
freq_maker: BaseQChemMaker = field(default_factory=FreqMaker)

def make(self, molecule: Molecule, prev_qchem_dir: str or Path or None = None, mode=mode, scale=scale):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't specify types like that in type hinting. You need to use the | character. You also can't use mode or scale since they are dataclass variables. Instead you should remove these parameters and use self.mode in the function.

Suggested change
def make(self, molecule: Molecule, prev_qchem_dir: str or Path or None = None, mode=mode, scale=scale):
def make(self, molecule: Molecule, prev_qchem_dir: str | Path | None = None):

# "normalmode_eigenvecs",
# ]

_DATA_OBJECTS = [QCInput]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DATA_OBJECTS controls what objects will get put into the jobflow "data" store which is used for large objects. I don't think QCInput should be in here as:

  1. QCInputs won't be getting stored after a calculations.
  2. They aren't large!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This field would be for large data files, for example if we write out a CUBE file of the electron density or something like that.

scf_algorithm = self._get_scf_algorithm(scf_algorithm_updates)


return QChemDictSet(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not do this. It will make it very challenging to diagnose why certain things are getting written in the input sets. You will have to first check QChemInputGenerator, then check QChemDictSet and then finally check QCInput.

We should only have the generator QChemInputGenerator and the input set QCInput. The long term goal is to use only the modern input sets and remove the old style input sets. This won't be possible if the new input sets rely on the old ones.

So please can you refactor QChemDictSet into QChemInputGenerator. It does not look like it will take that long as the QChem inputs are a lot simpler than the VASP input sets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utf I completely understand the sentiment and agree that refactoring QChemDictSet into a modern InputSet is the long term goal. When I looked into doing that before though, it did not seem straightforward. So I had suggested to Rishabh that as long as the legacy QChemDictSet behaves the same way as a modern InputSet (e.g., has a write_input method with the correct call signature), a sensible incremental solution would be to use an InputGenerator to create a legacy InputSet.

In general, I feel that the QChem InputSet do a lot less overrriding of input kwargs than the VASP ones so I don't think there's as high a risk of this becoming a confusing mess. @rdguha1995 it would be good to see if you can skip over the DictSet layer and directly create QCInput here, which would reduce the risk of a kwarg getting overridden somewhere. But I'm not sure how well that will work because I think the existing QCInput is somewhere in between the functionality of a modern InputSet and InputFile.

Aside from the risk of confusing overrides, I don't believe taking this approach adds any extra dependencies or complexifies the code (but please correct me if I'm wrong). If that's the case, I'd propose that we use this paradigm to get to beta quality atomate2 support for Q-Chem and tackle the InputSet refactor a bit down the line, just to minimize the number of moving parts.

@utf
Copy link
Member

utf commented Aug 29, 2022

Still need to figure out how we tackle the Qchem TaskDocument. Should we just inherit it from emmet?

I've started a discussion about this on emmet: materialsproject/emmet#517

For this PR, I think it makes sense to use the existing task documents from emmet, but you will need to add new functions to initialise the task documents from the raw QChem files (e.g., see what I did in the VASP module in atomate2). When you add these functions in emmet, please can you mention me in the PR so I can review there too.

@rdguha1995
Copy link
Contributor Author

@utf , thanks for starting this thread on emmet. I will try to make a separate PR in emmet and assign you as a reviewer. In the meantime, would it be helpful if I am also tagged in materialsproject/emmet#517 ?

Copy link
Contributor

@rkingsbury rkingsbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this @rdguha1995! This will be a great service to use Molecular DFT folks. See some high-level review comments below. I can review in greater detail as this gets more refined.

src/atomate2/qchem/files.py Outdated Show resolved Hide resolved
# "normalmode_eigenvecs",
# ]

_DATA_OBJECTS = [QCInput]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This field would be for large data files, for example if we write out a CUBE file of the electron density or something like that.

Comment on lines 26 to 43
@dataclass
class SinglePointSetGenerator(QChemInputGenerator):
"""Class to generate QChem Single Point input sets."""

def get_basis_set_updates(
self,
molecule: Molecule,
prev_basis: str = None,
prev_scf: str = None,
new_geom_opt: dict = None,
nbo_params: dict = None,
overwrite_inputs: dict = None,
) -> dict:
"""
Get updates to the basis set for a single point calculation.
"""
return {"basis_set": "def2-tzvppd", "scf_algorithm": "diis", "job_type": "sp"}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of the QChemInputGenerator is implemented with a get_basis_set_updates method that returns a pre-populated dict. Wouldnt it be simpler to just do this directly, e.g.

class SinglePointSetGenerator(QChemInputGenerator):
    """Class to generate QChem Single Point input sets."""
    return QChemInputGenerator("basis_set"="def2-tzvppd",
                                                     "scf_algorithm"="diis",
                                                     "job_type": "sp")
'''

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the simplest form, I had not even envisioned having any inheritance in the InputSetGenerator. E.g. the only function of the Generator would be to define specific sets of kwargs that get passed to InputSet

class OptSetGenerator(InputGenerator):

     def __init__(basis_set, scf_algorithm, dft_rung, etc.)
           ....

     def get_input_set(Molecule):

           return `QChemInputSet(molecule, basis_set, scf_algorithm, dft_rung).

There may be some technical advantages to have the intermediate class QChemInputGenerator, especially with regard to inheriting previous calc directories. But if we can possibly do without it, I'd advocate for the approach above. Keep things simple!

@utf I know VASP makes use of the dual layer model and I think that's what @rdguha1995 is emulating here. Do you have any additional thoughts on when/why that's advantageous?

Comment on lines 62 to 84

basis_set_updates = self.get_basis_set_updates(
molecule,
prev_basis=prev_basis,
prev_scf=prev_scf,
new_geom_opt=new_geom_opt,
overwrite_inputs=overwrite_inputs,
nbo_params=nbo_params,
)

scf_algorithm_updates = self.get_scf_algorithm_updates(
molecule,
prev_basis=prev_basis,
prev_scf=prev_scf,
new_geom_opt=new_geom_opt,
overwrite_inputs=overwrite_inputs,
nbo_params=nbo_params,
)

basis_set = self._basis_set(basis_set_updates)
scf_algorithm = self._get_scf_algorithm(scf_algorithm_updates)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I envisioned implementing an InputGenerator is that any of the core calculation settings, such as basis_set and scf_algorithm, are set as class attributes on instantiation, e.g.

QChemInputGenerator(basis_set = 'def2-svpd') etc.

Basically, any argument that needs to be passed to InputSet _other thanthe strutcure. Then_ theget_input_setpasses the class attrs, plus theMolecule`, to the input set init method.

I could be missing something, but for me, having all these extra methods makes the code harder to understand, when really their only purpose is to set some kwargs in the InputSet constructor.

scf_algorithm = self._get_scf_algorithm(scf_algorithm_updates)


return QChemDictSet(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utf I completely understand the sentiment and agree that refactoring QChemDictSet into a modern InputSet is the long term goal. When I looked into doing that before though, it did not seem straightforward. So I had suggested to Rishabh that as long as the legacy QChemDictSet behaves the same way as a modern InputSet (e.g., has a write_input method with the correct call signature), a sensible incremental solution would be to use an InputGenerator to create a legacy InputSet.

In general, I feel that the QChem InputSet do a lot less overrriding of input kwargs than the VASP ones so I don't think there's as high a risk of this becoming a confusing mess. @rdguha1995 it would be good to see if you can skip over the DictSet layer and directly create QCInput here, which would reduce the risk of a kwarg getting overridden somewhere. But I'm not sure how well that will work because I think the existing QCInput is somewhere in between the functionality of a modern InputSet and InputFile.

Aside from the risk of confusing overrides, I don't believe taking this approach adds any extra dependencies or complexifies the code (but please correct me if I'm wrong). If that's the case, I'd propose that we use this paradigm to get to beta quality atomate2 support for Q-Chem and tackle the InputSet refactor a bit down the line, just to minimize the number of moving parts.

@utf
Copy link
Member

utf commented Aug 30, 2022

Hi @rkingsbury, I think we should chat properly about the input sets as I'm very reluctant to merge something that is designed to be a temporary fix - in my experience it will never get fixed and we'll end up with a code base that is a collection of confusing and mixed standards.

The new approach to input sets in atomate2 actually makes things a lot more transparent from a user point of view if implemented correctly. The current input sets tackle two issues that got very messy with the old input set style:

  1. You typically have a set of base settings for a calculation. However, these are not fixed in stone and one could conceivably change them and it wouldn't affect the type of calculation being performed. Think energy cut-off or basis set used. There is also a a second set of settings that define the calculation being performed and these must be set to achieve the desired result. E.g., think number of ionic steps for static vs relaxation calculations. This distinction becomes very important when starting from previous calculations. In particular, you want to ignore the "base settings" and only apply the settings that define this type of calculation.
  2. Often the "update" settings (i.e., the ones that define the calculation type) depend on the actual molecule or previous output. E.g., they are dynamic and can't be defined by just a static dictionary. Therefore, this setup is a very flexible way to allow customisable input set updates. In particular, you only have to define a single function that can be used for both fresh and from previous calculations!

This also has the benefit that the updates that define an input set are defined in a single function (not a mixture of class variables, init methods, from_prev, and other functions). If you scroll through the VASP input sets or even the QChem input sets in this PR, it is very easy to discern the settings that define the calculation. If you look through the current input sets in pymatgen then it is much less obvious (and indeed there are sometimes inconsistencies between the different parts of the input set).

That said, I don't think the current implementation of the base input set and override functions in this PR is necessarily right. It looks like we can to simplify things a lot. Especially if we refactor the old dict set into the new generator.

At our next meeting, we can discuss this in more detail to work out what the requirements of the input sets are, how they will be used, and the simplest way to implement them. I'm confident that this will result in a better user experience from the outset. And I also don't think it will take that much engineering effort.

@rdguha1995
Copy link
Contributor Author

rdguha1995 commented Jan 20, 2023

Hey @utf , and @rkingsbury . This has been a long time coming and I am sorry for not being more prompt on this. I made a bunch of changes to this PR in October to refactor the QCInput and QChemDictSet into QChemInputSet and QChemInputGenerator.

I think this PR is a step closer to the infrastructure @utf envisions. There definitely might be more work to do but it would be great if a review can be performed on this PR. @utf I think that might be under your purview.

Thanks team for all the help. I have a feeling that if we give this one solid push, we would be much closer to incorporating QChem into atomate2

Copy link
Member

@utf utf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay with this @rdguha1995 @rkingsbury.

This is looking great and exactly along the lines of atomate2!

I added some minor comments. The main remaining things are:

  1. Finishing the inputset implementation.
  2. Finishing the schema implementation.
  3. Adding some tests and wider documentation.

Happy to help discuss these points further (especially 3) if that would be helpful.


from atomate2.qchem.files import copy_qchem_outputs, write_qchem_input_set
from atomate2.qchem.run import run_qchem, should_stop_children
from atomate2.qchem.schemas.task import TaskDocument
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this already in emmet? if so you can now use that one provided any atomate2 changes are moved across.

warnings: Dict[str, bool] = Field(
None, description="Any warnings related to this task document"
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to add a from_directory function to this task document/the one in emmet.

QChem job type to run. Valid options are "opt" for optimization,
"sp" for single point, "freq" for frequency calculation, or "force" for
force evaluation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments.

  1. Can you remove the spaces after each parameter.
  2. The way you specify the type in numpy style docstrings is like job_type : str rather than brackets.
  3. The description should be on the next line down and indented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 155 to 159
custom_smd (str): List of parameters to define a custom solvent in SMD. (Default: None)
Must be given as a string of seven comma separated values in the following order:
"dielectric, refractive index, acidity, basicity, surface tension, aromaticity,
electronegative halogenicity"
Refer to the QChem manual for further details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g.

Suggested change
custom_smd (str): List of parameters to define a custom solvent in SMD. (Default: None)
Must be given as a string of seven comma separated values in the following order:
"dielectric, refractive index, acidity, basicity, surface tension, aromaticity,
electronegative halogenicity"
Refer to the QChem manual for further details.
custom_smd : str
List of parameters to define a custom solvent in SMD. (Default: None)
Must be given as a string of seven comma separated values in the following order:
"dielectric, refractive index, acidity, basicity, surface tension, aromaticity,
electronegative halogenicity"
Refer to the QChem manual for further details.

# )

# basis_set = self._basis_set(basis_set_updates)
# scf_algorithm = self._get_scf_algorithm(scf_algorithm_updates)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder that from previous and get_input_set_updates are not implemented.

Copy link
Contributor Author

@rdguha1995 rdguha1995 Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussions with @rkingsbury referenced in core.py, this whole paradigm has been moved out for now. @utf , would like your opinion on this.

@rdguha1995
Copy link
Contributor Author

Happy New Year @utf !

Just wanted to bring you up to speed with this. I finished integrating tests and other remaining components for this PR over the break. It is pretty much ready to merge now. I think some tests relating to the ML mace workflows are failing which is preventing the tests to complete. Let me know if I can do anything else from my end. Thanks for all the help. I will keep thinking about more functionalities to add to this in the future (i.e. possibilities of some flows.). But I think this can be a good starting point for qchem users on atomate2.

Let me know what you think

Also tagging @Zhuoying on this!

@@ -185,7 +185,34 @@ class Atomate2Settings(BaseSettings):
None, description="Additional settings applied to AMSET settings file."
)

model_config = SettingsConfigDict(env_prefix="atomate2_")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have been removed?

"parsing QChem directories useful for storing duplicate of FW.json",
)

class Config:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was the old pydantic style config, and can be removed.

Copy link
Member

@utf utf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rdguha1995, thanks for these changes. It seems like the testing and linting are failing unrelated to the mace tests.

@@ -44,7 +46,8 @@ def clean_dir(debug_mode):
os.chdir(new_path)
yield
if debug_mode:
print(f"Tests ran in {new_path}") # noqa: T201
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert this change?

@rkingsbury
Copy link
Contributor

Thanks for your painstaking work on this @rdguha1995 ! Your timing is perfect - one of my students is just about in a position to "graduate" from manual Q-Chem calculations to atomate/atomate2 and I'll encourage him to test this out.

I see that there is a code stub for Frequency Flattening, but if I understand correctly, that Flow is meant to be implemented in future work, correct?

@rdguha1995
Copy link
Contributor Author

Thanks @utf for all the pointers.

I resolved all the standing issues with my current code. The only thing left now is increasing my test coverage. I think adding a few tests for drones and sets will do the trick! Please let me know if you have any suggestions.

@rkingsbury thanks! It was a whirlwind, but a learnt a lot here. Others trying this out would be ideal. I almost have a tunnel vision on this code now, so fresh perspectives would be great. I am sure there are things we can improve upon. In terms of the flows, yes I have tabled that code for now, primarily because of the dynamic nature of the flows in qchem. For instance in FFOpts, we don't know beforehand how many iterations of Freqs and Opts we would need before starting the calculation. I think there are functionalities in jobflow which accounts for dynamic flows but I have not looked into them yet. We can have a discussion about that soon. I would eventually want to port over FFOpt to atomate2 as well

@rkingsbury
Copy link
Contributor

@rkingsbury thanks! It was a whirlwind, but a learnt a lot here. Others trying this out would be ideal. I almost have a tunnel vision on this code now, so fresh perspectives would be great. I am sure there are things we can improve upon. In terms of the flows, yes I have tabled that code for now, primarily because of the dynamic nature of the flows in qchem. For instance in FFOpts, we don't know beforehand how many iterations of Freqs and Opts we would need before starting the calculation. I think there are functionalities in jobflow which accounts for dynamic flows but I have not looked into them yet. We can have a discussion about that soon. I would eventually want to port over FFOpt to atomate2 as well

This is a HUGE service to the MP ecosystem 🙏🏻 . I know it's been a slog. Have you been in touch with Sam about your work on this? I'm sure he'll be interested in testing as soon as it's ready, too.

Re: flows - all makes sense - I don't know either, but I think dynamic flows should be relatively straightforward to implement due to the prevalence of generators in atomate2 code - we would basically just have to yield new Opt and Freq jobs over and over again based on a stop condition. Anyway, we can discuss more about that later on (in another PR).

@Zhuoying
Copy link
Contributor

Since @utf has reviewed its latest version, which has passed most of the tests, I would like to merge it as it is.
@rdguha1995 Thanks for the great PR! Very amazing job!
Please feel free to submit a new PR to increase the test coverage anytime when it is ready.

@Zhuoying Zhuoying merged commit 8cd152f into materialsproject:main Jan 24, 2024
6 of 7 checks passed
@utf
Copy link
Member

utf commented Jan 24, 2024

Sorry @Zhuoying, I reverted the merge.

A lot has changed since I last looked at this PR in detail. My last comments were just high level things.

@rdguha1995 Please can you submit a new PR and I will have a detailed look over in the next few days.

@Zhuoying
Copy link
Contributor

@utf Sure, no problem!
@rdguha1995 We can use a new PR for more discussions!

This was referenced Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants