Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Energy forces #278

Merged
merged 37 commits into from
Sep 20, 2024
Merged

Energy forces #278

merged 37 commits into from
Sep 20, 2024

Conversation

RylieWeaver
Copy link
Collaborator

@allaffa Let me know what you think of this PR for energy and force computation. It should follow exactly what we discussed before. Some notable things:

(1) It requires data.energy and data.forces to exist and be named this way
(2) It balances force and energy prediction weighting implicitly, since we aren't able to pass multiple weights in single-task learning
(3) Compute forces is an argument to be specified in the json within "Training".
- If specified, it throws errors for anything other than nodal single task learning, as well as any model that doesn't
use positional information for prediction

@RylieWeaver RylieWeaver added the enhancement New feature or request label Sep 6, 2024
@RylieWeaver RylieWeaver self-assigned this Sep 6, 2024
Copy link
Collaborator

@allaffa allaffa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RylieWeaver
This is exactly what we discussed and looks awesome.
Have you tried it with Lenndard Jones to make sure that it is compatible?

@RylieWeaver
Copy link
Collaborator Author

@allaffa

Awesome, thanks.

I was able to get it to work with Lennard Jones after some small changes to LJ's dataset creation and inference. For example, we now have data.energy and'data.forces instead of data.y and data.forces_pre_scaled. A next step could be updating the LennardJones example to that.

Do not merge it yet, because I am resolving any issues with the GitHub tests, but will let you know when they are passed.

@allaffa
Copy link
Collaborator

allaffa commented Sep 6, 2024

@allaffa

Awesome, thanks.

I was able to get it to work with Lennard Jones after some small changes to LJ's dataset creation and inference. For example, we now have data.energy and'data.forces instead of data.y and data.forces_pre_scaled. A next step could be updating the LennardJones example to that.

Do not merge it yet, because I am resolving any issues with the GitHub tests, but will let you know when they are passed.

@RylieWeaver
As long as the CI tests do not pass, I would not be allowed to merge this PR anyway. We have explicitly forbidden this option in the settings of the GitHub repository.

@allaffa
Copy link
Collaborator

allaffa commented Sep 6, 2024

@RylieWeaver

I think it would be good to also include a mew test that the CI workflow must execute where we check this capability to make sure that future changes in the code do not break it

@RylieWeaver
Copy link
Collaborator Author

@RylieWeaver As long as the CI tests do not pass, I would not be allowed to merge this PR anyway. We have explicitly forbidden this option in the settings of the GitHub repository.

Ok, that's good.

@allaffa allaffa self-requested a review September 6, 2024 22:27
@RylieWeaver
Copy link
Collaborator Author

RylieWeaver commented Sep 6, 2024

@RylieWeaver

I think it would be good to also include a mew test that the CI workflow must execute where we check this capability to make sure that future changes in the code do not break it

Ok, I can include this. Theoretically, any model that includes positional information should be able to predict forces in this method. Should we use that list for the testing?

@RylieWeaver
Copy link
Collaborator Author

@allaffa I believe I'll need to create a new dataset inside hydragnn/serialized_dataset for this testing, so that data.forces and data.energy are available. It seems like most of the other data is generated randomly from [0.0, 0.5, 1.0].

Is this correct and is random generation is fine for the forces/energy as well?

@allaffa
Copy link
Collaborator

allaffa commented Sep 9, 2024

@allaffa I believe I'll need to create a new dataset inside hydragnn/serialized_dataset for this testing, so that data.forces and data.energy are available. It seems like most of the other data is generated randomly from [0.0, 0.5, 1.0].

Is this correct and is random generation is fine for the forces/energy as well?

@RylieWeaver
I have an idea that I would like to run by you.
What about adding a frozen version of the LJ code as an additional example, where the example first generates the data, saves it into files, and then uses it for training? We don't need to generate many data samples and traing it fully. We just need to make sure that the training kicks off correctly without errors.

@RylieWeaver
Copy link
Collaborator Author

RylieWeaver commented Sep 9, 2024

@allaffa I believe I'll need to create a new dataset inside hydragnn/serialized_dataset for this testing, so that data.forces and data.energy are available. It seems like most of the other data is generated randomly from [0.0, 0.5, 1.0].
Is this correct and is random generation is fine for the forces/energy as well?

@RylieWeaver I have an idea that I would like to run by you. What about adding a frozen version of the LJ code as an additional example, where the example first generates the data, saves it into files, and then uses it for training? We don't need to generate many data samples and traing it fully. We just need to make sure that the training kicks off correctly without errors.

@allaffa Yeah, I think this will be good to add LJ to the main branch. Would you be free to hammer out the specifics? (your calendar seems booked) If not, we can just hash out here what this will look like:

(1) So, what I think you're saying is to transfer the whole LJ example to the hydragnn/examples directory, including dataset creation, train.py, and json files. This will pull from both your fork and PyTorch_interatomic_potentials. It should be able to run similar to the qm9 example, where ____.py will detect if there is data and generate it if not.

(2) Then, would the examples/LJ dataset be simply copy-pasted over to hydragnn/serialized_dataset to use for examples, or is that unrelated? It would be 500 data points in line with the other examples

examples/LennardJones/Forces_Scatterplot.png Outdated Show resolved Hide resolved
examples/LennardJones/LJ_vlad_atomic_forces.json Outdated Show resolved Hide resolved
examples/LennardJones/LJ_multitask.json Outdated Show resolved Hide resolved
examples/LennardJones/LJ_vlad_total_energy.json Outdated Show resolved Hide resolved
examples/LennardJones/graph_utils.py Outdated Show resolved Hide resolved
examples/LennardJones/train_vlad_atomic_forces.py Outdated Show resolved Hide resolved
examples/LennardJones/train_vlad_total_energy.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@allaffa allaffa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RylieWeaver
Additional changes that you forgot to address in the preview review of the PR

@allaffa allaffa self-requested a review September 17, 2024 20:35
Copy link
Collaborator

@allaffa allaffa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RylieWeaver
I committed just one tiny change on the year of the Copyright.

@RylieWeaver
Copy link
Collaborator Author

@RylieWeaver I committed just one tiny change on the year of the Copyright.

@allaffa Ok, cool, I just did a couple of cleanup things as well but it should pass all tests again.

@allaffa allaffa self-requested a review September 18, 2024 02:22
@allaffa
Copy link
Collaborator

allaffa commented Sep 18, 2024

@pzhanggit @jychoi-hpc
I have been working with Rylie on this PR for quite some time.
Even if nominally all commits are authored by him, some portions of the code pushed in this PR were written by me in local scripts that I passed on to him.

Because of this, since I have been heavily involved in the code development of this PR myself, I would appreciate if you two could review this PR with fresh eyes. You may catch issues or concerns in terms of software design that I have not caught myself.

@RylieWeaver
Copy link
Collaborator Author

Double-clicking on this, let me know what you think @pzhanggit @jychoi-hpc

There are some more PRs in waiting that require this one first, so I'd like to get this one approved or revised soon if possible. Thanks.

@allaffa
Copy link
Collaborator

allaffa commented Sep 19, 2024

Double-clicking on this, let me know what you think @pzhanggit @jychoi-hpc

There are some more PRs in waiting that require this one first, so I'd like to get this one approved or revised soon if possible. Thanks.

@pzhanggit and @jychoi-hpc

For the interest of time, please focus on reviewing only the two files that change the core capabilities of hydragnn, namely:

  • hydragnn/models/Base.py
  • hydragnn/train/train_validate_test.py

All the other files are in the "examples" folder, therefore they do no affect the core of HydraGNN, and are the result of extensive work between me, Rylie, and Pranav anyway. The reason why I asked for you to review the PR is because of the two files above.

Copy link
Member

@jychoi-hpc jychoi-hpc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update. I looked over the base and train script. It looks good to me.

Comment on lines +389 to +395
forces_pred = torch.autograd.grad(
graph_energy_pred,
data.pos,
grad_outputs=torch.ones_like(graph_energy_pred),
retain_graph=graph_energy_pred.requires_grad, # Retain graph only if needed (it will be needed during training, but not during validation/testing)
create_graph=True,
)[0].float()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is grad_outputs used here? This would lead to sum of the gradients, right?

Copy link
Collaborator Author

@RylieWeaver RylieWeaver Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so.

Here's my understanding of it:
The grad_outputs argument essentially serves as a weight to multiply the gradients. This is necessary since graph_energy_pred is a higher dimension than just a scalar. The torch.ones_like() here will assign a weight of 1 to each gradient, which I believe is what we want. Otherwise, there would be a force prediction which is c*grad_E where c is a constant not equal to 1, which is nonphysical.

Specifically to your question:
No, I don't believe that it results in a sum. The output shape of forces_pred will be in the same shape as data.pos. The grad_outputs specifically is for weighting the gradients before a sum, but that extra step of summing is outside of autograd calculation I believe.

Does this answer your question?

@pzhanggit

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so.

Here's my understanding of it: The grad_outputs argument essentially serves as a weight to multiply the gradients. This is necessary since graph_energy_pred is a higher dimension than just a scalar. The torch.ones_like() here will assign a weight of 1 to each gradient, which I believe is what we want. Otherwise, there would be a force prediction which is c*grad_E where c is a constant not equal to 1, which is nonphysical.

Specifically to your question: No, I don't believe that it results in a sum. The output shape of forces_pred will be in the same shape as data.pos. The grad_outputs specifically is for weighting the gradients before a sum, but that extra step of summing is outside of autograd calculation I believe.

Does this answer your question?

@pzhanggit

Thanks. So just to verify if my understanding is correct: grad_outputs is needed because we're calculating gradients of a batch of samples here. And since grad_outputs should be a sequence of length matching output containing the “vector” in vector-Jacobian product, it provides a way to aggregate/sum the gradients across all the samples in the batch. This is equivalent to calculate the gradients iteratively for each sample, since the cross-gradients between samples would be zero.

Copy link
Collaborator Author

@RylieWeaver RylieWeaver Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pzhanggit

grad_outputs is needed because we're calculating gradients of a batch of samples here

Yep, that's right.

And since grad_outputs should be a sequence of length matching output containing the “vector” in vector-Jacobian product

I think so. Yes, grad_outputs should be a sequence, and it should be the same shape as whatever you're predicting. It then multiplies the matrix of gradients (row-wise, not normal matrix multiplication), which I think is the vector-Jacobian product you're referring to.

it provides a way to aggregate/sum the gradients across all the samples in the batch.

I think yes. It's scaling those gradients, which would be relevant in an aggregation/sum. Although, it does not do that aggregation/sum.

This is equivalent to calculate the gradients iteratively for each sample, since the cross-gradients between samples would be zero.

Yep.

Are there any parts still unclear?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pzhanggit

grad_outputs is needed because we're calculating gradients of a batch of samples here

Yep, that's right.

And since grad_outputs should be a sequence of length matching output containing the “vector” in vector-Jacobian product

I think so. Yes, grad_outputs should be a sequence, and it should be the same shape as whatever you're predicting. It then multiplies the matrix of gradients (row-wise, not normal matrix multiplication), which I think is the vector-Jacobian product you're referring to.

it provides a way to aggregate/sum the gradients across all the samples in the batch.

I think yes. It's scaling those gradients, which would be relevant in an aggregation/sum. Although, it does not do that aggregation/sum.

This is equivalent to calculate the gradients iteratively for each sample, since the cross-gradients between samples would be zero.

Yep.

Are there any parts still unclear?

Scaling those gradients?

Copy link
Collaborator

@allaffa allaffa Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pzhanggit
The scaling is done with torch.ones_like(graph_energy_pred)
In our case, we use vectors of ones because we do NOT want to scale. However, you could apply any multiplying factor (or even provide a customized vector with different values for each entry).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pzhanggit @allaffa ^ on what Max said. Also, feel free to @ me so In reply faster :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pzhanggit @RylieWeaver

I personally found the explanation given by ChatGPT very useful. I attach it here hoping that you will find it useful too.

Screenshot 2024-09-19 at 5 46 13 PM Screenshot 2024-09-19 at 5 46 24 PM Screenshot 2024-09-19 at 5 46 35 PM Screenshot 2024-09-19 at 5 46 44 PM Screenshot 2024-09-19 at 5 46 58 PM Screenshot 2024-09-19 at 5 47 21 PM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Wow, the o1-preview is great

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.

@allaffa allaffa merged commit 32660c7 into main Sep 20, 2024
2 checks passed
@allaffa allaffa deleted the energy_forces branch September 20, 2024 00:20
RylieWeaver added a commit to RylieWeaver/HydraGNN that referenced this pull request Sep 25, 2024
RylieWeaver added a commit to RylieWeaver/HydraGNN that referenced this pull request Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants