Multihead replay finetuning converges more slowly than regular training #626

lucasdekam · 2024-10-08T10:09:23Z


Hi, I'd like to share my experience so far with multihead finetuning and ask for ideas.

I'm training MACE on a dataset of ~80 platinum-water interface structures with ~400 atoms each; energies and forces are computed with VASP using the RPBE functional. I've tried two methods: multihead replay finetuning (starting from the mace-mp0b agnesi small model) and naive finetuning (starting from the standard small model). The training parameters can be found below. Multihead finetuning converges much more slowly, and the model struggles to bring the errors on the replayed data (pt_head) back down. I've not been able to get the force RMSE much below 100 meV/A, although longer training might achieve this.

2024-10-06 18:06:17.060 INFO: Epoch 99: head: pt_head, loss=  0.0016, RMSE_E_per_atom=   639.6 meV, RMSE_F=   729.4 meV / A, RMSE_stress=    40.4 meV / A^3
2024-10-06 18:06:17.189 INFO: Epoch 99: head: default, loss=  0.0026, RMSE_E_per_atom=     1.2 meV, RMSE_F=   119.0 meV / A, RMSE_stress=    10.9 meV / A^3

Because I wanted the forces to converge faster, I increased the forces weight by a factor of 10, which gave the result above. With equal weights the forces converge even more slowly.

On the other hand, naive finetuning converges quickly to a rather low force RMSE. The resulting model also seems very stable to me, so there's no "catastrophic forgetting" (at least not that I've noticed).

2024-10-06 18:41:18.014 INFO: Epoch 49: head: default, loss=  0.0009, RMSE_E_per_atom=     0.3 meV, RMSE_F=    51.1 meV / A
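
To compare the runs side by side, the log lines quoted above can be parsed into per-head records. This is only a minimal sketch: the regular expressions are written against the three lines shown here and are an assumption about the log layout, not a documented MACE log format.

```python
import re

# Log lines copied verbatim from the post: two from the multihead run
# (epoch 99) and one from the naive run (epoch 49).
LOG = """\
2024-10-06 18:06:17.060 INFO: Epoch 99: head: pt_head, loss=  0.0016, RMSE_E_per_atom=   639.6 meV, RMSE_F=   729.4 meV / A, RMSE_stress=    40.4 meV / A^3
2024-10-06 18:06:17.189 INFO: Epoch 99: head: default, loss=  0.0026, RMSE_E_per_atom=     1.2 meV, RMSE_F=   119.0 meV / A, RMSE_stress=    10.9 meV / A^3
2024-10-06 18:41:18.014 INFO: Epoch 49: head: default, loss=  0.0009, RMSE_E_per_atom=     0.3 meV, RMSE_F=    51.1 meV / A
"""

# Assumed layout: "Epoch <n>: head: <name>," followed by "key=  value" pairs.
HEADER = re.compile(r"Epoch (\d+): head: (\w+),")
METRIC = re.compile(r"(\w+)=\s*([-+0-9.eE]+)")


def parse_line(line):
    """Return a dict with epoch, head and all numeric metrics, or None."""
    match = HEADER.search(line)
    if match is None:
        return None
    record = {"epoch": int(match.group(1)), "head": match.group(2)}
    for key, value in METRIC.findall(line):
        record[key] = float(value)
    return record


records = [r for r in map(parse_line, LOG.splitlines()) if r is not None]

for r in records:
    print(f"epoch {r['epoch']:>3}  head {r['head']:<8}  RMSE_F = {r['RMSE_F']} meV/A")
```

Applied to full log files, the same function would let the validation errors of each head be tracked per epoch rather than compared at a single point.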

I'm still interested in using multihead training, as it might improve the generalizability of my model. My questions: what could cause the slow convergence of multihead training? Is this a known issue? Which parameters can one tune to achieve better convergence (should I increase the forces weight even more, etc.)? I'd be happy to hear about any insights :)

Training parameters:

Multihead

mace_run_train \
    --name="MACE" \
    --foundation_model="../mace_agnesi_small.model" \
    --multiheads_finetuning=True \
    --train_file="../train.xyz" \
    --valid_fraction=0.05 \
    --test_file="../test.xyz" \
    --energy_weight=1.0 \
    --forces_weight=10.0 \
    --energy_key='DFT_energy' \
    --forces_key='DFT_forces' \
    --E0s="{1: -1.20502718, 8: -1.60386686, 78: -0.5578757}" \
    --lr=0.01 \
    --scaling="rms_forces_scaling" \
    --batch_size=3 \
    --max_num_epochs=100 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --default_dtype="float64" \
    --device=cuda \
    --seed=3

Naive

mace_run_train \
    --name="MACE" \
    --foundation_model="small" \
    --multiheads_finetuning=False \
    --train_file="../train.xyz" \
    --valid_fraction=0.05 \
    --test_file="../test.xyz" \
    --energy_weight=1.0 \
    --forces_weight=1.0 \
    --energy_key='DFT_energy' \
    --forces_key='DFT_forces' \
    --E0s="{1: -1.20502718, 8: -1.60386686, 78: -0.5578757}" \
    --lr=0.01 \
    --scaling="rms_forces_scaling" \
    --batch_size=2 \
    --max_num_epochs=50 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --default_dtype="float64" \
    --device=cuda \
    --seed=1

ilyes319 · 2024-10-08T15:43:07Z

Maintainer

Hello,

Can you please share the log files for the two training runs so I can help you? For example, I need to look at the initial loss to see whether there is a potential problem.
Multihead replay requires precisely computed E0s, and that is usually the source of problems.
How did you compute the E0s for your DFT settings? Did you make sure that the oxygen E0 is spin-polarized? This is very important.
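
To illustrate why the E0s matter, here is a rough sketch of how isolated-atom energies enter the training target: the sum of E0s is subtracted from the total energy, and the remainder is what the model has to fit. The E0 values are copied from the `--E0s` flag in the original post; the water-molecule composition, the total energy, and the 1.0 shift on the oxygen E0 are hypothetical numbers chosen only for illustration, and the helper names are not part of MACE.

```python
# E0s copied from the --E0s flag in the post (keyed by atomic number).
E0S = {1: -1.20502718, 8: -1.60386686, 78: -0.5578757}


def e0_baseline(atomic_numbers, e0s):
    """Sum of isolated-atom energies for one structure."""
    missing = sorted(set(atomic_numbers) - set(e0s))
    if missing:
        raise KeyError(f"no E0 for atomic numbers {missing}")
    return sum(e0s[z] for z in atomic_numbers)


def interaction_energy_per_atom(total_energy, atomic_numbers, e0s):
    """Total energy minus the E0 baseline, divided by the atom count."""
    baseline = e0_baseline(atomic_numbers, e0s)
    return (total_energy - baseline) / len(atomic_numbers)


# Hypothetical example: one H2O molecule with a made-up total energy.
numbers = [1, 1, 8]
total_energy = -14.0

reference = interaction_energy_per_atom(total_energy, numbers, E0S)

# Hypothetical error: the oxygen E0 is off by 1.0 (same unit as the E0s).
shifted_e0s = dict(E0S)
shifted_e0s[8] += 1.0
shifted = interaction_energy_per_atom(total_energy, numbers, shifted_e0s)

print(f"interaction energy per atom, given E0s:   {reference:.6f}")
print(f"interaction energy per atom, shifted E0:  {shifted:.6f}")
print(f"difference per atom:                      {shifted - reference:.6f}")
```

An error in a single E0 therefore shifts the per-atom target of every structure containing that element, in proportion to how many such atoms it has.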

10 replies

ilyes319 Oct 11, 2024
Maintainer

I expect the errors on the MP head to go up quite a lot, but not to that extent.

Yes, that means that not many configs contain combinations of Pt, H and O.

lucasdekam Oct 11, 2024
Author

Hmm, maybe it's difficult for the finetuning if the finetuning set & replay set have different elements?

ilyes319 Oct 11, 2024
Maintainer

It should be fine, we have quite a lot of experience with that. Could you try running the same run but with 3 different seeds and share the log files?
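
A small helper can generate the three runs so that they differ only in the seed. This is a sketch: the flags are copied from the multihead command in the original post, while the seed values and the per-seed run names are assumptions made here to keep the outputs apart. The script only prints the commands; it does not launch any training.

```python
import shlex

# Flags copied from the multihead command in the post; only --name and
# --seed vary between runs.
BASE_FLAGS = [
    "--foundation_model=../mace_agnesi_small.model",
    "--multiheads_finetuning=True",
    "--train_file=../train.xyz",
    "--valid_fraction=0.05",
    "--test_file=../test.xyz",
    "--energy_weight=1.0",
    "--forces_weight=10.0",
    "--energy_key=DFT_energy",
    "--forces_key=DFT_forces",
    "--E0s={1: -1.20502718, 8: -1.60386686, 78: -0.5578757}",
    "--lr=0.01",
    "--scaling=rms_forces_scaling",
    "--batch_size=3",
    "--max_num_epochs=100",
    "--ema",
    "--ema_decay=0.99",
    "--amsgrad",
    "--default_dtype=float64",
    "--device=cuda",
]

SEEDS = [1, 2, 3]  # assumed values; any three distinct seeds would do


def build_command(seed):
    """Return the argument list for one run with the given seed."""
    # The run name "MACE_seed<N>" is a hypothetical naming choice.
    return ["mace_run_train", f"--name=MACE_seed{seed}", *BASE_FLAGS, f"--seed={seed}"]


commands = [build_command(seed) for seed in SEEDS]

for cmd in commands:
    # shlex.join quotes arguments with spaces (such as the E0s dictionary).
    print(shlex.join(cmd))
```

The printed lines can be pasted into a job script, or each argument list can be passed to `subprocess.run` once the settings have been checked.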

lucasdekam Oct 11, 2024
Author

Can do, any preference for the forces weight? Should I leave it at 10 or revert to 1?

ilyes319 Oct 11, 2024
Maintainer

I think you should use a weight of 10 for forces and 1 for energies; that matches what we use for MP.
