Linear scheduler with warmup #2415

Merged 1 commit into flairNLP:master on Sep 14, 2021
Conversation

lukasgarbas (Collaborator)

Hi,

as suggested in #2396, a linear learning rate schedule can work nicely when fine-tuning transformer models. One way to add this is to create a LinearSchedulerWithWarmup class that can be passed to trainer.train() together with the number of warmup steps, given as a fraction of total training steps (warmup_fraction=0.1 by default). The learning rate increases linearly from zero during warmup and then decays linearly for the rest of training. Here’s an example:

from torch.optim import AdamW

from flair.optim import LinearSchedulerWithWarmup
from flair.trainers import ModelTrainer

trainer = ModelTrainer(..., optimizer=AdamW)

trainer.train(learning_rate=2e-5,
              mini_batch_size=16,
              max_epochs=10,
              scheduler=LinearSchedulerWithWarmup,
              warmup_fraction=0.1)  # warm up for 0.1 of total train steps (1 epoch)

Note on using other schedulers than AnnealOnPlateau:

  • Current logging in the trainer prints 'patience', 'anneal_factor', and 'bad epochs', which are not used by other LR schedulers. I guess the only way to change this is to adjust the trainer.

Let me know what you think. I'm also open to other suggestions on how to add this :)

Linearly increase the learning rate from 0 to initial lr during warmup
and decrease the learning rate to 0 after the warmup.
Can be used when fine-tuning transformer models.
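
For reference, here is a minimal sketch of the schedule this commit describes, expressed with torch.optim.lr_scheduler.LambdaLR (an illustration of the technique, not necessarily flair's exact implementation):

from torch.optim.lr_scheduler import LambdaLR

def linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps):
    # Multiplicative factor on the initial LR: rises 0 -> 1 during warmup,
    # then decays 1 -> 0 over the remaining training steps.
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        return max(
            0.0,
            (num_training_steps - current_step)
            / max(1, num_training_steps - num_warmup_steps),
        )

    return LambdaLR(optimizer, lr_lambda)

With warmup_fraction=0.1 over 10 epochs, num_warmup_steps covers roughly the first epoch, and the scheduler is stepped once per batch.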
@ciaochiaociao commented Sep 9, 2021

Thank you for the prompt support for this feature!

I will also leave some of my thoughts here:
Other repositories like huggingface actually implement this with an API along the lines of Trainer(..., optimizers=(optimizer_instance, scheduler_instance)), taking the optimizer instance and scheduler instance as input, which I believe has the following benefits:
a. it allows users to plug in their own customized optimizer and scheduler, which can normally be built with LambdaLR from torch.optim (see https://huggingface.co/transformers/main_classes/trainer.html#transformers.Trainer; a sketch follows this list).
b. it allows saving the current state of the optimizer and scheduler, making it easier to restore them to where training finished or was interrupted. This helps when one wants to continue previously terminated training, e.g. trainer.train(resume_from_checkpoint='folder_containing_model_optimizer_and_scheduler'), as in https://huggingface.co/transformers/main_classes/trainer.html#transformers.Trainer.train
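
To illustrate point (a), a minimal sketch of passing a custom optimizer and LambdaLR scheduler to the huggingface Trainer; model and train_dataset are assumed to be defined elsewhere, and the exponential-decay lambda is just an arbitrary example:

from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from transformers import Trainer, TrainingArguments

optimizer = AdamW(model.parameters(), lr=2e-5)  # model: assumed defined

# any custom schedule can be expressed as a multiplicative factor of the base LR
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 0.95 ** step)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,  # assumed defined
    optimizers=(optimizer, scheduler),  # instances, not classes
)
trainer.train()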

P.S. Currently, to resume training in flair, the tutorial at https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_7_TRAINING_A_MODEL.md#resuming-training uses this code:

self.model.load_state_dict(self.model.load(base_path / "best-model.pt").state_dict())

Flair only loads the model state, not the optimizer and scheduler states.
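
To illustrate point (b), a minimal sketch of checkpointing all three states with plain PyTorch (the file name and the model/optimizer/scheduler variables are placeholders, not flair's actual checkpoint format):

import torch

# saving: keep optimizer and scheduler state alongside the model
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
}, "checkpoint.pt")

# resuming: restore all three so training continues exactly where it stopped
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
scheduler.load_state_dict(checkpoint["scheduler"])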

@alanakbik (Collaborator)

@lukasgarbas thanks for adding this - works well!

@ciaochiaociao good points - ideally, I'd like to find an abstraction such that most users get a sensible default fine-tuning setup without needing many modifications. For instance, next to the current trainer.train(), which does SGD with learning_rate=0.1, 150 epochs and annealing against the dev score (i.e. good default values for standard training), I've been thinking of adding a trainer.fine_tune() that uses AdamW, a linear scheduler with warmup, a small learning rate and only 10 epochs by default (good default values for fine-tuning transformers). As with the current trainer.train(), users could overwrite pre-set values, such as the mini-batch size or learning rate. Then, for "power users" who would define their own schedulers and optimizers, we'd need to add a different mechanism, perhaps like you suggest.
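
Purely as an illustration of that idea, a hypothetical sketch of what such a fine_tune() call might look like (fine_tune() does not exist at this point; the name and defaults are taken from the description above):

from flair.trainers import ModelTrainer

trainer = ModelTrainer(...)  # model and corpus as usual

# hypothetical convenience method: AdamW, linear scheduler with warmup,
# small learning rate and 10 epochs as pre-set defaults
trainer.fine_tune(learning_rate=2e-5,
                  mini_batch_size=16)  # pre-set values remain overridable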

I'll merge this PR for now but will keep this in mind as we work on the interfaces for the next version!

@alanakbik merged commit 7b44afd into flairNLP:master on Sep 14, 2021