Linear scheduler with warmup #2415

Merged 1 commit into flairNLP:master on Sep 14, 2021
Conversation

lukasgarbas (Collaborator)

Hi,

as suggested in #2396, a linear learning rate schedule can work nicely when fine-tuning transformer models. One way to add this is to create a LinearSchedulerWithWarmup class that can be passed to trainer.train() together with the number of warmup steps, given as a fraction of total training steps (warmup_fraction=0.1 by default). The learning rate increases linearly from zero during warmup and then decays linearly for the rest of training. Here’s an example:

from torch.optim import AdamW

from flair.optim import LinearSchedulerWithWarmup
from flair.trainers import ModelTrainer

trainer = ModelTrainer(..., optimizer=AdamW)

trainer.train(learning_rate=2e-5,
              mini_batch_size=16,
              max_epochs=10,
              scheduler=LinearSchedulerWithWarmup,
              warmup_fraction=0.1)  # warm up for 0.1 of total train steps (1 epoch)

Note on using other schedulers than AnnealOnPlateau:

  • Current logging in the trainer prints 'patience', 'anneal_factor', and 'bad epochs', which are not used by other LR schedulers. I guess the only way to change this is to adjust the trainer.

Let me know what you think. I'm also open to other suggestions on how to add this :)

Linearly increase the learning rate from 0 to initial lr during warmup
and decrease the learning rate to 0 after the warmup.
Can be used when fine-tuning transformer models.
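
For reference, here is a minimal sketch of the schedule this commit describes, expressed with torch.optim.lr_scheduler.LambdaLR (an illustration of the technique, not necessarily flair's exact implementation):

from torch.optim.lr_scheduler import LambdaLR

def linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps):
    # Multiplicative factor on the initial LR: rises 0 -> 1 during warmup,
    # then decays 1 -> 0 over the remaining training steps.
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        return max(
            0.0,
            (num_training_steps - current_step)
            / max(1, num_training_steps - num_warmup_steps),
        )

    return LambdaLR(optimizer, lr_lambda)

With warmup_fraction=0.1 over 10 epochs, num_warmup_steps covers roughly the first epoch, and the scheduler is stepped once per batch.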
@ciaochiaociao commented Sep 9, 2021

Thank you for the prompt support for this feature!

I will also leave some of my thoughts here:
Other repositories like huggingface actually implement this with an API along the lines of Trainer(..., optimizers=(optimizer_instance, scheduler_instance)), taking the optimizer instance and scheduler instance as input, which I believe has the following benefits:
a. it allows users to plug in their own customized optimizer and scheduler, which can normally be built with LambdaLR from torch.optim (see https://huggingface.co/transformers/main_classes/trainer.html#transformers.Trainer; a sketch follows this list).
b. it allows saving the current state of the optimizer and scheduler, making it easier to restore them to where training finished or was interrupted. This helps when one wants to continue previously terminated training, e.g. trainer.train(resume_from_checkpoint='folder_containing_model_optimizer_and_scheduler'), as in https://huggingface.co/transformers/main_classes/trainer.html#transformers.Trainer.train
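
To illustrate point (a), a minimal sketch of passing a custom optimizer and LambdaLR scheduler to the huggingface Trainer; model and train_dataset are assumed to be defined elsewhere, and the exponential-decay lambda is just an arbitrary example:

from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from transformers import Trainer, TrainingArguments

optimizer = AdamW(model.parameters(), lr=2e-5)  # model: assumed defined

# any custom schedule can be expressed as a multiplicative factor of the base LR
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 0.95 ** step)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,  # assumed defined
    optimizers=(optimizer, scheduler),  # instances, not classes
)
trainer.train()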

P.S. Currently, to resume training in flair, the tutorial at https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_7_TRAINING_A_MODEL.md#resuming-training uses this code:

self.model.load_state_dict(self.model.load(base_path / "best-model.pt").state_dict())

Flair only loads the model state, not the optimizer and scheduler states.
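
To illustrate point (b), a minimal sketch of checkpointing all three states with plain PyTorch (the file name and the model/optimizer/scheduler variables are placeholders, not flair's actual checkpoint format):

import torch

# saving: keep optimizer and scheduler state alongside the model
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
}, "checkpoint.pt")

# resuming: restore all three so training continues exactly where it stopped
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
scheduler.load_state_dict(checkpoint["scheduler"])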

@alanakbik (Collaborator)

@lukasgarbas thanks for adding this - works well!

@ciaochiaociao good points - ideally, I'd like to find an abstraction such that most users get a sensible default fine-tuning setup without needing many modifications. For instance, next to the current trainer.train(), which does SGD with learning_rate=0.1, 150 epochs and annealing against the dev score (i.e. good default values for standard training), I've been thinking of adding a trainer.fine_tune() that uses AdamW, a linear scheduler with warmup, a small learning rate and only 10 epochs by default (good default values for fine-tuning transformers). As with the current trainer.train(), users could overwrite pre-set values, such as the mini-batch size or learning rate. Then, for "power users" who would define their own schedulers and optimizers, we'd need to add a different mechanism, perhaps like you suggest.
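
Purely as an illustration of that idea, a hypothetical sketch of what such a fine_tune() call might look like (fine_tune() does not exist at this point; the name and defaults are taken from the description above):

from flair.trainers import ModelTrainer

trainer = ModelTrainer(...)  # model and corpus as usual

# hypothetical convenience method: AdamW, linear scheduler with warmup,
# small learning rate and 10 epochs as pre-set defaults
trainer.fine_tune(learning_rate=2e-5,
                  mini_batch_size=16)  # pre-set values remain overridable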

I'll merge this PR for now but will keep this in mind as we work on the interfaces for the next version!

@alanakbik merged commit 7b44afd into flairNLP:master on Sep 14, 2021