Release optimizer memory and fix legacy tokenization #3043

helpmefindaname · 2023-01-04T17:36:49Z

This PR makes two small changes:

First, it makes it easier to free GPU memory after training by clearing the gradients and deleting the reference to the optimizer. Since the gradients and the optimizer state are then no longer referenced, torch.cuda.empty_cache() can release more memory.
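A minimal PyTorch sketch of the idea, assuming a hypothetical TinyTrainer class that holds the optimizer as an attribute (this is not flair's actual trainer code). Gradients are dropped with zero_grad(set_to_none=True), the optimizer reference is removed so that Adam's per-parameter state becomes unreachable, and only then is torch.cuda.empty_cache() called, because empty_cache() can only release cached blocks that no live tensor uses.

```python
import gc
from typing import Optional

import torch


class TinyTrainer:
    """Hypothetical trainer that owns a linear model and its Adam optimizer."""

    def __init__(self, model: torch.nn.Linear) -> None:
        self.model = model
        self.optimizer: Optional[torch.optim.Optimizer] = torch.optim.Adam(
            model.parameters(), lr=1e-3
        )

    def train_steps(self, num_steps: int) -> None:
        for _ in range(num_steps):
            inputs = torch.randn(4, self.model.in_features)
            loss = self.model(inputs).pow(2).mean()
            self.optimizer.zero_grad()
            loss.backward()
            # After step(), Adam keeps exp_avg and exp_avg_sq for every parameter.
            self.optimizer.step()

    def release_memory(self) -> None:
        # Drop the gradient tensors instead of filling them with zeros.
        self.model.zero_grad(set_to_none=True)
        # Remove the trainer's reference so the optimizer and its state can be freed.
        self.optimizer = None
        # Collect reference cycles that might still keep the optimizer alive.
        gc.collect()
        # Hand now-unused cached blocks back to the GPU (skipped without CUDA).
        if torch.cuda.is_available():
            torch.cuda.empty_cache()


if __name__ == "__main__":
    trainer = TinyTrainer(torch.nn.Linear(8, 2))
    trainer.train_steps(3)
    trainer.release_memory()
```

Clearing the reference inside the object that owns the optimizer matters because a local `del optimizer` elsewhere would not free anything while the trainer still holds it.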
Second, some (slow) tokenizers such as RobertaTokenizer seem to append a [SEP] token at the end, which breaks the legacy subword token mapping. This is fixed by explicitly allowing that specific token.
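To illustrate the failure mode, here is a hypothetical, simplified alignment routine (not flair's actual legacy code). It maps each subtoken of a sentence back to the word it came from by re-tokenizing word by word, and tolerates exactly one trailing separator token, which it maps to None. The names legacy_word_ids and tokenize_word are made up for this sketch; with a Hugging Face tokenizer the separator would typically be taken from its sep_token attribute.

```python
from typing import Callable, List, Optional


def legacy_word_ids(
    words: List[str],
    subtokens: List[str],
    tokenize_word: Callable[[str], List[str]],
    sep_token: str,
) -> List[Optional[int]]:
    """Map each subtoken of a sentence to the index of its word (None for the separator)."""
    word_ids: List[Optional[int]] = []
    position = 0
    for word_index, word in enumerate(words):
        pieces = tokenize_word(word)
        # The subtokens at this position must match the word's own tokenization.
        if subtokens[position : position + len(pieces)] != pieces:
            raise ValueError(f"could not align word {word!r} at subtoken {position}")
        word_ids.extend([word_index] * len(pieces))
        position += len(pieces)

    leftover = subtokens[position:]
    if leftover == [sep_token]:
        # A trailing separator added by the tokenizer belongs to no word.
        word_ids.append(None)
    elif leftover:
        raise ValueError(f"unexpected trailing subtokens: {leftover}")
    return word_ids
```

Without the `leftover == [sep_token]` branch, the appended separator would fall into the error case and the whole mapping would fail, which is the kind of breakage described above.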

helpmefindaname · 2023-01-06T13:55:44Z

With these changes, I managed to train XLM-Roberta-Large (a model of more than 2GB) with Adam in full precision (roughly 4x the model's size in memory) on my 6GB laptop graphics card, using transformer-smaller-training-vocab.
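The "4x" figure can be checked with a back-of-the-envelope estimate: full-precision Adam training keeps fp32 weights, fp32 gradients and two fp32 moment estimates per parameter. The sketch below ignores activations and CUDA overhead, and the model figures are assumptions (approximate public numbers for XLM-RoBERTa-large), not measurements from this PR. SMALL_VOCAB is a purely hypothetical size for a vocabulary restricted to tokens seen in the training data, which is what transformer-smaller-training-vocab, as its name suggests, is meant to enable.

```python
BYTES_PER_FP32 = 4
# Weights + gradients + Adam's two moment estimates (exp_avg, exp_avg_sq).
FP32_ADAM_COPIES = 4


def adam_fp32_training_bytes(num_params: int) -> int:
    """Rough lower bound on GPU memory for fp32 Adam training, ignoring activations."""
    return num_params * BYTES_PER_FP32 * FP32_ADAM_COPIES


HIDDEN_SIZE = 1024          # assumption: XLM-RoBERTa-large hidden size
FULL_VOCAB = 250_002        # assumption: XLM-RoBERTa vocabulary size
TOTAL_PARAMS = 560_000_000  # assumption: approximate XLM-RoBERTa-large parameter count
SMALL_VOCAB = 30_000        # hypothetical vocabulary after dropping unused tokens

# Only the input embedding matrix (vocab x hidden) shrinks with the vocabulary.
non_embedding_params = TOTAL_PARAMS - FULL_VOCAB * HIDDEN_SIZE
reduced_params = non_embedding_params + SMALL_VOCAB * HIDDEN_SIZE

if __name__ == "__main__":
    print(f"full vocab:    {adam_fp32_training_bytes(TOTAL_PARAMS) / 1e9:.2f} GB")
    print(f"reduced vocab: {adam_fp32_training_bytes(reduced_params) / 1e9:.2f} GB")
```

Under these assumed figures the estimate drops from about 8.96 GB to about 5.36 GB, which suggests why freeing the optimizer state and shrinking the embedding matrix both matter on a 6GB card.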

alanakbik · 2023-01-16T14:53:35Z

@helpmefindaname thanks for improving this!

helpmefindaname added 2 commits January 4, 2023 18:29

release optimizer memory after finishing training

ca09f72

fix legacy tokenizer when SEP tokens are added

87f5577

alanakbik merged commit fa9b104 into master Jan 16, 2023

alanakbik deleted the release_optimizer_memory_and_fix_legacy_tokenization branch January 16, 2023 14:53
