
Best model after epoch #46

Merged

merged 2 commits into master from best_model_after_epoch on May 27, 2022

Conversation

@natuan commented May 20, 2022

This change introduces an option to specify an epoch after which the best model may be saved; it can be used in conjunction with the existing flags "metric_for_best_model" and "load_best_model_at_end". A use case: when pruning or transfer learning is followed by quantization, this flag lets one obtain the best quantized model (which is only valid after the pruning/transfer phase ends).
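
As a rough usage sketch (note: best_model_after_epoch is the flag added by this change and is not part of stock Hugging Face TrainingArguments; the other arguments shown are standard), the new option is meant to sit alongside the existing best-model flags:

```python
# Illustrative only: best_model_after_epoch comes from this fork's change;
# the remaining arguments are the stock Hugging Face best-model options.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    num_train_epochs=60,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="f1",
    load_best_model_at_end=True,
    # Only checkpoints saved after this epoch are eligible to become the
    # "best" model, e.g. so that only checkpoints taken after pruning or
    # transfer has finished and quantization has started can be selected.
    best_model_after_epoch=30,
)
```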

@bfineran left a comment

@natuan great feature to have. Looks like it would be tough, but is there any way we can get this change to be just on the sparseml repo? We always try to minimize divergence between our fork and upstream where possible, but given it's a one-line change in a long function, having it here seems OK.

One idea might be adding best_model_after_epoch to the sparseml-side trainer/scripts and having a conditional check for it in the save function. Thoughts?

@natuan (Author) commented May 25, 2022


I think we could move this into sparseml by overloading _save_checkpoint (some refactoring required to make it clean): neuralmagic/sparseml#814
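
For reference, a minimal sketch of that overload idea (class and attribute names here are illustrative, not the fork's actual code; the real refactor lives in neuralmagic/sparseml#814):

```python
# Sketch only. In the Hugging Face Trainer of this era, best_metric and
# best_model_checkpoint are updated inside _save_checkpoint only when
# `metrics` is passed, so dropping `metrics` before a threshold epoch
# effectively delays best-model selection until after that epoch.
from transformers import Trainer

class EpochGatedTrainer(Trainer):  # illustrative name
    def __init__(self, *args, best_model_after_epoch: int = -1, **kwargs):
        super().__init__(*args, **kwargs)
        self.best_model_after_epoch = best_model_after_epoch

    def _save_checkpoint(self, model, trial, metrics=None):
        epoch = self.state.epoch or 0
        if metrics is not None and epoch <= self.best_model_after_epoch:
            metrics = None  # parent then skips best-model bookkeeping for this checkpoint
        return super()._save_checkpoint(model, trial, metrics=metrics)
```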

anmarques previously approved these changes May 27, 2022

@anmarques (Member) left a comment

LGTM

bfineran previously approved these changes May 27, 2022

@bfineran dismissed stale reviews from anmarques and themself via 0b7e5ff May 27, 2022 17:45
@bfineran merged commit 15ba9b7 into master May 27, 2022
@bfineran deleted the best_model_after_epoch branch May 27, 2022 20:43
KSGulin pushed a commit that referenced this pull request Oct 14, 2022
Disable FP16 on QAT start (#12)

* Override LRScheduler when using LRModifiers

* Disable FP16 on QAT start

* keep wrapped scaler object for training after disabling

Using QATMatMul in DistilBERT model class (#41)

Removed double quantization of output of context layer. (#45)

Fix DataParallel validation forward signatures (#47)

* Fix: DataParallel validation forward signatures

* Update: generalize forward_fn selection

Best model after epoch (#46)

fix scaler check for non fp16 mode in trainer (#38)

Mobilebert QAT (#55)

* Remove duplicate quantization of vocabulary.

enable a QATWrapper for non-parameterized matmuls in BERT self attention (#9)
bfineran added a commit that referenced this pull request Oct 18, 2022
* Update trainer and model flows to accommodate sparseml

* Utils and auxiliary changes

update Zoo stub loading for SparseZoo 1.1 refactor (#54)

add flag to signal NM integration is active (#32)

Add recipe_name to file names

* Fix errors introduced in manual cherry-pick upgrade

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
KSGulin added a commit that referenced this pull request Jun 19, 2023
* Add recipe_name to default file names

* Upgrade to transformers release V4.30.2 (#62)

* update build versions for NM fork pypi push (#74)

* fix nightly package name (#75)

* add make build command (#76)

* add GHA workflow files to build nightly and release packages (#77)

* add GHA workflow files to build nightly and release packages

* fix name

---------

Co-authored-by: dhuang <dhuang@MacBook-Pro-2.local>

* bump up version to 1.6.0 (#79)

Co-authored-by: dhuang <dhuang@MacBook-Pro-2.local>

---------

Co-authored-by: Konstantin <konstantin@neuralmagic.com>
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuang <dhuang@MacBook-Pro-2.local>
dsikka pushed a commit that referenced this pull request Aug 17, 2023
dsikka pushed a commit that referenced this pull request Aug 17, 2023
bfineran added a commit that referenced this pull request Oct 26, 2023
bfineran added a commit that referenced this pull request Oct 27, 2023
(previous commits)

minor improvements for build workflow files (#83)

Co-authored-by: dhuang <dhuang@MacBook-Pro-2.local>

fix minor issue (#84)

Co-authored-by: dhuang <dhuang@MacBook-Pro-2.local>

OPT with quantizable MatMuls (#85)

fix a minor issue for release build (#86)

Co-authored-by: dhuang <dhuang@MacBook-Pro-2.local>

update version in version.py

Testmo (#91)

* improve GHA workflow files to build nightly and release, and report status to testmo

* clean up

* report exit code

* Assign value to exit_code

---------

Co-authored-by: dhuang <dhuang@MacBook-Pro-2.local>

Update trainer.py - fix DistributedSampler import (#93)

DistributedSampler is used but not imported in `trainer.py`

Research/llama/bmm quantization (#94)

* Quantize attention matmuls

* Quantize attention matmuls

bump base transformers version