v0.3.1
Model
- Add drop path (stochastic depth) to regularize large models; it works quite well for deep models.
- Add EMA (exponential moving average) of model weights.
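
For readers unfamiliar with the two techniques, here is a minimal plain-Python sketch of what drop path and an EMA update do. It is an illustration only, not this project's code (the release relies on `timm` for EMA); the function names `drop_path` and `ema_update`, the per-sample list representation, and the default decay are all assumptions made for the example.

```python
import random


def drop_path(branch_out, drop_prob, training, rng=random):
    """Stochastic depth for a single sample (hypothetical helper).

    With probability `drop_prob` the whole residual branch is zeroed;
    otherwise it is rescaled by 1 / keep_prob so its expected value is
    unchanged. At inference time the branch passes through untouched.
    """
    if not training or drop_prob == 0.0:
        return list(branch_out)
    if rng.random() < drop_prob:
        return [0.0 for _ in branch_out]
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob for v in branch_out]


def ema_update(ema_weights, weights, decay=0.999):
    """Update `ema_weights` in place (hypothetical helper).

    Applies ema = decay * ema + (1 - decay) * current for every entry.
    """
    for name, value in weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * value
    return ema_weights


# Residual connection with drop path applied to the branch output.
x = [1.0, 2.0]
branch = [0.5, -0.5]
y = [a + b for a, b in zip(x, drop_path(branch, drop_prob=0.1, training=True))]

# One EMA step after an optimizer update.
ema = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9)
```

The EMA copy is typically the one used for evaluation, while the raw weights keep being trained.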
Other
- Add one package dependency, `timm`, to implement EMA.
- Update README to include details of Eulerian sequence and cyclic node re-index.
- Code refactoring.
- Tokenization config json refactoring.
- Update vocab by adding some special tokens, e.g., `<bos>`, `<new>`, `<mask>`, etc.
- Turn off optimizer offload in the DeepSpeed config to boost training speed.
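
As a rough sketch of the DeepSpeed change: optimizer offload is controlled by the `offload_optimizer` entry under `zero_optimization`, so disabling it amounts to dropping that entry. The helper name `disable_optimizer_offload` and the example values (ZeRO stage 2, CPU offload, batch size) are assumptions for illustration, not the settings shipped in this repository.

```python
import copy
import json


def disable_optimizer_offload(ds_config):
    """Return a copy of a DeepSpeed config dict without optimizer offload.

    Hypothetical helper: it only removes the `offload_optimizer` section
    and leaves every other setting as it was.
    """
    cfg = copy.deepcopy(ds_config)
    cfg.get("zero_optimization", {}).pop("offload_optimizer", None)
    return cfg


# Example config; the concrete values here are assumed, not the project's.
before = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

after = disable_optimizer_offload(before)
print(json.dumps(after, indent=2))
```

Keeping optimizer states on the GPU avoids host-device transfers at every step, at the cost of more GPU memory.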