Training Transformers

@proger released this 16 Jun 10:57 · 26e6da1

This release doubles down on transformers and introduces a training loop program, `hala`. Pretraining bidirectional models with the token denoising objective (aka masked LM) is available via `hala --objective denoise`. The first training run on the uk4b dataset is happening here: https://wandb.ai/stud76/ha/runs/tjoqx491?workspace=user-stud76
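
For reference, here is a minimal sketch of what a token-denoising batch looks like. The 15% masking rate and the 80/10/10 corruption split follow BERT and are assumptions, not necessarily the exact recipe `hala` uses:

```python
import torch

def denoise_batch(tokens, mask_token_id, vocab_size, mask_prob=0.15):
    """Corrupt a batch of token ids; returns (inputs, targets) for a masked LM step."""
    inputs = tokens.clone()
    targets = tokens.clone()
    selected = torch.rand(tokens.shape) < mask_prob   # positions the model must predict
    targets[~selected] = -100                         # ignored by cross_entropy
    roll = torch.rand(tokens.shape)
    inputs[selected & (roll < 0.8)] = mask_token_id   # 80%: replace with [MASK]
    swap = selected & (roll >= 0.8) & (roll < 0.9)    # 10%: replace with a random token
    inputs[swap] = torch.randint(vocab_size, tokens.shape)[swap]
    return inputs, targets                            # remaining 10%: keep unchanged

# loss = F.cross_entropy(model(inputs).view(-1, vocab_size),
#                        targets.view(-1), ignore_index=-100)
```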

Existing causal models can now be finetuned with the conditional language modeling objective $p(y|x)$ using `hala --objective cond`; among other things, this can be used to implement classification.
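
A minimal sketch of the conditional loss, assuming the standard approach of simply masking the prompt tokens $x$ out of the cross-entropy (the helper name is hypothetical, not `hala`'s actual code):

```python
import torch
import torch.nn.functional as F

def cond_lm_loss(logits, tokens, prompt_len):
    """logits: (T, V) next-token logits; tokens: (T,) ids; prompt_len: len(x)."""
    targets = tokens.clone()
    targets[:prompt_len] = -100  # no loss on the condition x, only on y
    # Standard next-token shift: position t predicts token t+1.
    return F.cross_entropy(logits[:-1], targets[1:], ignore_index=-100)
```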

`hat` is now a REPL for both causal and bidirectional models, and it now supports input history thanks to readline.
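
For the curious, a readline-backed REPL loop boils down to something like this sketch (the history file path and the `generate` stub are hypothetical stand-ins):

```python
import atexit
import os
import readline  # importing readline gives input() line editing and history

def generate(prompt: str) -> str:
    return prompt[::-1]  # stand-in; hat samples from a model checkpoint instead

histfile = os.path.expanduser("~/.hat_history")  # hypothetical history file
if os.path.exists(histfile):
    readline.read_history_file(histfile)
atexit.register(readline.write_history_file, histfile)

while True:
    try:
        prompt = input(">>> ")
    except EOFError:
        break
    print(generate(prompt))
```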


The RNN training program `hal` now supports training from u16 binary datasets, just like `hala`. This allowed me to train a world model on VQ-VAE-tokenized images.
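
A u16 binary dataset is just a flat array of uint16 token ids on disk. A minimal sketch of reading batches from one, assuming the common nanoGPT-style flat-array layout (the file name is a placeholder):

```python
import numpy as np
import torch

data = np.memmap("train.bin", dtype=np.uint16, mode="r")  # flat array of token ids

def get_batch(batch_size=8, block_size=256):
    ix = np.random.randint(0, len(data) - block_size - 1, size=batch_size)
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y  # next-token targets are the inputs shifted by one
```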

New randomly initialized checkpoints can be created with the new `hai` program.
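
A randomly initialized checkpoint amounts to saving fresh weights without taking any training steps; a minimal sketch of the idea (the architecture and file name are stand-ins, not `hai`'s actual config):

```python
import torch
import torch.nn as nn

# Stand-in architecture; hai would build whatever model its config specifies.
layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=4)

torch.save({"model": model.state_dict()}, "init.pt")  # fresh, untrained weights
```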