Training Transformers

@proger released this 16 Jun 10:57 · 26e6da1

This release doubles down on transformers and introduces a training loop program, `hala`. Pretraining bidirectional models with the token denoising objective (aka masked LM) is available via `hala --objective denoise`. The first training run on the uk4b dataset is happening here: https://wandb.ai/stud76/ha/runs/tjoqx491?workspace=user-stud76
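
For reference, here is a minimal sketch of what a token-denoising batch looks like. The 15% masking rate and the 80/10/10 corruption split follow BERT and are assumptions, not necessarily the exact recipe `hala` uses:

```python
import torch

def denoise_batch(tokens, mask_token_id, vocab_size, mask_prob=0.15):
    """Corrupt a batch of token ids; returns (inputs, targets) for a masked LM step."""
    inputs = tokens.clone()
    targets = tokens.clone()
    selected = torch.rand(tokens.shape) < mask_prob   # positions the model must predict
    targets[~selected] = -100                         # ignored by cross_entropy
    roll = torch.rand(tokens.shape)
    inputs[selected & (roll < 0.8)] = mask_token_id   # 80%: replace with [MASK]
    swap = selected & (roll >= 0.8) & (roll < 0.9)    # 10%: replace with a random token
    inputs[swap] = torch.randint(vocab_size, tokens.shape)[swap]
    return inputs, targets                            # remaining 10%: keep unchanged

# loss = F.cross_entropy(model(inputs).view(-1, vocab_size),
#                        targets.view(-1), ignore_index=-100)
```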

Existing causal models can now be finetuned with the conditional language modeling objective $p(y|x)$ using `hala --objective cond`; among other things, this can be used to implement classification.
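
A minimal sketch of the conditional loss, assuming the standard approach of simply masking the prompt tokens $x$ out of the cross-entropy (the helper name is hypothetical, not `hala`'s actual code):

```python
import torch
import torch.nn.functional as F

def cond_lm_loss(logits, tokens, prompt_len):
    """logits: (T, V) next-token logits; tokens: (T,) ids; prompt_len: len(x)."""
    targets = tokens.clone()
    targets[:prompt_len] = -100  # no loss on the condition x, only on y
    # Standard next-token shift: position t predicts token t+1.
    return F.cross_entropy(logits[:-1], targets[1:], ignore_index=-100)
```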

`hat` is now a REPL for both causal and bidirectional models, and it now supports input history thanks to readline.
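
For the curious, a readline-backed REPL loop boils down to something like this sketch (the history file path and the `generate` stub are hypothetical stand-ins):

```python
import atexit
import os
import readline  # importing readline gives input() line editing and history

def generate(prompt: str) -> str:
    return prompt[::-1]  # stand-in; hat samples from a model checkpoint instead

histfile = os.path.expanduser("~/.hat_history")  # hypothetical history file
if os.path.exists(histfile):
    readline.read_history_file(histfile)
atexit.register(readline.write_history_file, histfile)

while True:
    try:
        prompt = input(">>> ")
    except EOFError:
        break
    print(generate(prompt))
```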


The RNN training program `hal` now supports training from u16 binary datasets, just like `hala`. This allowed me to train a world model on VQ-VAE-tokenized images.
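
A u16 binary dataset is just a flat array of uint16 token ids on disk. A minimal sketch of reading batches from one, assuming the common nanoGPT-style flat-array layout (the file name is a placeholder):

```python
import numpy as np
import torch

data = np.memmap("train.bin", dtype=np.uint16, mode="r")  # flat array of token ids

def get_batch(batch_size=8, block_size=256):
    ix = np.random.randint(0, len(data) - block_size - 1, size=batch_size)
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y  # next-token targets are the inputs shifted by one
```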

New randomly initialized checkpoints can be created with the new `hai` program.
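
A randomly initialized checkpoint amounts to saving fresh weights without taking any training steps; a minimal sketch of the idea (the architecture and file name are stand-ins, not `hai`'s actual config):

```python
import torch
import torch.nn as nn

# Stand-in architecture; hai would build whatever model its config specifies.
layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=4)

torch.save({"model": model.state_dict()}, "init.pt")  # fresh, untrained weights
```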