This repository contains the code used for the ModernBERT experiments in the AdaSplash paper: https://arxiv.org/abs/2502.12082.
The efficient alpha-entmax attention kernels can be found in the AdaSplash repo: https://github.com/deep-spin/adasplash.
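For reference only, here is a minimal sketch (not code shipped in this repo) of naive alpha-entmax attention in plain PyTorch, using `entmax_bisect` from the `entmax` package; the AdaSplash Triton kernels compute the same operation in a fused, memory-efficient way instead of materializing the full attention matrix.

```python
# Naive alpha-entmax attention as a reference (NOT the fused AdaSplash kernel).
import torch
from entmax import entmax_bisect


def entmax_attention(q, k, v, alpha=1.5):
    """q, k, v: (batch, heads, seq_len, head_dim)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # alpha-entmax assigns exactly-zero weight to low-scoring keys
    # (alpha -> 1 recovers softmax, alpha = 2 is sparsemax).
    probs = entmax_bisect(scores, alpha=alpha, dim=-1)
    return probs @ v


q = k = v = torch.randn(1, 8, 16, 64)
out = entmax_attention(q, k, v, alpha=1.5)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```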
Follow the setup steps in the original ModernBERT repo: https://github.com/AnswerDotAI/ModernBERT.
Pretrained SparseModernBERT checkpoints are available on Hugging Face:
- Alpha = 1.5: https://huggingface.co/sardinelab/SparseModernBERT-alpha1.5
- Alpha = 2.0: https://huggingface.co/sardinelab/SparseModernBERT-alpha2.0
Check the scripts:
- Pretrain ModernBERT with the MLM objective: `train_modernbert.sh`
- Finetune on recall tasks: `examples/run_st.sh`
Load the model:

```python
from transformers import AutoTokenizer
from sparse_modern_bert import CustomModernBertModel

model_id = "sardinelab/SparseModernBERT-alpha1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = CustomModernBertModel.from_pretrained(model_id, trust_remote_code=True)
```
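As a quick sanity check, the hypothetical snippet below (not taken from the repo's examples) encodes a sentence and mean-pools the token embeddings, assuming the model returns a standard Hugging Face output with `last_hidden_state`:

```python
import torch

sentences = ["AdaSplash makes attention adaptively sparse."]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool over non-padding tokens to get one embedding per sentence.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # (batch_size, hidden_size)
```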
A full example is provided in `examples/evaluate_st_sparse.py`.
AdaSplash is an efficient adaptive sparse attention mechanism implemented in Triton; see the repo: https://github.com/deep-spin/adasplash. If you use this code, please cite the AdaSplash paper:
```bibtex
@inproceedings{goncalves2025adasplash,
  title={AdaSplash: Adaptive Sparse Flash Attention},
  author={Nuno Gon{\c{c}}alves and Marcos V Treviso and Andre Martins},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=OWIPDWhUcO}
}
```