This repository contains the code used for the ModernBERT experiments in the AdaSplash paper: https://arxiv.org/abs/2502.12082.
The efficient alpha-entmax attention kernels can be found in the AdaSplash repo: https://github.com/deep-spin/adasplash.
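For reference only, here is a minimal sketch (not code shipped in this repo) of naive alpha-entmax attention in plain PyTorch, using `entmax_bisect` from the `entmax` package; the AdaSplash Triton kernels compute the same operation in a fused, memory-efficient way instead of materializing the full attention matrix.

```python
# Naive alpha-entmax attention as a reference (NOT the fused AdaSplash kernel).
import torch
from entmax import entmax_bisect


def entmax_attention(q, k, v, alpha=1.5):
    """q, k, v: (batch, heads, seq_len, head_dim)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # alpha-entmax assigns exactly-zero weight to low-scoring keys
    # (alpha -> 1 recovers softmax, alpha = 2 is sparsemax).
    probs = entmax_bisect(scores, alpha=alpha, dim=-1)
    return probs @ v


q = k = v = torch.randn(1, 8, 16, 64)
out = entmax_attention(q, k, v, alpha=1.5)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```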
Follow the setup steps in the original ModernBERT repo: https://github.com/AnswerDotAI/ModernBERT.
Pretrained SparseModernBERT checkpoints are available on Hugging Face:
- Alpha = 1.5: https://huggingface.co/sardinelab/SparseModernBERT-alpha1.5
- Alpha = 2.0: https://huggingface.co/sardinelab/SparseModernBERT-alpha2.0
Check the scripts:
- Pretrain ModernBERT with the MLM objective: `train_modernbert.sh`
- Finetune on recall tasks: `examples/run_st.sh`
Load the model:

```python
from transformers import AutoTokenizer
from sparse_modern_bert import CustomModernBertModel

model_id = "sardinelab/SparseModernBERT-alpha1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = CustomModernBertModel.from_pretrained(model_id, trust_remote_code=True)
```
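As a quick sanity check, the hypothetical snippet below (not taken from the repo's examples) encodes a sentence and mean-pools the token embeddings, assuming the model returns a standard Hugging Face output with `last_hidden_state`:

```python
import torch

sentences = ["AdaSplash makes attention adaptively sparse."]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool over non-padding tokens to get one embedding per sentence.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # (batch_size, hidden_size)
```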
A full example is provided in `examples/evaluate_st_sparse.py`.
AdaSplash is an efficient adaptive sparse attention mechanism implemented in Triton; see the repo: https://github.com/deep-spin/adasplash. If you use this code, please cite the AdaSplash paper:
```bibtex
@inproceedings{goncalves2025adasplash,
  title={AdaSplash: Adaptive Sparse Flash Attention},
  author={Nuno Gon{\c{c}}alves and Marcos V Treviso and Andre Martins},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=OWIPDWhUcO}
}
```