Sparse ModernBERT

This repository contains the code used for the ModernBERT experiments in the AdaSplash paper: https://arxiv.org/abs/2502.12082.

The efficient alpha-entmax attention kernels can be found in the AdaSplash repo.

Installation

Follow the installation steps in the original ModernBERT repo.

Models on Huggingface

Pretrained checkpoints are available on the Hugging Face Hub; the evaluation example below uses sardinelab/SparseModernBERT-alpha1.5.

Training

Check the scripts:

  • Pretrain ModernBERT on MLM: train_modernbert.sh
  • Finetune on recall tasks: examples/run_st.sh

Evaluating

Load the model:

from transformers import AutoTokenizer
from sparse_modern_bert import CustomModernBertModel

model_id = "sardinelab/SparseModernBERT-alpha1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = CustomModernBertModel.from_pretrained(model_id, trust_remote_code=True)

An example is provided in examples/evaluate_st_sparse.py.
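
As a minimal sketch of how the loaded model can be used for retrieval-style evaluation: this assumes CustomModernBertModel returns a standard Hugging Face output with last_hidden_state and uses simple mean pooling, which may differ from the official recipe in examples/evaluate_st_sparse.py.

import torch

sentences = [
    "AdaSplash replaces softmax attention with adaptive alpha-entmax attention.",
    "Sparse attention can drop irrelevant tokens entirely.",
]

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state              # (batch, seq_len, dim)

# Mean-pool over non-padding tokens, then L2-normalise to get sentence embeddings.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)

similarity = embeddings @ embeddings.T                       # cosine similarity matrix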

AdaSplash

AdaSplash is an efficient adaptive sparse attention mechanism implemented in Triton. See the repo: https://github.com/deep-spin/adasplash
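
For intuition, here is a minimal dense PyTorch reference of the alpha-entmax attention that the Triton kernels compute, with the entmax threshold found by bisection. This is only an illustrative sketch, not the AdaSplash implementation, which is a fused, adaptively sparse kernel; consult the repo above for its actual API.

import torch

def entmax_bisect(scores, alpha=1.5, n_iter=50, dim=-1):
    # Find the threshold tau such that p_i = max((alpha-1)*z_i - tau, 0) ** (1/(alpha-1))
    # sums to 1 along `dim`, using bisection on tau.
    z = (alpha - 1.0) * scores
    tau_hi = z.max(dim=dim, keepdim=True).values        # at tau_hi, all probabilities are zero
    tau_lo = tau_hi - 1.0                                # at tau_lo, total mass is at least 1
    for _ in range(n_iter):
        tau = (tau_lo + tau_hi) / 2.0
        mass = torch.clamp(z - tau, min=0.0).pow(1.0 / (alpha - 1.0)).sum(dim, keepdim=True)
        tau_lo = torch.where(mass >= 1.0, tau, tau_lo)
        tau_hi = torch.where(mass < 1.0, tau, tau_hi)
    p = torch.clamp(z - (tau_lo + tau_hi) / 2.0, min=0.0).pow(1.0 / (alpha - 1.0))
    return p / p.sum(dim, keepdim=True)                  # normalise out residual bisection error

def entmax_attention(q, k, v, alpha=1.5):
    # q, k, v: (batch, heads, seq_len, head_dim). For alpha > 1 the attention
    # weights contain exact zeros, i.e. some tokens are dropped entirely.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return entmax_bisect(scores, alpha=alpha) @ v

In the alpha = 1 limit this recovers softmax attention, and alpha = 2 gives sparsemax; alpha = 1.5 (as in SparseModernBERT-alpha1.5) sits in between.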

Reference

@inproceedings{goncalves2025adasplash,
    title={AdaSplash: Adaptive Sparse Flash Attention},
    author={Nuno Gon{\c{c}}alves and Marcos V Treviso and Andre Martins},
    booktitle={Forty-second International Conference on Machine Learning},
    year={2025},
    url={https://openreview.net/forum?id=OWIPDWhUcO}
}
