Code to reproduce the experimental results of the paper.
Requires Python 3+.
- Create a conda environment: `conda env create -f environment.yml`
- Activate the environment: `conda activate environment`
The project implements both active learning (AL, `--strategy 0`) and data pruning (DP, `--strategy 1`). The command-line flag `--auto_config` fills in the appropriate hyperparameters based on the model (recommended). The workflow of the main script is as follows:
1. Train a query model (possibly across multiple initializations) and retrieve sample scores;
2. Acquire (for AL) or remove (for DP) samples based on the scores and other factors (e.g., class-wise quotas);
3. Potentially repeat steps 1-2 across multiple iterations (`--iterations`, common for AL);
4. Once the final dataset is determined, train the final model and save its metrics in JSON format (see the sketch below).
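The following is a minimal, self-contained Python sketch of this loop. All helpers and data here are hypothetical stand-ins (random scores over integer sample IDs), not the project's actual API; it only illustrates how scoring, selection, and iteration fit together.

```python
import json
import random

def query_scores(dataset, num_inits):
    # Stand-in for training the query model and collecting per-sample scores,
    # averaged over `num_inits` initializations; a real scorer such as EL2N
    # or Forgetting would replace the random draw.
    return [sum(random.random() for _ in range(num_inits)) / num_inits
            for _ in dataset]

def prune(final_frac=0.7, iterations=1, num_inits=1):
    dataset = list(range(1000))              # stand-in for the training set
    full_size = len(dataset)
    for step in range(1, iterations + 1):
        # Steps 1-2: score the current dataset, then drop the lowest-scoring
        # samples (an AL run would instead acquire high-scoring samples
        # from an unlabeled pool).
        scores = query_scores(dataset, num_inits)
        target = int(full_size * (1 - step * (1 - final_frac) / iterations))
        ranked = sorted(zip(scores, dataset), reverse=True)
        dataset = [sample for _, sample in ranked[:target]]
    # Step 4: train the final model and save its metrics as JSON (here, the
    # final dataset size stands in for real test metrics).
    metrics = {"final_size": len(dataset)}
    with open("metrics.json", "w") as f:
        json.dump(metrics, f)

prune(final_frac=0.7, iterations=2, num_inits=5)
```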
Here are a few simple usage examples; note that `--final_frac` specifies the fraction of data to keep, so pruning 30% corresponds to `--final_frac 0.7`. The commands should be executed from the parent directory of the project folder.
- Prune 30% of CIFAR-10 using VGG-16 and the EL2N scorer:

```
python -m drop-data-pruning.main --auto_config --use_gpu --strategy 1 --final_frac 0.7 --model_name VGG16 --scorer_name EL2N
```
- Randomly prune 30% of CIFAR-10 using VGG-16 and DRoP class-wise ratios, with the query model retrained 5 times:

```
python -m drop-data-pruning.main --auto_config --use_gpu --strategy 1 --final_frac 0.7 --model_name VGG16 --scorer_name Random --quoter_name DRoP --num_inits 5
```
- Prune 30% of CIFAR-10 using VGG-16 and the Forgetting scorer, and train the final model with the cost-sensitive optimization algorithm CDB-W:

```
python -m drop-data-pruning.main --auto_config --use_gpu --cdbw_final --strategy 1 --final_frac 0.7 --model_name VGG16 --scorer_name Forgetting
```
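An active learning run uses the same interface with `--strategy 0` and `--iterations`. The command below is only an illustrative sketch assembled from the flags documented above; the specific flag values (and the interpretation of `--final_frac` under AL) are assumptions, not a recipe from the paper:

```
python -m drop-data-pruning.main --auto_config --use_gpu --strategy 0 --final_frac 0.3 --model_name VGG16 --scorer_name EL2N --iterations 5
```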
If you find this code useful, please cite:

```
@InProceedings{vysogorets2025drop,
  title     = {{DRoP}: Distributionally Robust Data Pruning},
  author    = {Vysogorets, Artem and Ahuja, Kartik and Kempe, Julia},
  booktitle = {Proceedings of the 13th International Conference on Learning Representations},
  pages     = {1--25},
  year      = {2025},
  month     = {24--28 Apr},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR}
}
```