A framework for training, evaluating, and comparing machine learning models on classification tasks.
- Modular Design:
  - Separate components for data loading, preprocessing, model definition, and experiment running
- Built-in Datasets:
  - Taiwanese Credit Default dataset
  - US Permanent Visa application dataset
- Preprocessing Pipeline:
  - Automated datetime feature extraction
  - Cyclical feature encoding (a sketch follows this list)
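A minimal sketch of the cyclical encoding step, assuming pandas columns that already hold the extracted datetime parts; the function name `encode_cyclical` is illustrative:

```python
import numpy as np
import pandas as pd

def encode_cyclical(df: pd.DataFrame, col: str, period: int) -> pd.DataFrame:
    """Map a periodic feature (hour, weekday, month) onto the unit circle,
    so that e.g. hour 23 and hour 0 end up close together."""
    df = df.copy()
    df[f"{col}_sin"] = np.sin(2 * np.pi * df[col] / period)
    df[f"{col}_cos"] = np.cos(2 * np.pi * df[col] / period)
    return df.drop(columns=[col])

# e.g. after datetime extraction: df = encode_cyclical(df, "month", period=12)
```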
- Click through `start.ipynb` for a quick start
- Port cleaner and preprocessing code for BAM data from Dan's project
- Check: validity of existing pipeline templates
  - `UserWarning: Skipping features without any observed values: ['country_of_citizenship']. At least one non-missing value is needed for imputation with strategy='most_frequent'.`
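This warning comes from an imputer seeing a column with no observed values at all. One way to handle it, assuming pandas DataFrames, is to drop fully missing features before the pipeline runs; `drop_all_missing` is a hypothetical helper:

```python
import pandas as pd

def drop_all_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Remove features with no observed values (these can appear after a split)."""
    empty = df.columns[df.isna().all()]
    if len(empty):
        print(f"Dropping all-missing features: {list(empty)}")
    return df.drop(columns=list(empty))
```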
- Split the logic of `get_pipeline` into two parts: imputation and encoding (see the sketch after this list)
  - Encoding should change only with the model, not with the dataset
  - Imputation changes with the dataset; may or may not depend on the model?
- Learn: should we use `LabelEncoder` in preprocessing to encode string features numerically?
  - Only for ordinal categorical features with a meaningful order
- Build CatBoost pipeline
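A minimal sketch of the proposed split, assuming scikit-learn and pandas DataFrames with named columns; only `get_pipeline` is an existing name, `make_encoder` and the model names are illustrative. Note that scikit-learn's `LabelEncoder` is intended for targets; `OrdinalEncoder` is the feature-column equivalent:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

def make_encoder(model_name: str):
    # Model-dependent part: tree ensembles cope with integer codes,
    # most other models want one-hot columns.
    if model_name in ("random_forest", "catboost"):
        return OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    return OneHotEncoder(handle_unknown="ignore")

def get_pipeline(model_name: str, numeric_cols: list, categorical_cols: list):
    # Dataset-dependent part: which columns exist and how they are imputed.
    return ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), numeric_cols),
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", make_encoder(model_name)),
        ]), categorical_cols),
    ])
```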
- Something about the data split is not working: we should try ignoring the train-test ratio and using more test data if the model's limits aren't reached, instead of leaving some data unused, which is what we do now (see the sketch below)
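A possible shape for the fixed split, assuming a per-model cap on training size (the `max_train_samples` parameter is hypothetical); the surplus goes to the test set rather than being dropped:

```python
from sklearn.model_selection import train_test_split

def capped_split(X, y, train_ratio=0.8, max_train_samples=None, seed=0):
    """Respect the train-test ratio, but when the model caps the training
    size, hand the leftover rows to the test set instead of discarding them."""
    n_train = int(train_ratio * len(X))
    if max_train_samples is not None:
        n_train = min(n_train, max_train_samples)
    # train_size as an int: every remaining row lands in the test set
    return train_test_split(X, y, train_size=n_train, random_state=seed, stratify=y)
```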
- Print info / number of features in `X_train` (just to check what came through the pipeline)
- Add parameters to pass through to models, e.g. `RandomForestClassifier(n_estimators=200)`
- Parameterize the subsampler for training data (for increasing-subsample experiments); both items are sketched below
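A sketch of the parameter pass-through and the subsampler, assuming numpy arrays (use `.iloc` for DataFrames); the registry and function names are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

MODELS = {
    "random_forest": RandomForestClassifier,
    "logistic_regression": LogisticRegression,
}

def build_model(name: str, **params):
    """Instantiate a model by name, forwarding experiment-level parameters,
    e.g. build_model("random_forest", n_estimators=200)."""
    return MODELS[name](**params)

def subsample_train(X, y, n_samples: int, seed: int = 0):
    """Draw a fixed-size training subsample for increasing-subsample experiments."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
    return X[idx], y[idx]
```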
- Build experiment setup using MLXP
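A skeleton based on the `mlxp.launch` pattern from MLXP's documentation; the config path and metric names are placeholders:

```python
import mlxp

@mlxp.launch(config_path="./configs")
def run_experiment(ctx: mlxp.Context) -> None:
    cfg = ctx.config      # experiment configuration
    logger = ctx.logger   # per-run logger with its own output directory
    # ... train and evaluate here ...
    logger.log_metrics({"accuracy": 0.0}, log_name="results")

if __name__ == "__main__":
    run_experiment()
```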
- Chunk-sample over the test data, to get results on all test data even for TabPFN (see the sketch below)
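A minimal sketch of chunked prediction, assuming array-like test data, so that size-limited models like TabPFN still cover the full test set:

```python
import numpy as np

def predict_in_chunks(model, X_test, chunk_size: int = 1000):
    """Predict chunk by chunk and stitch the per-chunk results together."""
    n_chunks = max(1, len(X_test) // chunk_size)
    chunks = np.array_split(X_test, n_chunks)
    return np.concatenate([model.predict(chunk) for chunk in chunks])
```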
- Save run results as CSV (or a similar format)
- Place artifacts and results under a sensible output path in `outputs/` (both sketched below)
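A sketch of the saving step, assuming pandas and an `outputs/` root; the directory layout and function name are illustrative:

```python
from pathlib import Path
import pandas as pd

def save_results(rows: list[dict], run_name: str, root: str = "outputs") -> Path:
    """Write per-run results as CSV under outputs/<run_name>/."""
    out_dir = Path(root) / run_name
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "results.csv"
    pd.DataFrame(rows).to_csv(path, index=False)
    return path
```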