
AI-SDC/SACRO-ML


SACRO-ML


An increasing body of work has shown that machine learning (ML) models may expose confidential properties of the data on which they are trained. This has resulted in a wide range of proposed attack methods with varying assumptions that exploit the model structure and/or behaviour to infer sensitive information.

The sacroml package is a collection of tools and resources for managing the statistical disclosure control (SDC) of trained ML models. In particular, it provides:

  • A safemodel package that extends commonly used ML models to provide ante-hoc SDC by assessing the theoretical risk posed by the training regime (such as the combination of hyperparameters, dataset, and architecture) before (potentially) costly model fitting is performed. It also ensures that best practice is followed with respect to privacy, e.g., using differentially private optimisers where available. For large models and datasets, ante-hoc analysis can yield significant time and cost savings by avoiding the training of models that intensive post-hoc analysis would likely find disclosive.
  • An attacks package that provides post-hoc SDC by assessing the empirical disclosure risk of a classification model through a variety of simulated attacks after training. It offers an integrated suite of attacks with a common application programming interface (API) and is designed to accommodate additional state-of-the-art attacks as they become available. In addition to membership inference attacks (MIA), such as the likelihood ratio attack (LiRA), and attribute inference attacks, the package provides novel structural attacks that report cheap-to-compute metrics, which can indicate model disclosiveness after fitting but before running more computationally expensive MIAs.
  • Summaries of the results are written in a simple human-readable report.
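The ante-hoc check described above amounts to comparing a proposed training configuration against recommended safe ranges before any fitting occurs. The sketch below illustrates the idea only: the parameter names and thresholds here are purely hypothetical, and the actual safemodel classes, rules, and API in sacroml differ per estimator.

```python
# Hypothetical safe ranges for illustration; real safemodel rules
# are estimator-specific and maintained inside the sacroml package.
SAFE_PARAMS = {
    "min_samples_leaf": ("min", 5),  # leaves smaller than this risk memorisation
    "max_depth": ("max", 8),         # very deep trees can encode individual records
}

def preliminary_check(params):
    """Compare proposed hyperparameters to safe ranges before training.

    Returns (ok, messages): ok is True when no rule is violated, and
    messages lists each violated rule in human-readable form.
    """
    msgs = []
    for name, (kind, bound) in SAFE_PARAMS.items():
        value = params.get(name)
        if value is None:
            continue
        if kind == "min" and value < bound:
            msgs.append(f"{name}={value} is below the safe minimum {bound}")
        if kind == "max" and value > bound:
            msgs.append(f"{name}={value} exceeds the safe maximum {bound}")
    return (not msgs, msgs)
```

Because such a check inspects only the configuration, it costs essentially nothing compared with fitting the model and then running simulated attacks.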

Installation

Python Package Index

$ pip install sacroml

Note: macOS users may need to install libomp due to a dependency on XGBoost:

$ brew install libomp

Conda

$ conda install sacroml

Usage

Quick-start example:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sacroml.attacks.likelihood_attack import LIRAAttack
from sacroml.attacks.target import Target

# Load dataset
X, y = load_breast_cancer(return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit model
model = RandomForestClassifier(min_samples_split=2, min_samples_leaf=1)
model.fit(X_train, y_train)

# Wrap model and data
target = Target(
    model=model,
    dataset_name="breast cancer",
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
)

# Create an attack object and run the attack
attack = LIRAAttack(n_shadow_models=100, output_dir="output_example")
attack.attack(target)
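The LIRAAttack above trains shadow models and scores each record with a likelihood ratio. As a rough illustration of the underlying statistic only (not sacroml's implementation), LiRA compares the target model's logit-scaled confidence on a record against Gaussian fits to confidences from shadow models trained with and without that record:

```python
import math
import statistics

def logit(p, eps=1e-6):
    """Map a confidence in (0, 1) onto the log-odds scale used by LiRA."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def gaussian_logpdf(x, mu, sigma):
    """Log-density of x under a Gaussian with mean mu and std sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)

def lira_score(target_conf, in_confs, out_confs):
    """Log-likelihood ratio for one record.

    in_confs / out_confs are confidences from shadow models trained
    with / without the record. Positive scores suggest the record was
    a training member; negative scores suggest it was not.
    """
    x = logit(target_conf)
    ins = [logit(c) for c in in_confs]
    outs = [logit(c) for c in out_confs]
    lp_in = gaussian_logpdf(x, statistics.mean(ins), statistics.stdev(ins))
    lp_out = gaussian_logpdf(x, statistics.mean(outs), statistics.stdev(outs))
    return lp_in - lp_out
```

For example, a record on which the target model is about as confident as shadow models that saw it during training receives a positive score, while a record matching the "out" distribution receives a negative one. Increasing n_shadow_models tightens the two Gaussian fits at the cost of more training runs.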

For more information, see the examples.

Documentation

See API documentation.

Contributing

See our contributing guide.

Acknowledgement

This work was supported by UK Research and Innovation as part of the Data and Analytics Research Environments UK (DARE UK) programme, delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK). The specific projects were Semi-Automated Checking of Research Outputs (SACRO; MC_PC_23006), Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMATTER; MC_PC_21033), and TREvolution (MC_PC_24038). This project has also been supported by MRC and EPSRC (PICTURES; MR/S010351/1).
