Breast Cancer Predictor

Authors: Tiffany Timbers, Melissa Lee & Joel Ostblom

Demo of a data analysis project for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.

About

Here we attempt to build a classification model using the k-nearest neighbours algorithm that uses breast cancer tumour image measurements to predict whether a newly discovered breast cancer tumour is benign (i.e., is not harmful and does not require treatment) or malignant (i.e., is harmful and requires treatment intervention). Our final classifier performed fairly well on an unseen test data set, with an F2 score (F-beta with beta = 2) of 0.96 and an overall accuracy of 0.96. Of the 171 test cases, it correctly predicted 168. However, it incorrectly predicted 3 cases, and importantly these were false negatives: predicting that a tumour is benign when in fact it is malignant. These kinds of incorrect predictions could have a severely negative impact on a patient's health outcome, so we recommend continued study to improve this prediction model before it is put into production in the clinic.
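For readers unfamiliar with the F2 score, the sketch below shows how an F-beta score and accuracy can be computed from confusion-matrix counts, treating malignant as the positive class. The function names and the counts are hypothetical and chosen only for illustration; they are not the project's code or results.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts (malignant = positive class)."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases that were predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts for illustration only (NOT the project's results).
tp, tn, fp, fn = 8, 9, 1, 2
f2 = fbeta_score(tp, fp, fn, beta=2)
f1 = fbeta_score(tp, fp, fn, beta=1)
acc = accuracy(tp, tn, fp, fn)
print(f"F2={f2:.3f} F1={f1:.3f} accuracy={acc:.3f}")
```

With beta = 2, false negatives are weighted four times as heavily as false positives in the denominator, which is why F2 is a natural choice when missing a malignant tumour is the costlier error.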

The data set that was used in this project is of digitized breast cancer image features created by Dr. William H. Wolberg, W. Nick Street, and Olvi L. Mangasarian at the University of Wisconsin, Madison (Street, Wolberg, and Mangasarian 1993). It was sourced from the UCI Machine Learning Repository (Dua and Graff 2017) and can be found here, specifically this file. Each row in the data set represents summary statistics from measurements of an image of a tumour sample, including the diagnosis (benign or malignant) and several other measurements (e.g., nucleus texture, perimeter, area, etc.). Diagnosis for each image was conducted by physicians.
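To make the row structure and the classifier concrete, here is a minimal pure-Python sketch of k-nearest neighbours on rows that pair a few measurements with a diagnosis. The function `knn_predict` and the feature values are made up for illustration; the project's own pipeline may differ (for example, it may standardize features before computing distances).

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k closest training rows.

    `train` is a list of (features, label) pairs; `query` is a feature list.
    """
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up (texture, perimeter) rows with a diagnosis label; NOT real data.
train = [
    ([10.0, 60.0], "benign"),
    ([11.0, 65.0], "benign"),
    ([12.0, 70.0], "benign"),
    ([20.0, 120.0], "malignant"),
    ([22.0, 130.0], "malignant"),
    ([21.0, 125.0], "malignant"),
]

prediction = knn_predict(train, [19.0, 118.0], k=3)
print(prediction)
```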

Report

The final report can be found here.

Dependencies

Docker is a container solution used to manage the software dependencies for this project. The Docker image used for this project is based on the quay.io/jupyter/minimal-notebook:notebook-7.0.6 image. Additional dependencies are specified in the Dockerfile.

Usage

Follow the instructions below to reproduce the analysis.

Setup

Install and launch Docker on your computer.
Clone this GitHub repository.

Running the analysis

Navigate to the root of this project on your computer using the command line and enter the following command to reset the project to a clean state (i.e., remove all files generated by previous runs of the analysis):

docker-compose run --rm analysis-env make clean

To run the analysis in its entirety, enter the following command in the terminal in the project root:

docker-compose run --rm analysis-env make all

Developer notes

Working with the project in the container using JupyterLab

Navigate to the root of this project on your computer using the command line and enter the following command:

docker compose up analysis-env

In the terminal, look for a URL that starts with http://127.0.0.1:8888/lab?token= (for an example, see the highlighted text in the terminal below). Copy and paste that URL into your browser.

You should now see the JupyterLab IDE in your browser, with all the project files visible in the file browser pane on the left side of the screen.
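If the terminal output is long, the URL can also be picked out programmatically. The following is a small sketch that searches captured log text for the URL prefix mentioned above. The helper name `find_lab_url`, the sample log lines, and the token are hypothetical placeholders, and the real log format may differ.

```python
import re

# Matches a URL with the prefix described above, followed by a token.
URL_PATTERN = re.compile(r"http://127\.0\.0\.1:8888/lab\?token=\w+")


def find_lab_url(log_text: str):
    """Return the first matching URL found in the log text, or None."""
    match = URL_PATTERN.search(log_text)
    return match.group(0) if match else None


# Hypothetical log excerpt; the token is a placeholder, not a real one.
sample_log = """
[I ServerApp] Jupyter Server is running at:
[I ServerApp]     http://127.0.0.1:8888/lab?token=abc123placeholder
"""
url = find_lab_url(sample_log)
print(url)
```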

Clean up

To shut down the container and clean up the resources, type Ctrl + C in the terminal where you launched the container, and then type docker compose rm.

Working with the project in the container using VS Code

If you prefer to work in VS Code, you can run the following command from the project root in a VS Code terminal to launch the container there:

docker compose run --rm terminal bash

To exit the container type exit in the terminal.

Adding a new dependency

Add the dependency to the Dockerfile on a new branch.
Re-build the Docker image locally to ensure it builds and runs properly.
Push the changes to GitHub. A new Docker image will be built and pushed to Docker Hub automatically. It will be tagged with the SHA for the commit that changed the file.
Update the docker-compose.yml file on your branch to use the new container image (make sure to update the tag specifically).
Send a pull request to merge the changes into the main branch.
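The tag update in the docker-compose.yml step can be done by hand, but as an illustration, here is a sketch of a helper that rewrites the tag on an image line. The function `update_image_tag`, the image name, and both tags are hypothetical placeholders; only the analysis-env service name comes from the commands above.

```python
import re


def update_image_tag(compose_text: str, image: str, new_tag: str) -> str:
    """Replace the tag on every `image: <image>:<tag>` line in compose_text."""
    pattern = re.compile(rf"(image:\s*{re.escape(image)}):\S+")
    # A function replacement avoids any backreference parsing of new_tag.
    return pattern.sub(lambda m: f"{m.group(1)}:{new_tag}", compose_text)


# Hypothetical compose fragment; image name and tags are placeholders.
compose_text = """services:
  analysis-env:
    image: example-user/example-image:oldsha1
"""
updated = update_image_tag(compose_text, "example-user/example-image", "newsha2")
print(updated)
```

Writing the result back to docker-compose.yml and reviewing the diff before committing keeps the change easy to verify in the pull request.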

Running the tests

Tests are run using the pytest command in the root of the project. More details about the test suite can be found in the tests directory.

License

The Breast Cancer Predictor report contained herein is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information. If re-using or re-mixing, please provide attribution and link to this webpage. The software code contained within this repository is licensed under the MIT license. See the license file for more information.

References

Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml.

Street, W. Nick, W. H. Wolberg, and O. L. Mangasarian. 1993. “Nuclear feature extraction for breast tumor diagnosis.” In Biomedical Image Processing and Biomedical Visualization, edited by Raj S. Acharya and Dmitry B. Goldgof, 1905:861–70. International Society for Optics and Photonics; SPIE. https://doi.org/10.1117/12.148698.
