Commit 6cfe2dc: Add preprocess MLCube

1 parent: 02fe758

File tree: 12 files changed, +319 -2 lines

brats/metrics/mlcube/mlcube.yaml

5 additions, 1 deletion.

```diff
@@ -18,5 +18,9 @@ tasks:
   evaluate:
     # Executes a number of metrics specified by the params file
     parameters:
-      inputs: {predictions: data/predictions/, ground_truth: data/ground_truth/, parameters_file: parameters.yaml}
+      inputs: {
+        predictions: data/predictions/,
+        ground_truth: data/ground_truth/,
+        parameters_file: parameters.yaml
+      }
       outputs: {output_path: {type: "file", default: "results.yaml"}}
```

brats/metrics/project/metrics.py

1 deletion.

```diff
@@ -2,7 +2,6 @@
 import argparse
 import glob
 import yaml
-from pkgutil import get_data
 import nibabel as nib
 import numpy as np
 
```

brats/preprocessing/.gitignore

New file: 2 additions.

```
__pycache__/
mlcube/workspace/results
```

brats/preprocessing/README.md

New file: 101 additions.

# BraTS Challenge - MLCube integration - preprocess

Original implementation: ["BraTS Instructions Repo"](https://github.com/BraTS/Instructions)

## Dataset

Please refer to the [BraTS challenge page](http://braintumorsegmentation.org/) and follow the instructions in the data section.

## Project setup

```bash
# Create a Python environment and install the MLCube Docker runner
virtualenv -p python3 ./env && source ./env/bin/activate && pip install mlcube-docker

# Fetch the BraTS example from GitHub
git clone https://github.com/mlcommons/mlcube_examples && cd ./mlcube_examples
git fetch origin pull/39/head:feature/brats && git checkout feature/brats
cd ./brats/preprocessing/mlcube
```
## Important files

These are the most important files in this project:

```bash
├── mlcube
│   ├── mlcube.yaml                       # MLCube configuration file; defines the project, author, platform, docker image and tasks
│   └── workspace
│       ├── data
│       │   └── BraTS_example_seg.nii.gz  # Input data
│       ├── results
│       │   └── output.npy                # Output processed data
│       └── parameters.yaml               # Extra parameters for the preprocess task
└── project
    ├── Dockerfile                        # Docker file with instructions to create the image for the project
    ├── preprocess.py                     # Python file that contains the main logic of the project
    ├── mlcube.py                         # Python entrypoint used by MLCube; contains the logic for MLCube tasks
    ├── requirements.txt                  # Python requirements needed to run the project inside Docker
    └── run.sh                            # Bash script that calls the preprocess.py script
```
## How to modify this project

You can change each file described above in order to add your own implementation.

### Requirements file

In this file (`requirements.txt`) you can add all the Python dependencies needed to run your implementation. These dependencies are installed while the Docker image is being built, which happens when you run the `mlcube run ...` command.
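Judging from the imports in `mlcube.py` and `preprocess.py` shown below, this file presumably lists at least `typer`, `numpy`, `nibabel`, `pyyaml` and `tqdm`.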
### Dockerfile

You can use either the CPU or the GPU version of the Dockerfile (`Dockerfile_CPU`, `Dockerfile_GPU`). You can also add or modify any steps inside the file; this comes in handy when you need to install OS dependencies or change the base Docker image. Inside the file you can find comments about the existing steps.
### Parameters file

This YAML file (`parameters.yaml`) contains all the extra parameters that aren't files or directories; for example, you can place here the hyperparameters that you would use for training a model. The file is passed as an **input parameter** to the MLCube tasks and is read inside the MLCube container.
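For reference, this is how the parameters file is consumed inside the container; the snippet mirrors the loading code in `preprocess.py` below, and the only parameter currently defined is `output_filename`:

```python
import yaml

# The task receives --parameters_file; preprocess.py loads it with
# yaml.full_load and looks up the output file name.
with open("parameters.yaml", "r") as f:
    params = yaml.full_load(f)

print(params["output_filename"])  # -> "output.npy"
```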
### MLCube yaml file

In this file (`mlcube.yaml`) you can find the instructions about the docker image and platform that will be used, information about the project (name, description, authors), and also the tasks defined for the project.

The existing implementation defines one task (see the `mlcube.yaml` contents further below):

* preprocess:

  This task takes the following parameters:

  * Input parameters:
    * data_path: Folder path containing the input data
    * parameters_file: File containing extra parameters
  * Output parameters:
    * output_path: Folder path where the preprocessed output will be stored

  This task takes the input data, performs the preprocessing steps, and then saves the result in the output_path.
### MLCube python file

The `mlcube.py` file is the handler file and entrypoint described in the Dockerfile. Here you can find all the logic related to how each MLCube task is processed. If you want to add a new task, first define it inside the `mlcube.yaml` file with its input and output parameters, and then add the logic to handle it inside the `mlcube.py` file, as sketched below.
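A minimal sketch of adding such a task (the task name `my_task` and the `MyTask` handler are hypothetical; the pattern mirrors the existing `preprocess` command shown further below):

```python
"""Hypothetical sketch of a new task handler for mlcube.py."""
import typer

app = typer.Typer()  # in mlcube.py this object already exists


class MyTask:
    """Stand-in handler; put the real task logic here."""

    @staticmethod
    def run(data_path: str, output_path: str) -> None:
        # e.g. launch a script the way PreprocessTask.run does
        print(f"processing {data_path} -> {output_path}")


# The command name must match a task declared in mlcube.yaml.
@app.command("my_task")
def my_task(
    data_path: str = typer.Option(..., "--data_path"),
    output_path: str = typer.Option(..., "--output_path"),
):
    MyTask.run(data_path, output_path)


if __name__ == "__main__":
    app()
```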
### Preprocess file

The `preprocess.py` file contains the main logic of the project. You can modify this file and write your implementation here to perform the different preprocessing steps. This preprocess file is called from the `run.sh` file; other ways to link your implementation are shown in the [MLCube examples repo](https://github.com/mlcommons/mlcube_examples).
### Run bash file

The `run.sh` file is called from `mlcube.py` and receives the task arguments. Judging from `mlcube.py` below, the arguments arrive as environment variables (`data_path`, `parameters_file`, `output_path`); this script is the place to perform any extra steps before calling the `preprocess.py` script.
## Tasks execution

```bash
# Run the preprocess task
mlcube run --mlcube=mlcube.yaml --task=preprocess
```

We are targeting pull-type installation, so MLCube images should be available on Docker Hub. If not, try this:

```bash
mlcube run ... -Pdocker.build_strategy=always
```
brats/preprocessing/mlcube/mlcube.yaml

New file: 22 additions.

```yaml
name: MLCommons Brats preprocessing
description: MLCommons Brats integration for preprocessing
authors:
  - {name: "MLCommons Best Practices Working Group"}

platform:
  accelerator_count: 0

docker:
  # Image name.
  image: mlcommons/brats_preprocessing:0.0.1
  # Docker build context relative to $MLCUBE_ROOT. Default is `build`.
  build_context: "../project"
  # Docker file name within docker build context, default is `Dockerfile`.
  build_file: "Dockerfile"

tasks:
  preprocess:
    # Run preprocessing
    parameters:
      inputs: {data_path: data/, parameters_file: parameters.yaml}
      outputs: {output_path: results/}
```
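Note: relative paths in the task parameters (such as `data/`, `parameters.yaml` and `results/`) are resolved by the MLCube runner against the `workspace` directory; that is presumably why the example input and parameters file live under `mlcube/workspace` in the tree above.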
brats/preprocessing/mlcube/workspace/data/BraTS_example_seg.nii.gz

Binary file not shown.
brats/preprocessing/mlcube/workspace/parameters.yaml

New file: 1 addition.

```yaml
output_filename: "output.npy"
```
brats/preprocessing/project/Dockerfile

New file: 24 additions.

```dockerfile
# For a CPU app use this Dockerfile.
FROM python:3.8-buster

# Fill in your info here.
LABEL author="chuck@norris.org"
LABEL application="your application name"
LABEL maintainer="chuck@norris.org"
LABEL version="0.0.1"
LABEL status="beta"

# Basic OS dependencies.
RUN apt-get -y update && apt -y full-upgrade && apt-get -y install apt-utils wget git tar build-essential curl nano

# Install all Python requirements.
WORKDIR /app
COPY ./requirements.txt ./requirements.txt
RUN pip3 install -r requirements.txt

# Copy all files.
COPY ./ ./

RUN chmod +x ./run.sh

ENTRYPOINT ["python3", "mlcube.py"]
```

brats/preprocessing/project/mlcube.py

New file: 44 additions.

```python
"""MLCube handler file"""
import os
import subprocess

import typer


app = typer.Typer()


class PreprocessTask:
    """Runs preprocessing given the input data path"""

    @staticmethod
    def run(data_path: str, parameters_file: str, output_path: str) -> None:
        # Forward the task parameters to run.sh as environment variables.
        env = os.environ.copy()
        env.update({
            'data_path': data_path,
            'parameters_file': parameters_file,
            'output_path': output_path
        })

        process = subprocess.Popen("./run.sh", cwd=".", env=env)
        process.wait()


@app.command("preprocess")
def preprocess(
    data_path: str = typer.Option(..., "--data_path"),
    parameters_file: str = typer.Option(..., "--parameters_file"),
    output_path: str = typer.Option(..., "--output_path"),
):
    PreprocessTask.run(data_path, parameters_file, output_path)


@app.command("test")
def test():
    pass


if __name__ == "__main__":
    app()
```
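When the container starts, MLCube calls this entrypoint (set in the Dockerfile) with the task name and its parameters, for example `python3 mlcube.py preprocess --data_path=data/ --parameters_file=parameters.yaml --output_path=results/` (values illustrative; the real ones come from the `preprocess` task definition in `mlcube.yaml`). Typer maps the `--data_path`-style options onto the function arguments, which are then forwarded to `run.sh` through the environment.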
brats/preprocessing/project/preprocess.py

New file: 100 additions.

```python
"""Preprocess file"""
import os
import argparse
import glob
import yaml
import numpy as np
import nibabel as nib
from tqdm import tqdm


def preprocess(image: np.ndarray):
    """Preprocess the image labels from a numpy array"""

    # Whole tumor (WT): labels 1, 2 and 4
    image_WT = image.copy()
    image_WT[image_WT == 1] = 1
    image_WT[image_WT == 2] = 1
    image_WT[image_WT == 4] = 1

    # Tumor core (TC): labels 1 and 4
    image_TC = image.copy()
    image_TC[image_TC == 1] = 1
    image_TC[image_TC == 2] = 0
    image_TC[image_TC == 4] = 1

    # Enhancing tumor (ET): label 4 only
    image_ET = image.copy()
    image_ET[image_ET == 1] = 0
    image_ET[image_ET == 2] = 0
    image_ET[image_ET == 4] = 1

    # Stack the three binary masks into channels and reorder the spatial axes.
    image = np.stack([image_WT, image_TC, image_ET])
    image = np.moveaxis(image, (0, 1, 2, 3), (0, 3, 2, 1))

    return image


def load_img(file_path):
    """Reads a segmentation image as a numpy array"""

    data = nib.load(file_path)
    data = np.asarray(data.dataobj)
    return data


def get_data_arr(data_path):
    """Reads the content of the data path folder
    and then returns the data in numpy array format"""

    image_path_list = glob.glob(data_path + "/*")
    images_arr = []
    for image_path in image_path_list:
        image = load_img(image_path)
        image = preprocess(image)
        images_arr.append(image)
    images_arr = np.concatenate(images_arr)
    return images_arr


def save_processed_data(output_path, output_filename, images_arr):
    """Writes processed images to the target output path"""
    output_file_path = os.path.join(output_path, output_filename)
    with open(output_file_path, 'wb') as f:
        np.save(f, images_arr)
    print("File correctly saved!")


def main():
    """Main function that receives input data and preprocesses it"""

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data_path",
        "--data-path",
        type=str,
        required=True,
        help="Directory containing input data",
    )
    parser.add_argument(
        "--output_path",
        "--output-path",
        type=str,
        required=True,
        help="Path where output data will be stored",
    )
    parser.add_argument(
        "--parameters_file",
        "--parameters-file",
        type=str,
        required=True,
        help="File containing parameters for processing",
    )
    args = parser.parse_args()

    with open(args.parameters_file, "r") as f:
        params = yaml.full_load(f)

    images_arr = get_data_arr(args.data_path)
    save_processed_data(args.output_path, params["output_filename"], images_arr)


if __name__ == "__main__":
    main()
```
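A quick sanity check of the label mapping (assuming the script above is importable as `preprocess`; in the BraTS convention label 1 is the necrotic tumor core, 2 is peritumoral edema and 4 is enhancing tumor):

```python
import numpy as np

from preprocess import preprocess  # assumes this runs next to preprocess.py

# Toy 4x4x4 label volume with one voxel of each BraTS label.
labels = np.zeros((4, 4, 4), dtype=np.uint8)
labels[0, 0, 0] = 1  # necrotic tumor core
labels[1, 1, 1] = 2  # peritumoral edema
labels[2, 2, 2] = 4  # enhancing tumor

out = preprocess(labels)   # shape (3, 4, 4, 4): WT, TC and ET channels

assert out[0].sum() == 3   # whole tumor: labels 1, 2 and 4
assert out[1].sum() == 2   # tumor core: labels 1 and 4
assert out[2].sum() == 1   # enhancing tumor: label 4 only
```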
