Commit 6cfe2dc: Add preprocess MLCube

1 parent: 02fe758

File tree: 12 files changed, +319 -2 lines

brats/metrics/mlcube/mlcube.yaml

5 additions, 1 deletion.

```diff
@@ -18,5 +18,9 @@ tasks:
   evaluate:
     # Executes a number of metrics specified by the params file
     parameters:
-      inputs: {predictions: data/predictions/, ground_truth: data/ground_truth/, parameters_file: parameters.yaml}
+      inputs: {
+        predictions: data/predictions/,
+        ground_truth: data/ground_truth/,
+        parameters_file: parameters.yaml
+      }
       outputs: {output_path: {type: "file", default: "results.yaml"}}
```

brats/metrics/project/metrics.py

1 deletion.

```diff
@@ -2,7 +2,6 @@
 import argparse
 import glob
 import yaml
-from pkgutil import get_data
 import nibabel as nib
 import numpy as np
 
```

brats/preprocessing/.gitignore

New file: 2 additions.

```
__pycache__/
mlcube/workspace/results
```

brats/preprocessing/README.md

New file: 101 additions.

# BraTS Challenge - MLCube integration - preprocess

Original implementation: ["BraTS Instructions Repo"](https://github.com/BraTS/Instructions)

## Dataset

Please refer to the [BraTS challenge page](http://braintumorsegmentation.org/) and follow the instructions in the data section.

## Project setup

```bash
# Create a Python environment and install the MLCube Docker runner
virtualenv -p python3 ./env && source ./env/bin/activate && pip install mlcube-docker

# Fetch the BraTS example from GitHub
git clone https://github.com/mlcommons/mlcube_examples && cd ./mlcube_examples
git fetch origin pull/39/head:feature/brats && git checkout feature/brats
cd ./brats/preprocessing/mlcube
```
## Important files

These are the most important files in this project:

```bash
├── mlcube
│   ├── mlcube.yaml                       # MLCube configuration file; defines the project, author, platform, docker image and tasks
│   └── workspace
│       ├── data
│       │   └── BraTS_example_seg.nii.gz  # Input data
│       ├── results
│       │   └── output.npy                # Output processed data
│       └── parameters.yaml               # Extra parameters for the preprocess task
└── project
    ├── Dockerfile                        # Docker file with instructions to create the image for the project
    ├── preprocess.py                     # Python file that contains the main logic of the project
    ├── mlcube.py                         # Python entrypoint used by MLCube; contains the logic for MLCube tasks
    ├── requirements.txt                  # Python requirements needed to run the project inside Docker
    └── run.sh                            # Bash script that calls the preprocess.py script
```
## How to modify this project

You can change each file described above in order to add your own implementation.

### Requirements file

In this file (`requirements.txt`) you can add all the Python dependencies needed to run your implementation. These dependencies are installed while the Docker image is being built, which happens when you run the `mlcube run ...` command.
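Judging from the imports in `mlcube.py` and `preprocess.py` shown below, this file presumably lists at least `typer`, `numpy`, `nibabel`, `pyyaml` and `tqdm`.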
### Dockerfile

You can use either the CPU or the GPU version of the Dockerfile (`Dockerfile_CPU`, `Dockerfile_GPU`). You can also add or modify any steps inside the file; this comes in handy when you need to install OS dependencies or change the base Docker image. Inside the file you can find comments about the existing steps.
### Parameters file

This YAML file (`parameters.yaml`) contains all the extra parameters that aren't files or directories; for example, you can place here the hyperparameters that you would use for training a model. The file is passed as an **input parameter** to the MLCube tasks and is read inside the MLCube container.
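For reference, this is how the parameters file is consumed inside the container; the snippet mirrors the loading code in `preprocess.py` below, and the only parameter currently defined is `output_filename`:

```python
import yaml

# The task receives --parameters_file; preprocess.py loads it with
# yaml.full_load and looks up the output file name.
with open("parameters.yaml", "r") as f:
    params = yaml.full_load(f)

print(params["output_filename"])  # -> "output.npy"
```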
### MLCube yaml file

In this file (`mlcube.yaml`) you can find the instructions about the docker image and platform that will be used, information about the project (name, description, authors), and also the tasks defined for the project.

The existing implementation defines one task (see the `mlcube.yaml` contents further below):

* preprocess:

  This task takes the following parameters:

  * Input parameters:
    * data_path: Folder path containing the input data
    * parameters_file: File containing extra parameters
  * Output parameters:
    * output_path: Folder path where the preprocessed output will be stored

  This task takes the input data, performs the preprocessing steps, and then saves the result in the output_path.
### MLCube python file

The `mlcube.py` file is the handler file and entrypoint described in the Dockerfile. Here you can find all the logic related to how each MLCube task is processed. If you want to add a new task, first define it inside the `mlcube.yaml` file with its input and output parameters, and then add the logic to handle it inside the `mlcube.py` file, as sketched below.
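A minimal sketch of adding such a task (the task name `my_task` and the `MyTask` handler are hypothetical; the pattern mirrors the existing `preprocess` command shown further below):

```python
"""Hypothetical sketch of a new task handler for mlcube.py."""
import typer

app = typer.Typer()  # in mlcube.py this object already exists


class MyTask:
    """Stand-in handler; put the real task logic here."""

    @staticmethod
    def run(data_path: str, output_path: str) -> None:
        # e.g. launch a script the way PreprocessTask.run does
        print(f"processing {data_path} -> {output_path}")


# The command name must match a task declared in mlcube.yaml.
@app.command("my_task")
def my_task(
    data_path: str = typer.Option(..., "--data_path"),
    output_path: str = typer.Option(..., "--output_path"),
):
    MyTask.run(data_path, output_path)


if __name__ == "__main__":
    app()
```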
### Preprocess file

The `preprocess.py` file contains the main logic of the project. You can modify this file and write your implementation here to perform the different preprocessing steps. This preprocess file is called from the `run.sh` file; other ways to link your implementation are shown in the [MLCube examples repo](https://github.com/mlcommons/mlcube_examples).
### Run bash file

The `run.sh` file is called from `mlcube.py` and receives the task arguments. Judging from `mlcube.py` below, the arguments arrive as environment variables (`data_path`, `parameters_file`, `output_path`); this script is the place to perform any extra steps before calling the `preprocess.py` script.
## Tasks execution

```bash
# Run the preprocess task
mlcube run --mlcube=mlcube.yaml --task=preprocess
```

We are targeting pull-type installation, so MLCube images should be available on Docker Hub. If not, try this:

```bash
mlcube run ... -Pdocker.build_strategy=always
```
brats/preprocessing/mlcube/mlcube.yaml

New file: 22 additions.

```yaml
name: MLCommons Brats preprocessing
description: MLCommons Brats integration for preprocessing
authors:
  - {name: "MLCommons Best Practices Working Group"}

platform:
  accelerator_count: 0

docker:
  # Image name.
  image: mlcommons/brats_preprocessing:0.0.1
  # Docker build context relative to $MLCUBE_ROOT. Default is `build`.
  build_context: "../project"
  # Docker file name within docker build context, default is `Dockerfile`.
  build_file: "Dockerfile"

tasks:
  preprocess:
    # Run preprocessing
    parameters:
      inputs: {data_path: data/, parameters_file: parameters.yaml}
      outputs: {output_path: results/}
```
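Note: relative paths in the task parameters (such as `data/`, `parameters.yaml` and `results/`) are resolved by the MLCube runner against the `workspace` directory; that is presumably why the example input and parameters file live under `mlcube/workspace` in the tree above.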
brats/preprocessing/mlcube/workspace/data/BraTS_example_seg.nii.gz

Binary file not shown.
brats/preprocessing/mlcube/workspace/parameters.yaml

New file: 1 addition.

```yaml
output_filename: "output.npy"
```
brats/preprocessing/project/Dockerfile

New file: 24 additions.

```dockerfile
# For a CPU app use this Dockerfile.
FROM python:3.8-buster

# Fill in your info here.
LABEL author="chuck@norris.org"
LABEL application="your application name"
LABEL maintainer="chuck@norris.org"
LABEL version="0.0.1"
LABEL status="beta"

# Basic OS dependencies.
RUN apt-get -y update && apt -y full-upgrade && apt-get -y install apt-utils wget git tar build-essential curl nano

# Install all Python requirements.
WORKDIR /app
COPY ./requirements.txt ./requirements.txt
RUN pip3 install -r requirements.txt

# Copy all files.
COPY ./ ./

RUN chmod +x ./run.sh

ENTRYPOINT ["python3", "mlcube.py"]
```

brats/preprocessing/project/mlcube.py

New file: 44 additions.

```python
"""MLCube handler file"""
import os
import subprocess

import typer


app = typer.Typer()


class PreprocessTask:
    """Runs preprocessing given the input data path"""

    @staticmethod
    def run(data_path: str, parameters_file: str, output_path: str) -> None:
        # Forward the task parameters to run.sh as environment variables.
        env = os.environ.copy()
        env.update({
            'data_path': data_path,
            'parameters_file': parameters_file,
            'output_path': output_path
        })

        process = subprocess.Popen("./run.sh", cwd=".", env=env)
        process.wait()


@app.command("preprocess")
def preprocess(
    data_path: str = typer.Option(..., "--data_path"),
    parameters_file: str = typer.Option(..., "--parameters_file"),
    output_path: str = typer.Option(..., "--output_path"),
):
    PreprocessTask.run(data_path, parameters_file, output_path)


@app.command("test")
def test():
    pass


if __name__ == "__main__":
    app()
```
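When the container starts, MLCube calls this entrypoint (set in the Dockerfile) with the task name and its parameters, for example `python3 mlcube.py preprocess --data_path=data/ --parameters_file=parameters.yaml --output_path=results/` (values illustrative; the real ones come from the `preprocess` task definition in `mlcube.yaml`). Typer maps the `--data_path`-style options onto the function arguments, which are then forwarded to `run.sh` through the environment.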
brats/preprocessing/project/preprocess.py

New file: 100 additions.

```python
"""Preprocess file"""
import os
import argparse
import glob
import yaml
import numpy as np
import nibabel as nib
from tqdm import tqdm


def preprocess(image: np.ndarray):
    """Preprocess the image labels from a numpy array"""

    # Whole tumor (WT): labels 1, 2 and 4
    image_WT = image.copy()
    image_WT[image_WT == 1] = 1
    image_WT[image_WT == 2] = 1
    image_WT[image_WT == 4] = 1

    # Tumor core (TC): labels 1 and 4
    image_TC = image.copy()
    image_TC[image_TC == 1] = 1
    image_TC[image_TC == 2] = 0
    image_TC[image_TC == 4] = 1

    # Enhancing tumor (ET): label 4 only
    image_ET = image.copy()
    image_ET[image_ET == 1] = 0
    image_ET[image_ET == 2] = 0
    image_ET[image_ET == 4] = 1

    # Stack the three binary masks into channels and reorder the spatial axes.
    image = np.stack([image_WT, image_TC, image_ET])
    image = np.moveaxis(image, (0, 1, 2, 3), (0, 3, 2, 1))

    return image


def load_img(file_path):
    """Reads a segmentation image as a numpy array"""

    data = nib.load(file_path)
    data = np.asarray(data.dataobj)
    return data


def get_data_arr(data_path):
    """Reads the content of the data path folder
    and then returns the data in numpy array format"""

    image_path_list = glob.glob(data_path + "/*")
    images_arr = []
    for image_path in image_path_list:
        image = load_img(image_path)
        image = preprocess(image)
        images_arr.append(image)
    images_arr = np.concatenate(images_arr)
    return images_arr


def save_processed_data(output_path, output_filename, images_arr):
    """Writes processed images to the target output path"""
    output_file_path = os.path.join(output_path, output_filename)
    with open(output_file_path, 'wb') as f:
        np.save(f, images_arr)
    print("File correctly saved!")


def main():
    """Main function that receives input data and preprocesses it"""

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data_path",
        "--data-path",
        type=str,
        required=True,
        help="Directory containing input data",
    )
    parser.add_argument(
        "--output_path",
        "--output-path",
        type=str,
        required=True,
        help="Path where output data will be stored",
    )
    parser.add_argument(
        "--parameters_file",
        "--parameters-file",
        type=str,
        required=True,
        help="File containing parameters for processing",
    )
    args = parser.parse_args()

    with open(args.parameters_file, "r") as f:
        params = yaml.full_load(f)

    images_arr = get_data_arr(args.data_path)
    save_processed_data(args.output_path, params["output_filename"], images_arr)


if __name__ == "__main__":
    main()
```
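A quick sanity check of the label mapping (assuming the script above is importable as `preprocess`; in the BraTS convention label 1 is the necrotic tumor core, 2 is peritumoral edema and 4 is enhancing tumor):

```python
import numpy as np

from preprocess import preprocess  # assumes this runs next to preprocess.py

# Toy 4x4x4 label volume with one voxel of each BraTS label.
labels = np.zeros((4, 4, 4), dtype=np.uint8)
labels[0, 0, 0] = 1  # necrotic tumor core
labels[1, 1, 1] = 2  # peritumoral edema
labels[2, 2, 2] = 4  # enhancing tumor

out = preprocess(labels)   # shape (3, 4, 4, 4): WT, TC and ET channels

assert out[0].sum() == 3   # whole tumor: labels 1, 2 and 4
assert out[1].sum() == 2   # tumor core: labels 1 and 4
assert out[2].sum() == 1   # enhancing tumor: label 4 only
```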
