From d09e5c12b2651a9acbdf3951d4eb360566512d3f Mon Sep 17 00:00:00 2001
From: Kaushik Ghose <6299530+kaushik-work@users.noreply.github.com>
Date: Wed, 17 Oct 2018 10:56:32 -0400
Subject: [PATCH 1/3] Reproducible and portable workflows!

Create hierarchy of recommendations for reproducible and portable workflows
---
 _extras/recommended-practices.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/_extras/recommended-practices.md b/_extras/recommended-practices.md
index 6d6dd3c4..6db9e6a2 100644
--- a/_extras/recommended-practices.md
+++ b/_extras/recommended-practices.md
@@ -6,6 +6,12 @@ permalink: /rec-practices/
 
 Below are a set of recommended good practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. These guidelines are presented for consideration on a scale of usefulness: more is better, not all are required.
 
+☐ Reproducibility and Portability are essential goals of scientific workflow developers.
+
+- The best way to ensure portability and reproducibility is to rigidly specify the exact environment a tool should run in. Currently a linux image (commonly called a `Docker image`), packaging the exact environment intended by the developer, is the best way to distribute a tool executable. Use `DockerPull` to specify the image. Use an image identifier that is resilient to updates to the container.
+- If this is not possible, carefully specifying software tools and dependencies using `SoftwareRequirement` is the next best resort. Be aware that changes in the tool repositories the tools are being pulled from may silently change the behavior of the tool at each run.
+- Not specifying a docker image or software requirements will result in a non-reproducible, non-portable workflow!
+
 ☐ No `type: string` parameters for names of input or reference files/directories; use `type: File` or `type: Directory` as appropriate.
 
 ☐ Include a license that allows for re-use by anyone, e.g. [Apache 2.0][apache-license].
If possible, the license should be specified with its corresponding [SPDX identifier][spdx]. Construct the metadata field for the licence by providing a URL of the form `https://spdx.org/licenses/[SPDX-ID]` where `SPDX-ID` is the taken from the list of identifiers linked above. See the example snippet below for guidance. For non-standard licenses without an SPDX identifier, provide a URL to the license.

From 3600f2d88f91a0148b005d65f264eb2dcc4ac2e9 Mon Sep 17 00:00:00 2001
From: Kaushik Ghose <6299530+kaushik-work@users.noreply.github.com>
Date: Thu, 18 Oct 2018 16:06:53 -0400
Subject: [PATCH 2/3] Add hard line breaks

---
 _extras/recommended-practices.md | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/_extras/recommended-practices.md b/_extras/recommended-practices.md
index 6db9e6a2..f098cf29 100644
--- a/_extras/recommended-practices.md
+++ b/_extras/recommended-practices.md
@@ -4,12 +4,19 @@ title: "Recommended Practices"
 permalink: /rec-practices/
 ---
 
-Below are a set of recommended good practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. These guidelines are presented for consideration on a scale of usefulness: more is better, not all are required.
+Below are a set of recommended good practices to keep in mind when writing a Common Workflow Language
+description for a tool or workflow. These guidelines are presented for consideration on a scale of
+usefulness: more is better, not all are required.
 
 ☐ Reproducibility and Portability are essential goals of scientific workflow developers.
 
-- The best way to ensure portability and reproducibility is to rigidly specify the exact environment a tool should run in. Currently a linux image (commonly called a `Docker image`), packaging the exact environment intended by the developer, is the best way to distribute a tool executable. Use `DockerPull` to specify the image. Use an image identifier that is resilient to updates to the container.
-- If this is not possible, carefully specifying software tools and dependencies using `SoftwareRequirement` is the next best resort. Be aware that changes in the tool repositories the tools are being pulled from may silently change the behavior of the tool at each run.
+- Currently (2018) the best way to ensure portability and reproducibility is to rigidly specify the exact environment
+a tool should run in. Currently a linux image (commonly called a `Docker image`), packaging the exact
+environment intended by the developer, is the best way to distribute a tool executable. Use `DockerPull`
+to specify the image. Use an image identifier that is resilient to updates to the container.
+- If this is not possible, carefully specifying software tools and dependencies using `SoftwareRequirement`
+is the next best resort. Be aware that changes in the tool repositories the tools are being pulled from
+may silently change the behavior of the tool at each run.
 - Not specifying a docker image or software requirements will result in a non-reproducible, non-portable workflow!
 
 ☐ No `type: string` parameters for names of input or reference files/directories; use `type: File` or `type: Directory` as appropriate.

From 57e9b35b9cc8e657397eb6c7567ab33d1d0b96f2 Mon Sep 17 00:00:00 2001
From: Kaushik Ghose <6299530+kaushik-work@users.noreply.github.com>
Date: Thu, 18 Oct 2018 16:24:04 -0400
Subject: [PATCH 3/3] Clarify language around images

Note that docker images are the current best solution for the software environment reproducibility issue
---
 _extras/recommended-practices.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/_extras/recommended-practices.md b/_extras/recommended-practices.md
index f098cf29..954ae4fa 100644
--- a/_extras/recommended-practices.md
+++ b/_extras/recommended-practices.md
@@ -10,14 +10,17 @@ usefulness: more is better, not all are required.
 ☐ Reproducibility and Portability are essential goals of scientific workflow developers.
 
-- Currently (2018) the best way to ensure portability and reproducibility is to rigidly specify the exact environment
-a tool should run in. Currently a linux image (commonly called a `Docker image`), packaging the exact
-environment intended by the developer, is the best way to distribute a tool executable. Use `DockerPull`
-to specify the image. Use an image identifier that is resilient to updates to the container.
+Ideally a workflow developer would be able to rigidly specify the software and hardware environment a tool should run in
+to ensure portability and reproducibility.
+
+- Currently (2018) the best way to approach this ideal is to package the exact software environment in an image
+(such as a `Docker Image`) and specify the image via the `dockerPull` field. Use an image identifier that is
+resilient to updates to the container.
 - If this is not possible, carefully specifying software tools and dependencies using `SoftwareRequirement`
 is the next best resort. Be aware that changes in the tool repositories the tools are being pulled from
 may silently change the behavior of the tool at each run.
 - Not specifying a docker image or software requirements will result in a non-reproducible, non-portable workflow!
+- Do specify CPU and memory requirements where required.
 
 ☐ No `type: string` parameters for names of input or reference files/directories; use `type: File` or `type: Directory` as appropriate.
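
For illustration only, and not part of the patch series: the recommendations added above might look roughly like the following in a CWL v1.0 tool description. This is a minimal sketch under stated assumptions. The tool name `mytool`, its `--summarize` flag, the registry path `registry.example.org/tools/mytool`, the version `1.2.3`, and the resource figures are all hypothetical placeholders; only the CWL field names (`DockerRequirement`/`dockerPull`, `SoftwareRequirement`, `ResourceRequirement`) come from the CWL specification.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [mytool, --summarize]   # hypothetical command and flag

requirements:
  DockerRequirement:
    # Hypothetical image. A fixed version tag (or an immutable
    # image@sha256:... digest) avoids floating tags such as "latest",
    # which can change underneath the workflow.
    dockerPull: registry.example.org/tools/mytool:1.2.3
  ResourceRequirement:
    coresMin: 2     # minimum CPU cores (illustrative value)
    ramMin: 4096    # minimum RAM in mebibytes (illustrative value)

hints:
  SoftwareRequirement:
    # Fallback description of the software for runners without containers.
    packages:
      mytool:
        version: ["1.2.3"]

inputs:
  reads:
    type: File      # a File, not a string holding a path
    inputBinding:
      position: 1

outputs:
  summary:
    type: stdout
stdout: summary.txt
```

Listing the container and resource needs under `requirements` makes them mandatory for the runner, while `SoftwareRequirement` under `hints` remains advisory. Whether a given requirement belongs under `hints` or `requirements` is a judgment call that depends on how strictly the tool needs it.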