From e1beb377bb01f9f32b2690c3984d27947d3e1afe Mon Sep 17 00:00:00 2001
From: Mackenzie-OO7
Date: Fri, 24 Feb 2023 12:01:19 +0100
Subject: [PATCH 1/2] document scatterMethod

---
 src/topics/workflows.md | 119 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 114 insertions(+), 5 deletions(-)

diff --git a/src/topics/workflows.md b/src/topics/workflows.md
index 505d6191..5248345f 100644
--- a/src/topics/workflows.md
+++ b/src/topics/workflows.md
@@ -402,6 +402,7 @@ This feature tells the runner that you wish to run a tool or workflow multiple t
 of inputs. The workflow then takes the input(s) as an array and will run the specified step(s) on
 each element of the array as if it were a single input. This allows you to run the same workflow
 on multiple inputs without having to generate many different commands or input yaml files.
+To use `scatter`, `ScatterFeatureRequirement` must be specified in the workflow or workflow step requirements.
 
 ```cwl
 requirements:
@@ -439,13 +440,12 @@ steps:
 ```
 
 Here we've added a new field to the step `echo` called `scatter`. This field tells the
-runner that we'd like to scatter over this input for this particular step. Note that
+runner that we'd like to scatter over this input for this particular step. An input parameter may be listed more than once; if it is, it becomes
+a nested array. As a result, upstream parameters that are connected to the
+scattered parameters must be arrays. Note that
 the input name listed after scatter is the one of the step's input, not a workflow level input.
 
-For our first scatter, it's as simple as that! Since our tool doesn't collect any outputs, we
-still use `outputs: []` in our workflow, but if you expect that the final output of your
-workflow will now have multiple outputs to collect, be sure to update that to an array type
-as well!
+For our first scatter, it's as simple as that!
Each job in the scatter results in an entry in the output array because all output parameter types are also implicitly wrapped in arrays. Since our tool doesn't collect any outputs, we still use `outputs: []` in our workflow, but if you expect that the final output of your workflow will now have multiple outputs to collect, be sure to update that to an array type as well!
 
 Using the following input file:
 
@@ -519,6 +519,115 @@ two-step workflow to a single step subworkflow:
 Now the scatter acts on a single step, but that step consists of two steps so each step is
 performed in parallel.
 
+If `scatter` declares more than one input parameter, `scatterMethod`
+describes how to divide the inputs into separate jobs. There are 3 scatter methods in CWL: `dot_product`, `flat_crossproduct`, and `nested_crossproduct`
+
+`dotproduct` specifies that each of the input arrays are aligned and one
+element taken from each array to construct each job. It is an error
+if all input arrays are not the same length.
+
+```cwl
+#!/usr/bin/env cwl-runner
+cwlVersion: v1.2
+class: Workflow
+
+requirements:
+  ScatterFeatureRequirement: {}
+
+inputs:
+  message_file: File[]
+  message_array: string[]
+
+outputs:
+  output_array:
+    type: File[]
+    outputSource: step1/output
+
+steps:
+  step1:
+    run: example.cwl
+    scatter: [input_file, input_array]
+    scatterMethod: dotproduct
+    in:
+      input_file: message_file
+      input_array: message_array
+    out: [output]
+```
+
+`nested_crossproduct` specifies the Cartesian product of the inputs,
+producing a job for every combination of the scattered inputs. The
+output must be nested arrays for each level of scattering, in the
+order that the input arrays are listed in the `scatter` field.
+
+```cwl
+#!/usr/bin/env cwl-runner
+cwlVersion: v1.2
+class: Workflow
+
+requirements:
+  ScatterFeatureRequirement: {}
+
+inputs:
+  message_file: File[]
+  message_array: string[]
+
+outputs:
+  output_array:
+    type:
+      type: array
+      items:
+        type: array
+        items: File
+    outputSource: step1/output
+
+steps:
+  step1:
+    run: example.cwl
+    scatter: [input_file, input_array]
+    scatterMethod: nested_crossproduct
+    in:
+      input_file: message_file
+      input_array: message_array
+    out: [output]
+```
+
+`flat_crossproduct` specifies the Cartesian product of the inputs,
+producing a job for every combination of the scattered inputs. The
+output arrays must be flattened to a single level, but otherwise listed in the
+order that the input arrays are listed in the `scatter` field.
+
+```cwl
+#!/usr/bin/env cwl-runner
+cwlVersion: v1.2
+class: Workflow
+
+requirements:
+  ScatterFeatureRequirement: {}
+
+inputs:
+  message_file: File[]
+  message_array: string[]
+
+outputs:
+  output_array:
+    # flat_crossproduct flattens the job outputs into a single-level array,
+    # with one entry per combination of the scattered inputs.
+    type:
+      type: array
+      items: File
+    outputSource: step1/output
+
+steps:
+  step1:
+    run: example.cwl
+    scatter: [input_file, input_array]
+    scatterMethod: flat_crossproduct
+    in:
+      input_file: message_file
+      input_array: message_array
+    out: [output]
+```
+
 ## Conditional Workflows
 
 This workflow contains a conditional step and is executed based on the input.
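The three scatter methods documented in this patch differ only in which jobs are created and how their outputs are arranged. The following is a minimal Python sketch of that behaviour, not code from any CWL runner: the function names mirror the `scatterMethod` values, but the functions themselves and the sample file names are hypothetical, introduced purely for illustration. Each "job" is modelled as a tuple holding one element per scattered input.

```python
from itertools import product


def dotproduct(*arrays):
    """Pair elements position by position; all arrays must have equal length."""
    lengths = {len(a) for a in arrays}
    if len(lengths) > 1:
        raise ValueError("dotproduct requires input arrays of the same length")
    return [tuple(items) for items in zip(*arrays)]


def flat_crossproduct(*arrays):
    """One job per combination of inputs, collected in a single flat list."""
    return list(product(*arrays))


def nested_crossproduct(*arrays, prefix=()):
    """One job per combination, nested one list level per scattered input."""
    if not arrays:
        return prefix  # a single job: one element from each input
    first, rest = arrays[0], arrays[1:]
    return [nested_crossproduct(*rest, prefix=prefix + (item,)) for item in first]


# Hypothetical sample inputs, standing in for message_file and message_array.
files = ["a.txt", "b.txt"]
messages = ["hello", "world"]

print(dotproduct(files, messages))
print(flat_crossproduct(files, messages))
print(nested_crossproduct(files, messages))
```

With two inputs of length two, the sketch yields two jobs for `dotproduct` and four for either cross product. The flat variant returns a single list of four jobs, while the nested variant returns two lists of two jobs each, which is why the `nested_crossproduct` example declares an array-of-arrays output type.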
From a3ec20f325a76d8db37f79800c718f071f76a330 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Levai=20Mackenzie=20=C3=81gb=C3=A0r=C3=A0?= <97461848+Mackenzie-OO7@users.noreply.github.com>
Date: Sat, 4 Mar 2023 03:10:12 +0100
Subject: [PATCH 2/2] document scatterMethod

---
 src/topics/workflows.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/topics/workflows.md b/src/topics/workflows.md
index 5248345f..429cdba9 100644
--- a/src/topics/workflows.md
+++ b/src/topics/workflows.md
@@ -519,12 +519,13 @@ two-step workflow to a single step subworkflow:
 Now the scatter acts on a single step, but that step consists of two steps so each step is
 performed in parallel.
 
-If `scatter` declares more than one input parameter, `scatterMethod`
-describes how to divide the inputs into separate jobs. There are 3 scatter methods in CWL: `dot_product`, `flat_crossproduct`, and `nested_crossproduct`
+If `scatter` declares more than one input parameter,
+`scatterMethod` describes how to divide the inputs into separate jobs.
+There are three scatter methods in CWL: `dotproduct`, `flat_crossproduct`, and `nested_crossproduct`.
 
-`dotproduct` specifies that each of the input arrays are aligned and one
-element taken from each array to construct each job. It is an error
-if all input arrays are not the same length.
+`dotproduct` specifies that the input arrays are aligned,
+and one element is taken from each array to construct each job.
+It is an error if the input arrays are not all the same length.
 
 ```cwl
 #!/usr/bin/env cwl-runner