Improve pipeline overview documentation #2186

Closed
grst opened this issue Feb 19, 2023 · 2 comments · Fixed by #2232

Comments

@grst

grst commented Feb 19, 2023

Description of feature

When looking at the documentation of a new nf-core pipeline, I mostly care about two things:

  1. What does it do (what kind of input data, which tools, output data)
  2. How do I run it (minimal example)

I find it surprisingly hard to find this information quickly for an nf-core pipeline. I sometimes prefer reading the code over reading the documentation.

Here's how I perceive the documentation website:

(Using RNA-seq as an example, but this applies more generally)

[image]

I know there are the usage and parameters sections, which are great, but I'm really missing a quick "getting started" overview for most pipelines.

Alternative suggestion:

(The docs of the rnaseq workflow rewritten, moving boilerplate text into external, linked documents)

Pipeline summary

nf-core/rnaseq is a bioinformatics pipeline that can be used to analyse RNA sequencing data obtained from organisms with a reference genome and annotation. It takes a samplesheet and FASTQ files as input, performs QC, trimming and (pseudo-)alignment, and produces a gene expression matrix and extensive QC report.

[subway map and tools as is]

Usage

If you are new to Nextflow and nf-core, please refer to [central documentation page about setting up nextflow] for instructions on how to set up Nextflow.

First, you need to prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto

Each row represents a FASTQ file (single-end) or a pair of FASTQ files (paired-end). Rows with the same sample identifier are considered technical replicates and are merged automatically.
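
To illustrate the grouping rule, here is a minimal Python sketch of how rows sharing a sample identifier could be collected before merging. It is not the pipeline's actual implementation: the function name `group_by_sample` is hypothetical, and treating an empty `fastq_2` column as single-end is an assumption made for this example.

```python
import csv
import io
from collections import defaultdict

# Example samplesheet content, copied from the proposal above.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto
"""


def group_by_sample(text):
    """Hypothetical helper: map each sample identifier to its list of runs.

    Each run is a tuple of one FASTQ path (single-end) or two (paired-end).
    Assumption: an empty or missing fastq_2 field means single-end.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        if row.get("fastq_2"):
            run = (row["fastq_1"], row["fastq_2"])
        else:
            run = (row["fastq_1"],)
        groups[row["sample"]].append(run)
    return dict(groups)


groups = group_by_sample(SAMPLESHEET)
for sample, runs in groups.items():
    layout = "paired-end" if len(runs[0]) == 2 else "single-end"
    print(f"{sample}: {len(runs)} run(s) to merge, {layout}")
```

In this sketch the three `CONTROL_REP1` rows end up in a single group, which is the unit that would be merged as technical replicates.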

Now, you can run the pipeline using:

nextflow run nf-core/rnaseq \
    --input samplesheet.csv \
    --outdir <OUTDIR> \
    --genome GRCh37 \
    -profile <docker/singularity/.../institute>  

For more details, please refer to the usage documentation and the parameter documentation.

Pipeline output

The output of the pipeline applied to an example dataset can be found here [link to AWS test].
For more details, please refer to the output documentation.

Online Videos

[current content as is]

Credits/Support/Citations

[current content as is]

@ewels

ewels commented Feb 19, 2023

Love it 👍🏻 Maybe even docs with guidelines too? Basically the same as above.

grst added a commit to nf-core/rnaseq that referenced this issue Feb 20, 2023
drpatelh added a commit to nf-core/rnaseq that referenced this issue Mar 23, 2023
@grst grst mentioned this issue Apr 3, 2023
2 tasks
@mirpedrol mirpedrol added this to the 2.8 milestone Apr 25, 2023
@mirpedrol

done in #2232
