Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional checks of sample sheet #623

Merged
merged 5 commits into from
Jul 8, 2022

Conversation

asp8200
Copy link
Contributor

@asp8200 asp8200 commented Jul 7, 2022

Related to #93

With this PR, I've tried adding checks for the following:

  1. The sample sheet should not contain several rows with the same combination of patient, sample and lane, and
  2. A sample should not be associated to several patients.

I have some ideas or questions for additional checks, but let's first see if the two above-mentioned checks are a step in the right direction.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
    • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions
Copy link

github-actions bot commented Jul 7, 2022

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 337adea

+| ✅ 145 tests passed       |+
#| ❔   4 tests were ignored |#
!| ❗   2 tests had warnings |!

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • schema_description - No description provided in schema for parameter: ascat_chromosomes

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy

✅ Tests passed:

Run details

  • nf-core/tools version 2.4.1
  • Run at 2022-07-08 11:59:27

workflows/sarek.nf Outdated Show resolved Hide resolved
@maxulysse maxulysse changed the title DRAFT: Additional checks of sample sheet Additional checks of sample sheet Jul 8, 2022
@maxulysse maxulysse marked this pull request as ready for review July 8, 2022 07:05
@maxulysse
Copy link
Member

Just need to update the CHANGELOG and we're good to go

@asp8200
Copy link
Contributor Author

asp8200 commented Jul 8, 2022

Just need to update the CHANGELOG and we're good to go

Sound good 🙏 Should we go even further and add some checks like:

A) Each sample should correspond to a unique set of fastq-files (see example below),

B) Each fastq-file should only be associated with one sample.

The purpose of (A) is to detect an error in the sample sheet like this:

patient,sex,sample,fastq_1,fastq_2
p1, XX,s1, f1_R1.fastq.gz, f1_R2.fastq.gz
p2, XX,s1, f2_R1.fastq.gz, f2_R2.fastq.gz 

(Here the s1 in the last row should have been s2.)
Similar, the purpose of (B) is to detect an error in the sample sheet like this:

patient,sex,sample,fastq_1,fastq_2
p1, XX,s1, f1_R1.fastq.gz, f1_R2.fastq.gz
p2, XX,s2, f1_R1.fastq.gz, f2_R2.fastq.gz 

(Here the f1_R1.fastq.gz in the last row should have been f2_R1.fastq.gz.)

Let me know what you think. Cheers

@maxulysse
Copy link
Member

what about the lane information in these cases?

@asp8200
Copy link
Contributor Author

asp8200 commented Jul 8, 2022

what about the lane information in these cases?

These two new proposed checks don't involve the lanes, but perhaps they should? Or other checks should be made involving the lanes? I'm not sure what one would want to check for the lane-data, perhaps you know?

Copy link
Contributor

@lassefolkersen lassefolkersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice to have patient-sample-lane checked for uniqueness.
Just wanted to pipe in that I often run the pipeline with the same fastq-file for different patients, or somatic/germline or such. It comes out with no differences, but it actually runs. I wouldn't be able to run such test-setups if you disallow same fastq. Maybe it's more of a log.warning thing?

@asp8200
Copy link
Contributor Author

asp8200 commented Jul 8, 2022

Very nice to have patient-sample-lane checked for uniqueness. Just wanted to pipe in that I often run the pipeline with the same fastq-file for different patients, or somatic/germline or such. It comes out with no differences, but it actually runs. I wouldn't be able to run such test-setups if you disallow same fastq. Maybe it's more of a log.warning thing?

@lassefolkersen and @maxulysse : So maybe we should just keep it to those checks which are implemented in this PR?

@maxulysse
Copy link
Member

Very nice to have patient-sample-lane checked for uniqueness. Just wanted to pipe in that I often run the pipeline with the same fastq-file for different patients, or somatic/germline or such. It comes out with no differences, but it actually runs. I wouldn't be able to run such test-setups if you disallow same fastq. Maybe it's more of a log.warning thing?

@lassefolkersen and @maxulysse : So maybe we should just keep it to those checks which are implemented in this PR?

Yes, let's keep it that way for now, I feel like it's good and enough already

@maxulysse maxulysse merged commit 7383196 into nf-core:dev Jul 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants