insert check for one line sample sheet #591

Merged
merged 6 commits into from
Jun 17, 2022

lassefolkersen
Contributor

@lassefolkersen lassefolkersen commented Jun 15, 2022

This fixes #589 by checking that the sample sheet has at least two lines. If it has only one line, the subsequent splitCsv(header: true) call consumes that line as the header, reads the sheet as empty, and skips all other samplesheet checks. Also upgraded the missing-patient and missing-sample cases from log.warn to log.error.
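
The failure mode described above can be reproduced outside Nextflow. The following is a minimal Python sketch, not the pipeline's Groovy code: it assumes `csv.DictReader` as a stand-in for `splitCsv(header: true)` (both treat the first line as the header), and the function name `count_data_rows` and the column names in the sample data are hypothetical.

```python
import csv
import io


def count_data_rows(csv_text: str) -> int:
    """Count the rows a header-aware CSV reader yields after the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for _ in reader)


# A one-line sheet: the only line is consumed as the header, leaving no
# rows, so any per-row check downstream never runs.
header_only = "patient,sample,lane\n"
two_lines = header_only + "p1,s1,1\n"

print(count_data_rows(header_only))  # 0
print(count_data_rows(two_lines))    # 1
```

Because zero rows look the same as "nothing to check", the line count has to be verified before the header-aware read, which is what this PR adds.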

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
    • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions

github-actions bot commented Jun 15, 2022

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit d67fbbd

  • ✅ 144 tests passed
  • ❔ 4 tests were ignored
  • ❗ 8 tests had warnings

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • schema_description - No description provided in schema for parameter: umi_read_structure
  • schema_description - No description provided in schema for parameter: group_by_umi_strategy

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy

✅ Tests passed:

Run details

  • nf-core/tools version 2.4.1
  • Run at 2022-06-17 07:09:05

@lassefolkersen
Contributor Author

Ok, I think this is ready for review now. Intended behaviour:

  • If the CSV file has 0 or 1 lines: full stop, no matter what. That can never be productive (and would skip the subsequent checks anyway, because the sheet is read as empty; see #589, "[BUG] Sarek-dev skips directly to multiqc after interval split").
  • If the CSV file has more than 1 line: check for the presence of the fields patient and sample; full stop if either is missing.
  • The remaining checks are unaltered, e.g. if fastq_1 is present but the file it points to does not exist, the pipeline checks and stops (that was already part of the code).
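
The order of checks listed above could be sketched roughly as follows. This is a Python illustration of the logic only; the actual check is Groovy code in workflows/sarek.nf. The names `validate_samplesheet` and `SampleSheetError` are hypothetical, and ignoring blank lines when counting is an assumption of this sketch rather than something stated in the PR.

```python
import csv
import io
from typing import Dict, List

REQUIRED_FIELDS = ("patient", "sample")


class SampleSheetError(ValueError):
    """Hypothetical error type standing in for log.error plus a full stop."""


def validate_samplesheet(csv_text: str) -> List[Dict[str, str]]:
    # Check 1: 0 or 1 lines can never be productive, so stop immediately.
    # Assumption of this sketch: blank lines do not count as lines.
    lines = [line for line in csv_text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise SampleSheetError(
            f"Sample sheet has {len(lines)} line(s); "
            "need a header and at least one row."
        )

    # Check 2: the header must contain both patient and sample.
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise SampleSheetError(f"Missing required field(s): {', '.join(missing)}")

    # Check 3: the unchanged per-row checks (e.g. that the file named in
    # fastq_1 exists) would run on these rows.
    return list(reader)
```

Running the line-count check first means the later checks can assume there is at least one data row to inspect.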

Contributor

@FriederikeHanssen FriederikeHanssen left a comment

Nice! Thanks for adding this. Just something small, I would propose renaming to "numberOfLinesInSampleSheet" or maybe at least num .

workflows/sarek.nf (outdated review thread, resolved)
@FriederikeHanssen
Contributor

Do you think it would be possible to add a check that each header field is one of (patient, sample, status, gender, lane, fastq1, fastq2, bam, bai, table, cram, crai, vcf)? (I think those are all but I would need to double check)

@asp8200
Contributor

asp8200 commented Jun 17, 2022

Do you think it would be possible to add a check that each header field is one of (patient, sample, status, gender, lane, fastq1, fastq2, bam, bai, table, cram, crai, vcf)? (I think those are all but I would need to double check)

I was thinking along the same lines. Could we do something like

accepted_headers = ['patient', 'sample', 'status', 'gender', 'lane', 'fastq1', 'fastq2', 'bam', 'bai', 'table', 'cram', 'crai', 'vcf']

for each header_field in headers:
    if header_field not in accepted_headers then throw error
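
A runnable version of that pseudocode might look like the Python sketch below. It only illustrates the logic (the pipeline itself is Groovy), the helper name `check_headers` is hypothetical, and the accepted list is copied from the comment above, where it is flagged as still needing a double check.

```python
from typing import Iterable

# Copied from the proposal above; not verified against the pipeline.
ACCEPTED_HEADERS = [
    "patient", "sample", "status", "gender", "lane", "fastq1", "fastq2",
    "bam", "bai", "table", "cram", "crai", "vcf",
]


def check_headers(headers: Iterable[str]) -> None:
    """Raise if any header field is not in the accepted list."""
    unknown = [h for h in headers if h not in ACCEPTED_HEADERS]
    if unknown:
        raise ValueError(
            f"Unknown header field(s): {', '.join(unknown)}. "
            f"Accepted columns are: {', '.join(ACCEPTED_HEADERS)}"
        )


check_headers(["patient", "sample", "vcf"])  # passes silently
```

Note that this only rejects unknown column names; it does not require any of the accepted columns to be present.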

@lassefolkersen
Contributor Author

Do you think it would be possible to add a check that each header field is one of (patient, sample, status, gender, lane, fastq1, fastq2, bam, bai, table, cram, crai, vcf)? (I think those are all but I would need to double check)

I was thinking along the same lines. Could we do something like

accepted_headers = ['patient', 'sample', 'status', 'gender', 'lane', 'fastq1', 'fastq2', 'bam', 'bai', 'table', 'cram', 'crai', 'vcf']

for each header_field in headers:
    if header_field not in accepted_headers then throw error

No, I don't like that solution. If you run with --step variant_calling (a bam/cram as input) or --step annotation (a vcf as input), then all those accepted_headers, except for patient and sample, are no longer required. So it would effectively hinder the ability to enter Sarek at a later stage.

So any solution has to be conditional on --step.
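
One way to make such a check conditional on the step is sketched below in Python (for illustration only; the pipeline is Groovy). The mapping `INPUT_COLUMNS_BY_STEP` and the function name are hypothetical: the two entries reflect only what is said in this thread (bam/cram for variant_calling, vcf for annotation), and other steps are deliberately left out rather than guessed.

```python
from typing import Dict, Iterable, List, Set

REQUIRED_ALWAYS: Set[str] = {"patient", "sample"}

# Hypothetical mapping: for each step, the alternative sets of input
# columns, any one of which is enough. Only the steps named in this
# thread are listed.
INPUT_COLUMNS_BY_STEP: Dict[str, List[Set[str]]] = {
    "variant_calling": [{"bam"}, {"cram"}],
    "annotation": [{"vcf"}],
}


def check_headers_for_step(headers: Iterable[str], step: str) -> None:
    """Raise if the headers cannot serve as input for the given step."""
    if step not in INPUT_COLUMNS_BY_STEP:
        raise ValueError(f"No header rules defined for step: {step}")

    present = set(headers)
    missing = REQUIRED_ALWAYS - present
    if missing:
        raise ValueError(f"Missing required field(s): {', '.join(sorted(missing))}")

    # At least one complete alternative must be present for this step.
    alternatives = INPUT_COLUMNS_BY_STEP[step]
    if not any(option <= present for option in alternatives):
        wanted = " or ".join("+".join(sorted(o)) for o in alternatives)
        raise ValueError(f"Step {step} needs input column(s): {wanted}")
```

With this shape, a sheet containing only patient, sample and vcf passes for annotation but is rejected for variant_calling.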

@FriederikeHanssen
Contributor

No, I don't like that solution. If you run with --step variant_calling (a bam/cram as input) or --step annotation (a vcf as input), then all those accepted_headers, except for patient and sample, are no longer required. So it would effectively hinder the ability to enter Sarek at a later stage.

I am not sure I follow. The way I understand Anders' proposal, a header simply could not be something like mysillyheaderfield: we would at least check that each header is part of the accepted list, and otherwise throw an error such as

Unknown header field. Accepted columns are: 

I agree it wouldn't check that the combination of columns is proper (e.g. --step annotation should only have patient,sample,vcf), but it would at least prevent typos and so on. I don't get why it wouldn't advance.

But either way we can save this for the next PR :)

@FriederikeHanssen FriederikeHanssen merged commit 0473d6f into nf-core:dev Jun 17, 2022