mirtop gff produces different md5sum for the same input files #83

Open
atrigila opened this issue Sep 12, 2024 · 10 comments

@atrigila

Expected behavior and actual behavior.

When running mirtop gff repeatedly with the same input files, the resulting mirtop.gff file may have a different md5sum on each run. This is because the order of the comma-separated elements in the "Variant" section may change.
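
To illustrate the point, here is a minimal sketch of an order-insensitive checksum: it sorts the comma-separated "Variant" elements on each line before hashing, so two files that differ only in that ordering get the same digest. The helper names (`normalize_variant_order`, `order_insensitive_md5`) are hypothetical and not part of mirtop, and the sketch assumes the attribute column contains `Variant <list>;` with `;`-separated attributes.

```python
import hashlib
import re

# Assumption: the attribute column holds "Variant <comma-separated list>;".
# The pattern also tolerates "Variant=<list>" in case "=" is used instead.
VARIANT_RE = re.compile(r"(\bVariant[ =])([^;\t\n]*)")


def normalize_variant_order(line: str) -> str:
    """Return the line with its Variant elements sorted alphabetically."""
    def _sort(match: re.Match) -> str:
        elements = [e.strip() for e in match.group(2).split(",")]
        return match.group(1) + ",".join(sorted(elements))

    return VARIANT_RE.sub(_sort, line)


def order_insensitive_md5(lines) -> str:
    """Hash lines after normalizing the Variant order on each of them."""
    digest = hashlib.md5()
    for line in lines:
        normalized = normalize_variant_order(line.rstrip("\n"))
        digest.update(normalized.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

Passing an open file handle, for example `order_insensitive_md5(open("mirtop.gff"))`, would give a digest that stays the same when only the Variant ordering differs. This only works around the symptom in a comparison; it does not change what mirtop writes.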

This has been observed in the nf-core mirtop/gff module, where the tests had to check for the presence of a string instead of comparing md5sums. The same was observed with mirtop/counts.

Steps to reproduce the problem.

You can reproduce this by running mirtop gff several times and using the following inputs. You can also do it with nf-test.

  1. In nf-core/modules, run the command nf-test test [path to mirtop/gff nf-test] --profile docker --debug --verbose. Check the output file mirtop.gff in the corresponding work directory.
  2. Select one or several lines from the mirtop.gff output.
  3. Include those lines in the nf-test snapshot section, for example: file(process.out.mirtop_gff[0][1]).readLines().findAll { it.contains("YOUR-MIRTOP-GFF-LINE-HERE") },
  4. Run the command nf-core modules test mirtop/gff or nf-test several times. The tests do not match the snapshot because the md5sum changes between runs.
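
Outside nf-test, the same check can be sketched by hashing the mirtop.gff files kept from several runs and comparing the digests. This is only an illustration: the function names and the run directories in the usage note are hypothetical, and it assumes each run's output was saved to its own path.

```python
import hashlib


def file_md5(path) -> str:
    """Return the md5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_are_stable(paths) -> bool:
    """Print one checksum per file and report whether they all match."""
    checksums = {str(path): file_md5(path) for path in paths}
    for name, checksum in checksums.items():
        print(f"{checksum}  {name}")
    return len(set(checksums.values())) <= 1
```

For example, `outputs_are_stable(["run1/mirtop.gff", "run2/mirtop.gff", "run3/mirtop.gff"])` would return False whenever any run produced a byte-level difference, including a reordered Variant list.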
@lpantano
Contributor

Thank you, that is indeed not expected. I am not sure yet where to look, but thank you for the input files.

This was referenced Sep 13, 2024
@lpantano
Contributor

@atrigila the new version is up: bioconda/bioconda-recipes#50710, but they removed dependencies because there was a conflict that I am trying to figure out. When preparing the new docker containers, the dependencies need to be added. Maybe that is easy with the new seqera containers?

@atrigila
Author

I will test it soon and let you know what happens.

@lpantano
Contributor

This should be good:

  • python=3.11
  • bioconda::pysam
  • bioconda::pybedtools
  • conda-forge::pandas
  • conda-forge::biopython=1.83

@lpantano
Contributor

I forgot one:

  • bioconda::samtools=1.21

@atrigila
Author

In the mirtop/gff example, there are 2 gff outputs produced:

  • one with the name of the sample e.g. sim_isomir_sort.gff
  • one named mirtop.gff

In the smrnaseq pipeline we normally use mirtop.gff for all the following steps. After updating to the latest version, I still see differences in this file between runs:

image

@atrigila
Author

I'll check the dependencies

@atrigila
Author

channels:
  - conda-forge
  - bioconda
dependencies:
  - "bioconda::mirtop=0.4.27"
  - "bioconda::samtools=1.21"
  - "conda-forge::python=3.11"
  - "conda-forge::biopython=1.83"
  - "bioconda::pysam=0.22.1"
  - "bioconda::pybedtools=0.10.0"
  - "conda-forge::pandas=2.2.2"

@lpantano
Contributor

Hmm, OK, I couldn't reproduce the differences. Maybe I need to run it more than 3 times... Does it happen to you all the time?

@atrigila
Author

Now with mirtop=0.4.28, it works! Thank you!!
