rattle correction step giving error #50

Open
dvirdi01 opened this issue Oct 10, 2023 · 9 comments

@dvirdi01

dvirdi01 commented Oct 10, 2023

I ran rattle correct on my input files through snakemake. I get an error message saying this:

Error in rule cluster_correction:
jobid: 13
input: data/.../.../samplefile.fastq
output: data/RATTLE_out/samplefile/corrected.fq, data/RATTLE_out/samplefile/uncorrected.fq, data/RATTLE_out/samplefile/consensi.fq
log: log/RATTLE_log/samplefile_correct.out, log/RATTLE_log/samplefile_correct.err (check log file(s) for error details)
shell:
/storage/.../.../bin/RATTLE/rattle correct -i data/.../.../samplefile.fastq -c data/RATTLE_out/samplefile/clusters.out -o data/RATTLE_out/samplefile/corrected.fq data/RATTLE_out/samplefile/uncorrected.fq data/RATTLE_out/samplefile/consensi.fq -t 48 > log/RATTLE_log/samplefile.out 2> log/RATTLE_log/samplefile.err
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Error executing rule cluster_correction on cluster (jobid: 13, external: 2761217, jobscript: /storage/.../.../.../.snakemake/tmp.tz0fhacf/snakejob.cluster_correction.13.sh). For error details see the cluster log and the log files of the involved rule(s).

When I open samplefile.err it says: "Reading fasta file... Done" and when I open samplefile.out it is empty.

I also get this message below:

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2761217.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: valiant1: task 0: Out Of Memory
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2761217.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

I gave it 100GB ram to begin with but I guess it wasn't enough. Is there a way to know how much ram I need to give it before I run the snakemake command?

@eileen-xue
Contributor

eileen-xue commented Oct 11, 2023

Hi,

You can have a look at the memory usage figure in our paper.

Otherwise, I need more information like the number of reads or the fastq file size to give you a RAM estimation.
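The two numbers asked for here (read count and FASTQ file size) can be collected before submitting the job. This is a minimal sketch, not part of RATTLE: `fastq_stats` is a hypothetical helper name, and it assumes the standard four-lines-per-record FASTQ layout.

```python
import gzip
import os


def fastq_stats(path):
    """Return (number_of_reads, size_in_bytes) for a FASTQ file.

    Assumes 4 lines per record; files ending in .gz are read through gzip,
    and the reported size is the size on disk (compressed for .gz).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4, os.path.getsize(path)
```

Calling `fastq_stats("samplefile.fastq")` gives a pair that can be compared against the memory usage figure mentioned above or posted in the issue.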

Your 'samplefile.out' file should not be empty, because it is a binary file. You need to look at the file size to check whether it is empty.
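Checking sizes rather than opening files in an editor can be scripted. A minimal sketch, assuming nothing about RATTLE itself; `empty_files` is a hypothetical helper name.

```python
import os


def empty_files(paths):
    """Return the paths that are missing or exactly 0 bytes long."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]
```

Passing the expected output paths of a step to `empty_files` returns only the ones that need a closer look.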

Eileen

@dvirdi01
Author

dvirdi01 commented Oct 11, 2023

Hi, I checked the output files for some of the processes that did run. It created consensi.fq, uncorrected.fq and corrected.fq, but they are all 0 bytes. I am not sure why this is happening. This was my snakemake rule:

rule cluster_correction:
    input: "data/.../.../{sample}.fastq"
    output:
        touch("data/.../{sample}/corrected.fq"),
        touch("data/.../{sample}/uncorrected.fq"),
        touch("data/.../{sample}/consensi.fq")
    params:
        clusters = "data/.../{sample}/clusters.out"
    log:
        out = "log/.../{sample}_correct.out",
        err = "log/.../{sample}_correct.err"
    threads:
        48
    resources:
        mem = 100
    shell:
        """/storage/.../.../.../.../rattle correct
        -i {input}
        -c {params.clusters}
        -o {output}
        -t {threads}
        > {log.out}
        2> {log.err}
        """

To add on: the same happened with my rattle cluster_summary step. It created a tsv file, but it was also 0 bytes.

@eileen-xue
Contributor

Hi,

This problem seems to come from the clustering step, not from the error correction step.

Please provide answers to the following questions to help us identify the issues and provide solutions.

  1. Is your clustering step output (clusters.out) file size 0 bytes?
  2. What is your clustering step command? And what is the log for your clustering step?
  3. Do you run into the out-of-memory issue in your clustering step? Normally, clustering uses more memory than error correction.

Eileen

@dvirdi01
Author

dvirdi01 commented Oct 12, 2023

  1. none of my clusters.out files are 0 bytes so I think cluster and cluster extraction steps were working
  2. this was my rule for clustering step:

input: "data/.../..../{sample}.fastq.gz"
output:
    touch("data/.../{sample}.done")
params:
    outdir = "data/..../{sample}"
log:
    out = "log/.../{sample}.out",
    err = "log/.../{sample}.err"
threads:
    48
resources:
    mem = 200
shell:
    """mkdir -p {params.outdir};
    /storage/.../.../.../.../rattle cluster
    --input {input}
    --output {params.outdir}
    --threads {threads}
    --verbose
    > {log.out}
    2> {log.err}"""

In my log, my sample.out file says "Reads: ...some number..." and my sample.err says:

[================================================================================] 67715/67715 (100%)
Iteration 0.3 complete
[================================================================================] 24054/24054 (100%)
Iteration 0.25 complete
[================================================================================] 11360/11360 (100%)
Iteration 0.2 complete
[================================================================================] 7204/7204 (100%)
Iteration 0 complete
Gene clustering done
5507 gene clusters found

  3. I think I did for some of the files. For those I re-ran it by allocating more memory.

@eileen-xue
Contributor

Hi,

Your RATTLE error correction step command is incorrect. To specify the outputs, you don't need to list the names and locations of all the output files. You only need to give an output folder location, like -o [out_dir]
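As an illustration of passing a folder to -o, here is a minimal sketch that assembles the command as an argument list. It uses only the flags that appear in this thread (-i, -c, -o, -t); `rattle_correct_cmd` is a hypothetical helper, and the extension check is an assumption added purely to catch the mistake discussed here.

```python
import os


def rattle_correct_cmd(rattle_bin, reads, clusters, out_dir, threads):
    """Build the argument list for `rattle correct` with a folder as -o."""
    # Guard (an assumption, not RATTLE behaviour): reject paths that look
    # like one of the output FASTQ files instead of a folder.
    if os.path.splitext(out_dir)[1] in {".fq", ".fastq"}:
        raise ValueError(f"-o expects a folder, got a file-like path: {out_dir}")
    return [rattle_bin, "correct",
            "-i", reads,
            "-c", clusters,
            "-o", out_dir,
            "-t", str(threads)]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. In a Snakemake rule the same idea would mean giving -o the sample's output folder rather than `{output}` expanded to three file paths.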

Hope this helps.
Eileen

@dvirdi01
Author

dvirdi01 commented Oct 13, 2023

  1. Hi, isn't that what I did though? I gave the output file location as params.outdir?

Edit: Oh I think I get what you were saying
I had this earlier for my error correction step in my smk file:

output:
    touch("data/.../{sample}/corrected.fq"),
    touch("data/.../{sample}/uncorrected.fq"),
    touch("data/.../{sample}/consensi.fq")

but I should change it to:

output:
    touch("data/.../{sample}")

Is this ^ what you meant? Also, in my Snakefile I had:

rule all:
    input:
       expand("data/..../{sample}/{filename}.fq",  
       sample = config['samples'], filename = ["corrected", "uncorrected", "consensi"])

Would I need to change the expand command in my snakefile?
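To make concrete what the `rule all` above asks for, here is a small sketch that mimics the combination behaviour of `expand` with plain Python instead of calling Snakemake. `expand_targets` and the sample names are hypothetical, and the path prefix is taken from the error message at the top of this issue.

```python
from itertools import product


def expand_targets(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    return [pattern.format(**dict(zip(names, combo)))
            for combo in product(*(wildcards[n] for n in names))]


targets = expand_targets(
    "data/RATTLE_out/{sample}/{filename}.fq",
    sample=["sampleA", "sampleB"],  # hypothetical sample names
    filename=["corrected", "uncorrected", "consensi"],
)
```

Each resulting path is a file that `rule all` requests, so some rule has to declare a matching output. If the correction rule's output becomes a single folder, the targets in `rule all` would presumably need to change to match it.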


  2. Also, what about my cluster_summary.tsv file being empty? Was it due to the same error? I did not run the cluster extraction and cluster summary steps from snakemake; I ran them directly from the command line for all my files. This is what I had:
./rattle extract_clusters -i /storage/.../.../.../.../.../.../sample.fastq  -c /storage/.../.../.../.../.../sample/clusters.out -o /storage/.../..../.../.../.../sample/clusters --fastq

./rattle cluster_summary -i /storage/.../.../.../.../.../.../sample.fastq -c /storage/.../.../.../.../.../sample/clusters.out > /storage/.../.../.../.../.../sample/cluster_summary.tsv

Why did this command produce an empty tsv file?

@eileen-xue
Contributor

eileen-xue commented Oct 16, 2023

  1. Your new output command is correct.
    If you want to use multiple fastq files as input, the format should be -i input_1.fq,input_2.fq,...,input_n.fq. All files must be separated by commas; no spaces or line breaks are allowed. Don't use Snakemake expand for RATTLE input, because expand will create new lines.
    Also, I don't understand why you are using corrected.fq, uncorrected.fq and consensi.fq as input. This will make your input and output the exact same files.

  2. Your command looks correct.
    Possible issues:
    Inputs of the cluster step and cluster_summary step are not the same.
    Your input.fastq file or clusters.out file location is incorrect.
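The comma-separated input format described in point 1 can be built and checked with a few lines of Python. This is a sketch only; `rattle_input_arg` is a hypothetical helper, not something RATTLE provides.

```python
def rattle_input_arg(paths):
    """Join FASTQ paths into one comma-separated value for -i."""
    for p in paths:
        # Whitespace or a comma inside a path would break the list format.
        if any(ch.isspace() for ch in p) or "," in p:
            raise ValueError(f"path cannot be used in a comma-separated list: {p!r}")
    return ",".join(paths)
```

In a Snakefile, one option (an assumption about the workflow, not something confirmed in this thread) is to compute the joined value in `params`, for example with `lambda wildcards, input: ",".join(input)`, and use that in the shell command in place of `{input}`.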

@dvirdi01
Author

dvirdi01 commented Oct 16, 2023 via email

@eileen-xue
Contributor

extract_clusters and cluster_summary are designed to make the cluster step results readable. Only the cluster step is required before the correction step.
