segfault #16

Closed
colindaven opened this issue Feb 6, 2020 · 6 comments

@colindaven

I am getting an error running
srun -c 56 necat.pl assemble pig2.txt

This is a ca. 18X pure ONT (2019-2020) pig dataset.

Could it be a memory issue? How can I create a minimal test set of data to find the segfault?
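One possible way to build a smaller test set is to subsample the reads and rerun the pipeline on the subset. The sketch below is only an illustration: it assumes the raw reads are in an uncompressed FASTQ file with exactly four lines per record, and the function name and file paths are hypothetical, not part of NECAT.

```python
import random
from itertools import islice


def subsample_fastq(in_path, out_path, fraction, seed=0):
    """Write roughly `fraction` of the reads in in_path to out_path.

    Assumes uncompressed FASTQ with exactly four lines per record.
    Returns the number of reads written.
    """
    rng = random.Random(seed)  # fixed seed so the same subset can be regenerated
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = list(islice(src, 4))  # header, sequence, '+', qualities
            if not record:
                break
            if len(record) != 4:
                raise ValueError("truncated FASTQ record at end of file")
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept
```

Halving `fraction` repeatedly while the crash still reproduces narrows the input down. If the crash stops reproducing on any subset, that in itself is a hint that the failure may not depend on a specific read.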

The command that is running looks something like this:
25741 rcug 20 0 71.411g 0.067t 3092 S 5014 13.6 707:23.15 /mnt/ngsnfs/tools/necat/NECAT/Linux-amd64/bin/oc2asmpm -n 100 -z 10 -b 2000 -e 0.5 -j 1 -u+


[Thu Feb  6 12:26:50 2020] INFO: mapping 65850 --- 65900 (66269) (asm_pm/asm_pm_common.c, 550)
[Thu Feb  6 12:26:50 2020] INFO: mapping 65900 --- 65950 (66269) (asm_pm/asm_pm_common.c, 550)
[Thu Feb  6 12:26:50 2020] INFO: mapping 65950 --- 66000 (66269) (asm_pm/asm_pm_common.c, 550)
[Thu Feb  6 12:26:50 2020] INFO: mapping 66000 --- 66050 (66269) (asm_pm/asm_pm_common.c, 550)
[Thu Feb  6 12:26:50 2020] INFO: mapping 66050 --- 66100 (66269) (asm_pm/asm_pm_common.c, 550)
[Thu Feb  6 12:26:51 2020] INFO: mapping 66100 --- 66150 (66269) (asm_pm/asm_pm_common.c, 550)
[Thu Feb  6 12:26:51 2020] INFO: mapping 66150 --- 66200 (66269) (asm_pm/asm_pm_common.c, 550)
[Thu Feb  6 12:26:51 2020] INFO: mapping 66200 --- 66250 (66269) (asm_pm/asm_pm_common.c, 550)
[Thu Feb  6 12:26:51 2020] INFO: mapping 66250 --- 66269 (66269) (asm_pm/asm_pm_common.c, 550)
[Thu Feb  6 12:27:03 2020] INFO: 'pairwise mapping v0 vs v8' takes 323.26 secs. (asm_pm
/working2/rcug/assembly/2020_pig/necat/pig/scripts/tr_al_vol_0.sh: line 16: 37826 Segmentation fault      (core dumped) /mnt/ngsnfs/tools/necat/NECAT/Linux-amd64/bin/oc2asmpm -n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 0 -a 400 -u 1 -t 50 /working2/rcug/assembly/2020_pig/necat/pig/2-trim_bases/pac_in 0 /working2/rcug/assembly/2020_pig/necat/pig/2-trim_bases/pac_in/pm_result_0
2020-02-06 12:27:11 [Warning] Failed to run script, 139, /working2/rcug/assembly/2020_pig/necat/pig/scripts/tr_al_vol_0.sh
2020-02-06 12:27:11 [Error] Reached to maximum number of script errors
srun: error: hpc-rc05: task 0: Exited with exit code 1
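
For reference, the 139 in the "Failed to run script" line fits the usual shell convention that a status above 128 means 128 plus the number of the signal that killed the process: 139 - 128 = 11, which is SIGSEGV on Linux. A minimal sketch of that decoding (the helper name is made up for illustration):

```python
import signal


def describe_exit_status(status):
    """Interpret a shell exit status; values above 128 conventionally mean 128 + signal number."""
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"


print(describe_exit_status(139))  # on Linux: killed by SIGSEGV
```

The status only says that the process received SIGSEGV, which matches the "Segmentation fault" message in the log; it does not identify the cause.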
@colindaven
Author

colindaven commented Feb 7, 2020

Memory looks fine. I checked again and reran with 480 GB of RAM allowed; less than 100 GB was used.

I believe it is a different segfault this time.


[Fri Feb  7 02:28:35 2020] INFO: loading /working2/rcug/assembly/2020_pig/necat/pig/4-fsa/pac_contigs/p3187 (ctg_cns/cns_one_ctg.c, 25)
[Fri Feb  7 02:28:35 2020] INFO: correct 0 --- 106751, number of m4: 34 (ctg_cns/cns_one_ctg.c, 102)
[Fri Feb  7 02:28:35 2020] INFO: extended m4: 40 (ctg_cns/cns_ctg_subseq.c, 196)
[Fri Feb  7 02:28:35 2020] INFO: extended m4: 26 (ctg_cns/cns_ctg_subseq.c, 196)
[Fri Feb  7 02:28:35 2020] INFO: extended m4: 27 (ctg_cns/cns_ctg_subseq.c, 196)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 3125, length = 110819' takes 0.85 secs. (ctg_cns/main.c, 81)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 3188, length = 286382' BEGINS (ctg_cns/main.c, 66)
[Fri Feb  7 02:28:35 2020] INFO: loading /working2/rcug/assembly/2020_pig/necat/pig/4-fsa/pac_contigs/p3188 (ctg_cns/cns_one_ctg.c, 25)
[Fri Feb  7 02:28:35 2020] INFO: correct 0 --- 286382, number of m4: 64 (ctg_cns/cns_one_ctg.c, 102)
[Fri Feb  7 02:28:35 2020] INFO: extended m4: 15 (ctg_cns/cns_ctg_subseq.c, 196)
[Fri Feb  7 02:28:35 2020] INFO: extended m4: 19 (ctg_cns/cns_ctg_subseq.c, 196)
[Fri Feb  7 02:28:35 2020] INFO: extended m4: 67 (ctg_cns/cns_ctg_subseq.c, 196)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 3149, length = 83016' takes 0.56 secs. (ctg_cns/main.c, 81)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 3189, length = 70109' BEGINS (ctg_cns/main.c, 66)
[Fri Feb  7 02:28:35 2020] INFO: loading /working2/rcug/assembly/2020_pig/necat/pig/4-fsa/pac_contigs/p3189 (ctg_cns/cns_one_ctg.c, 25)
[Fri Feb  7 02:28:35 2020] INFO: correct 0 --- 70109, number of m4: 0 (ctg_cns/cns_one_ctg.c, 102)
[Fri Feb  7 02:28:35 2020] INFO: extended m4: 0 (ctg_cns/cns_ctg_subseq.c, 196)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 3189, length = 70109' takes 0.01 secs. (ctg_cns/main.c, 81)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 3184, length = 62965' takes 0.20 secs. (ctg_cns/main.c, 81)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 3160, length = 89449' takes 0.46 secs. (ctg_cns/main.c, 81)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 2712, length = 858742' takes 9.55 secs. (ctg_cns/main.c, 81)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 3193, length = 18736' BEGINS (ctg_cns/main.c, 66)
[Fri Feb  7 02:28:35 2020] INFO: loading /working2/rcug/assembly/2020_pig/necat/pig/4-fsa/pac_contigs/p3193 (ctg_cns/cns_one_ctg.c, 25)
[Fri Feb  7 02:28:35 2020] INFO: 'correct contig 3191, length = 136043' BEGINS (ctg_cns/main.c, 66)
[Fri Feb  7 02:28:35 2020] INFO: loading /working2/rcug/assembly/2020_pig/necat/pig/4-fsa/pac_contigs/p3191 (ctg_cns/cns_one_ctg.c, 25)
[Fri Feb  7 02:28:35 2020] INFO: correct 0 --- 18736, number of m4: 0 (ctg_cns/cns_one_ctg.c, 102)
[Fri Feb  7 02:28:35 2020] INFO:
/working2/rcug/assembly/2020_pig/necat/pig/scripts/plctg0_cns.sh: line 40: 46409 Segmentation fault      (core dumped) /mnt/ngsnfs/tools/necat/NECAT/Linux-amd64/bin/ctgcns /working2/rcug/assembly/2020_pig/necat/pig/4-fsa/pac_reads /working2/rcug/assembly/2020_pig/necat/pig/4-fsa/pac_contigs 50 /working2/rcug/assembly/2020_pig/necat/pig/4-fsa/polished_contigs.fasta
2020-02-07 02:28:37 [Warning] Failed to run script, 139, /working2/rcug/assembly/2020_pig/necat/pig/scripts/plctg0_cns.sh
2020-02-07 02:28:37 [Error] Reached to maximum number of script errors
srun: error: hpc-rc05: task 0: Exited with exit code 1
(base) rcug@hpc-rc06:/working2/rcug/assembly/2020_pig/nec

@ppapasaikas

+1. Same issue during trimming, running on a 60X pure ONT dataset for a mammalian (~2 Gb) genome with 96 cores.

scripts/tr_al_vol_1.sh: line 16: 251263 Segmentation fault

Non-default parameters in the config file:

GENOME_SIZE=2000000000
THREADS=96
MIN_READ_LENGTH=4000
PREP_OUTPUT_COVERAGE=45
CNS_OUTPUT_COVERAGE=40
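
As a rough illustration of the data volumes these settings imply, the sketch below parses the five lines above and multiplies each coverage value by GENOME_SIZE. It assumes that the coverage parameters are expressed as multiples of GENOME_SIZE, which this thread does not confirm, and the parser handles only integer KEY=VALUE lines like these, not a full NECAT config.

```python
CONFIG_TEXT = """\
GENOME_SIZE=2000000000
THREADS=96
MIN_READ_LENGTH=4000
PREP_OUTPUT_COVERAGE=45
CNS_OUTPUT_COVERAGE=40
"""


def parse_config(text):
    """Parse integer KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = int(value)
    return values


def target_bases(cfg, coverage_key):
    """Bases implied by a coverage setting, assuming coverage = bases / GENOME_SIZE."""
    return cfg[coverage_key] * cfg["GENOME_SIZE"]


cfg = parse_config(CONFIG_TEXT)
for key in ("PREP_OUTPUT_COVERAGE", "CNS_OUTPUT_COVERAGE"):
    print(f"{key}: {target_bases(cfg, key) / 1e9:.0f} Gbp")
# PREP_OUTPUT_COVERAGE: 90 Gbp
# CNS_OUTPUT_COVERAGE: 80 Gbp
```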

As a side note, when running the bridge or assemble commands, the pipeline repeats the previously completed steps (I am not sure whether this is intended behavior). As a workaround, I manually modified the necat.pl file, commenting out the completed steps:

sub cmdBridge($) {
    my ($fname) = @_;

    %cfg = loadNecatConfig($fname);
    %env = loadNecatEnv(\%cfg);
    initializeNecatProject(\%cfg);

    my $prjDir = $env{"WorkPath"} . "/" . $cfg{"PROJECT"};

    # Steps that already completed, commented out so they are not repeated:
    #runConsensus(\%env, \%cfg);

    #runTrimAlignReads(\%env, \%cfg);

    runAssemble(\%env, \%cfg);

    runAlignContigs(\%env, \%cfg);
    runBridgeContigs(\%env, \%cfg);
    
    if ($cfg{"POLISH_CONTIGS"} == 1 or $cfg{"POLISH_CONTIGS"} eq "true") {
        runPolishContigs(\%env, \%cfg, "plctg1", "$prjDir/6-bridge_contigs/bridged_contigs.fasta", 
                     "$prjDir/4-fsa/contig_tiles", "$prjDir/trimReads.fasta.gz", "6-bridge_contigs");
    
        statReadN50(\%env, \%cfg, "6-bridge_contigs/polished_contigs.fasta", "polished contigs");
    } else {
    
        statReadN50(\%env, \%cfg, "6-bridge_contigs/bridged_contigs.fasta", "bridged contigs");
    }
}

@evanrrees

I am also encountering this issue during polishing, on a 2.3 Gbp plant genome at 30X coverage, running on a single node with 86 threads and 512 GB of RAM.
I have rerun the job several times, and it fails at a different point each time.
The command is necat.pl bridge Mo44_necat.config.

STDOUT (most recent attempt):

2020-02-26 09:08:05 [Info] Found Slurm, which is /usr/bin/sinfo
2020-02-26 09:08:05 [Info] Skip correcting rawreads for outputs are newer.
2020-02-26 09:08:05 [Info] Skip trimming reads for outputs are newer.
2020-02-26 09:08:05 [Info] Skip assembling for outputs are newer.
2020-02-26 09:08:05 [Info] Skip aligning rawreads and contigs to contigs for outputs are newer.
2020-02-26 09:08:05 [Info] Skip bridging contigs for outputs are newer.
2020-02-26 09:08:05 [Info] Start polishing contigs.
2020-02-26 09:08:05 [Info] Skip making vol for polishing contigs for outputs are newer.
2020-02-26 09:08:05 [Info] Skip aligning vol for polishing contigs for outputs are newer.
2020-02-26 09:08:05 [Info] Skip catenating vol for polishing contigs for outputs are newer.
2020-02-26 09:08:05 [Info] Start polishing contigs.
2020-02-26 09:08:05 [Info] Run script: /local/workdir/err87/Mo44_necat/Mo44_necat/scripts/plctg1_cns.sh 2>&1 |tee /local/workdir/err87/Mo44_necat/Mo44_necat/scripts/plctg1_cns.sh.log
2020-02-26 09:16:56 [Warning] Failed to run script, 139, /local/workdir/err87/Mo44_necat/Mo44_necat/scripts/plctg1_cns.sh
2020-02-26 09:16:56 [Error] Reached to maximum number of script errors
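
The "Skip ... for outputs are newer" messages suggest that each step is skipped when its output files are more recent than its inputs. The following is a generic make-style sketch of that idea, not NECAT's actual logic; the function names and the exact comparison rule are assumptions.

```python
import os


def outputs_are_newer(inputs, outputs):
    """Return True if every output exists and is at least as new as every input."""
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return False
    newest_input = max((os.path.getmtime(p) for p in inputs), default=0.0)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output >= newest_input


def run_step(name, inputs, outputs, action):
    """Run action() unless the outputs are already up to date; return True if it ran."""
    if outputs_are_newer(inputs, outputs):
        print(f"[Info] Skip {name} for outputs are newer.")
        return False
    print(f"[Info] Start {name}.")
    action()
    return True
```

Under a rule like this, touching or regenerating an input file would cause the steps that depend on it to run again, which is one possible reason for completed steps being repeated.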

STDERR (previous 3 attempts):

[Tue Feb 25 17:16:05 2020] INFO: extended m4: 5 (ctg_cns/cns_ctg_subseq.c, 196)
[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1083, length = 235950' takes 5.33 secs. (ctg_cns/main.c, 81)
[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1434, length = 127780' BEGINS (ctg_cns/main.c, 66)
[Tue Feb 25 17:16:05 2020] INFO: loading /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_contigs/p1434 (ctg_cns/cns_one_ctg.c, 25)
[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1396, length = 132393' takes 0.38 secs. (ctg_cns/main.c, 81)
[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1435, length = 127689' BEGINS (ctg_cns/main.c, 66)
[Tue Feb 25 17:16:05 2020] INFO: loading /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_contigs/p1435 (ctg_cns/cns_one_ctg.c, 25)
[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1344, length = 138892' takes 0.85 secs. (ctg_cns/main.c, 81)
[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1436, length = 127675' BEGINS (ctg_cns/main.c, 66)
[Tue Feb 25 17:16:05 2020] INFO: loading /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_contigs/p1436 (ctg_cns/cns_one_ctg.c, 25)
[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1390, length = 133221' ta
/local/workdir/err87/Mo44_necat/Mo44_necat/scripts/plctg1_cns.sh: line 40: 162471 Segmentation fault      (core dumped) /local/workdir/err87/NECAT/Linux-amd64/bin/ctgcns /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_reads /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_contigs 86 /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/polished_contigs.fasta
--
[Wed Feb 26 08:59:47 2020] INFO: loading /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_contigs/p740 (ctg_cns/cns_one_ctg.c, 25)
[Wed Feb 26 08:59:47 2020] INFO: 'correct contig 655, length = 765738' takes 22.83 secs. (ctg_cns/main.c, 81)
[Wed Feb 26 08:59:47 2020] INFO: 'correct contig 743, length = 557043' BEGINS (ctg_cns/main.c, 66)
[Wed Feb 26 08:59:47 2020] INFO: loading /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_contigs/p743 (ctg_cns/cns_one_ctg.c, 25)
[Wed Feb 26 08:59:47 2020] INFO: correct 0 --- 561341, number of m4: 241 (ctg_cns/cns_one_ctg.c, 102)
[Wed Feb 26 08:59:47 2020] INFO: correct 0 --- 557043, number of m4: 253 (ctg_cns/cns_one_ctg.c, 102)
[Wed Feb 26 08:59:47 2020] INFO: correct 0 --- 561206, number of m4: 255 (ctg_cns/cns_one_ctg.c, 102)
[Wed Feb 26 08:59:47 2020] INFO: correct 0 --- 563856, number of m4: 273 (ctg_cns/cns_one_ctg.c, 102)
[Wed Feb 26 08:59:47 2020] INFO: 'correct contig 703, length = 633207' takes 8.95 secs. (ctg_cns/main.c, 81)
[Wed Feb 26 08:59:47 2020] INFO: 'correct contig 744, length = 556185' BEGINS (ctg_cns/main.c, 66)
[Wed Feb 26 08:59:47 2020] INFO: loading /local/workdir/e
/local/workdir/err87/Mo44_necat/Mo44_necat/scripts/plctg1_cns.sh: line 40: 59179 Segmentation fault      (core dumped) /local/workdir/err87/NECAT/Linux-amd64/bin/ctgcns /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_reads /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_contigs 86 /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/polished_contigs.fasta
--
[Wed Feb 26 09:16:00 2020] INFO: 'correct contig 93, length = 5132488' takes 156.92 secs. (ctg_cns/main.c, 81)
[Wed Feb 26 09:16:00 2020] INFO: 'correct contig 160, length = 4067054' BEGINS (ctg_cns/main.c, 66)
[Wed Feb 26 09:16:00 2020] INFO: loading /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_contigs/p160 (ctg_cns/cns_one_ctg.c, 25)
[Wed Feb 26 09:16:00 2020] INFO: correct 0 --- 1000000, number of m4: 533 (ctg_cns/cns_one_ctg.c, 102)
[Wed Feb 26 09:16:00 2020] INFO: correct 2000000 --- 3000000, number of m4: 502 (ctg_cns/cns_one_ctg.c, 102)
[Wed Feb 26 09:16:00 2020] INFO: extended m4: 433 (ctg_cns/cns_ctg_subseq.c, 196)
[Wed Feb 26 09:16:01 2020] INFO: correct 3000000 --- 4000000, number of m4: 768 (ctg_cns/cns_one_ctg.c, 102)
[Wed Feb 26 09:16:01 2020] INFO: extended m4: 522 (ctg_cns/cns_ctg_subseq.c, 196)
[Wed Feb 26 09:16:01 2020] INFO: correct 1000000 --- 2000000, number of m4: 603 (ctg_cns/cns_one_ctg.c, 102)
[Wed Feb 26 09:16:01 2020] INFO: 'correct contig 10, length = 11416865' takes 315.04 secs. (ctg_cns/main.c, 81)
[Wed Feb 26 09:16:01 2020] IN
/local/workdir/err87/Mo44_necat/Mo44_necat/scripts/plctg1_cns.sh: line 40: 109218 Segmentation fault      (core dumped) /local/workdir/err87/NECAT/Linux-amd64/bin/ctgcns /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_reads /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/pac_contigs 86 /local/workdir/err87/Mo44_necat/Mo44_necat/6-bridge_contigs/polished_contigs.fasta
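
Since the crash point differs between attempts, it may help to list which contigs were still being corrected when ctgcns died. The sketch below assumes the 'correct contig N, length = L' BEGINS / takes line formats seen in the log excerpts in this thread; the function name is hypothetical.

```python
import re

BEGIN_RE = re.compile(r"'correct contig (\d+), length = (\d+)' BEGINS")
DONE_RE = re.compile(r"'correct contig (\d+), length = (\d+)' takes")


def contigs_in_flight(log_lines):
    """Return {contig_id: length} for contigs that began but have no matching 'takes' line."""
    started = {}
    for line in log_lines:
        m = BEGIN_RE.search(line)
        if m:
            started[int(m.group(1))] = int(m.group(2))
            continue
        m = DONE_RE.search(line)
        if m:
            started.pop(int(m.group(1)), None)
    return started


sample = [
    "[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1434, length = 127780' BEGINS (ctg_cns/main.c, 66)",
    "[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1396, length = 132393' takes 0.38 secs. (ctg_cns/main.c, 81)",
    "[Tue Feb 25 17:16:05 2020] INFO: 'correct contig 1435, length = 127689' BEGINS (ctg_cns/main.c, 66)",
]
print(contigs_in_flight(sample))  # {1434: 127780, 1435: 127689}
```

A final log line that was cut off mid-write is not counted as finished, so the result can include a contig that had in fact just completed. Comparing the in-flight sets across failed runs could show whether the same contigs recur; if they do not, the crash is less likely to be tied to one particular contig.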

@ckeeling

+1. Same issue: segmentation fault (core dumped) during the assembly step with v0.0.1_update20200803, on a single node with 48 CPUs and 187 GB of RAM. Did you ever figure out a solution, @colindaven, @ppapasaikas, or @evanrrees? Do you have any suggestions, @xiaochuanle?

@colindaven
Author

It seems the authors don't care at all. I'd use other software.

@evanrrees

@ckeeling Unfortunately, I never found a solution and moved on to other programs. Flye is worth looking into as an alternative.
