Issue assembling plant genome with NECAT #47

LeoVincenzi · 2023-01-11T15:31:10Z

Hi,
I'm working on a plant genome and I'm trying to assemble it with NECAT, but the final assembly I obtain is far from consistent with the expected genome size.
The expected genome size is 1.2 Gbp and I'm working with Oxford Nanopore reads. The starting data for the assembly are reported in the following table:

Number of reads	1,341,399
Number of bases (bp)	33,136,270,559
Average read length (bp)	24,703
Reads N50 (bp)	40,677
Expected fold-coverage	28x
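
As a quick check of the figures in the table, the expected fold-coverage is the total number of sequenced bases divided by the expected genome size. The snippet below is only a sketch of that arithmetic; the function name `fold_coverage` is made up for illustration and is not part of NECAT.

```python
def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Expected fold-coverage = sequenced bases / expected genome size."""
    return total_bases / genome_size


# Values taken from the table above and from GENOME_SIZE (1.2 Gbp).
total_bases = 33_136_270_559
num_reads = 1_341_399
genome_size = 1_200_000_000

coverage = fold_coverage(total_bases, genome_size)
mean_read_length = total_bases / num_reads

print(f"coverage: {coverage:.1f}x")                      # about 27.6x, reported as 28x
print(f"mean read length: {mean_read_length:,.0f} bp")   # about 24,703 bp
```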

The obtained results are the following:

	NECAT v.0.0.1
Total assembly size (bp)	604,869
Num. Contigs	12
Contigs average length (bp)	50,406
N50 (bp)	153,041
N90 (bp)	17,942
Longest contig (bp)	154,607

The command I ran was
/opt/NECAT/Linux-amd64/bin/necat.pl assemble config.txt
and the config file was filled in as follows:

PROJECT=Plant_genome
ONT_READ_LIST=read_list.txt
GENOME_SIZE=1200000000
THREADS=15
MIN_READ_LENGTH=3000
PREP_OUTPUT_COVERAGE=28
OVLP_FAST_OPTIONS=-n 500 -z 20 -b 2000 -e 0.5 -j 0 -u 1 -a 1000
OVLP_SENSITIVE_OPTIONS=-n 500 -z 10 -e 0.5 -j 0 -u 1 -a 1000
CNS_FAST_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
CNS_SENSITIVE_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
TRIM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 1 -a 400
ASM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 0 -a 400
NUM_ITER=1
CNS_OUTPUT_COVERAGE=28
CLEANUP=1
USE_GRID=true
GRID_NODE=8
GRID_OPTIONS=
SMALL_MEMORY=0
FSA_OL_FILTER_OPTIONS=
FSA_ASSEMBLE_OPTIONS=
FSA_CTG_BRIDGE_OPTIONS=
POLISH_CONTIGS=true

I would like to understand why the resulting assembly is so poor and how I can improve it. Could the parameters used for this dataset be inadequate?

lemene · 2023-01-24T01:47:27Z

Hi,
28X is somewhat lower than the Nanopore read coverage that NECAT expects (>=40X), and this affects the completeness of the assembly. Setting the following parameter may improve the assembly:
FSA_OL_FILTER_OPTIONS=--min_coverage 2
The value 2 can be replaced by 1 or 3.

The folders 4-fsa, 5-align_contigs and 6-bridge_contigs need to be renamed or deleted before running the command necat.pl bridge cfgfile. This will skip the error correction step and reassemble the corrected reads.
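
The renaming step could be scripted roughly as follows. This is a sketch under assumptions: it supposes the three folders sit directly inside the project directory (for example the one named after PROJECT), and the helper name `set_aside_stage_dirs` and the `.old` suffix are hypothetical choices, not something NECAT provides.

```python
from pathlib import Path

# Folder names as given in the comment above.
STAGE_DIRS = ("4-fsa", "5-align_contigs", "6-bridge_contigs")


def set_aside_stage_dirs(project_dir, suffix=".old"):
    """Rename the assembly-stage folders so that they are recomputed.

    Folders that do not exist are skipped. Returns the list of new paths.
    """
    project = Path(project_dir)
    renamed = []
    for name in STAGE_DIRS:
        src = project / name
        if not src.is_dir():
            continue
        dst = project / (name + suffix)
        if dst.exists():
            # Refuse to overwrite an earlier backup.
            raise FileExistsError(f"{dst} already exists; pick another suffix")
        src.rename(dst)
        renamed.append(dst)
    return renamed

# Example call (assumed project directory name):
#   set_aside_stage_dirs("Plant_genome")
# followed by the command from the comment above:
#   necat.pl bridge cfgfile
```

Renaming rather than deleting keeps the first assembly available for comparison with the new one.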

LeoVincenzi · 2023-03-02T09:17:10Z

Hi,
thanks to your suggestion, we ended up with an assembly of the desired size and with a high N50 value. I've also noticed that the 'bridge' step improves the contiguity, doubling the N50 that we could get from the 'assemble' step.
Anyway, I would like to ask how the parameter FSA_OL_FILTER_OPTIONS affects the assembly: I suppose it is involved in handling the overlapping regions, but if we start from a high coverage (40x), why should we set a minimum coverage to such a low value (1, 2, 3, ...)?

lemene · 2023-03-03T12:31:37Z

Hi @LeoVincenzi
The assembler calculates the coverage of each read. If the coverage is less than the threshold min_coverage, the read and its related overlaps are filtered out. The assembler can calculate a value for this threshold automatically, but sometimes that value is not appropriate. In our experience, min_coverage = 3 is not a bad choice.
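
To illustrate the idea of this filter, here is a toy sketch. It is not the assembler's actual code: the coverage definition used here (total overlapping bases divided by read length) and the function name `filter_low_coverage` are assumptions made only for the example.

```python
from collections import defaultdict


def filter_low_coverage(read_lengths, overlaps, min_coverage):
    """Drop reads whose coverage is below min_coverage, plus their overlaps.

    read_lengths: dict mapping read id -> read length in bp
    overlaps:     list of (read_a, read_b, overlap_length_bp) tuples
    Coverage here is an assumed, simplified definition:
    (sum of overlap lengths touching the read) / (read length).
    """
    covered = defaultdict(int)
    for a, b, length in overlaps:
        covered[a] += length
        covered[b] += length

    kept_reads = {
        read for read, n in read_lengths.items()
        if covered[read] / n >= min_coverage
    }
    # An overlap survives only if both of its reads survive.
    kept_overlaps = [o for o in overlaps if o[0] in kept_reads and o[1] in kept_reads]
    return kept_reads, kept_overlaps


reads = {"r1": 10_000, "r2": 10_000, "r3": 10_000, "r4": 10_000}
ovlps = [("r1", "r2", 9_000), ("r1", "r3", 8_000), ("r2", "r3", 9_000), ("r3", "r4", 2_000)]

kept_reads, kept_overlaps = filter_low_coverage(reads, ovlps, min_coverage=1)
print(sorted(kept_reads))   # r4 is poorly supported and is dropped
print(kept_overlaps)        # the (r3, r4) overlap goes with it
```

With a low-coverage dataset, a threshold that is too high would discard many reads in this way, which is consistent with trying small values such as 1, 2 or 3.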

lemene · 2023-03-03T13:49:43Z

Some raw reads are broken into multiple corrected reads in the error correction step. The unbroken raw reads are used to bridge the contigs, which is why the assembler can output a higher N50 after bridging.
