eccDNA identification from nanopore long reads of rolling-circle amplicon
required arguments for input:
--fastq <STR> input reads in fastq format
--paf <STR> alignments file in PAF format generated by minimap2
--reference <STR> reference genome file in fasta format
required arguments for output:
--info <STR> output file for sequences information
--seq <STR> output file for consensus sequences in fasta format
--var <STR> output file for variants
optional arguments:
--maxOffset <INT> maximum offset of start/end positions between two sub-reads
to be considered as mapping to the same location [default: 20]
--minMapQual <INT> minimum mapping quality of sub-reads [default: 30]
--minDP <INT> minimum depth to call variants [default: 4]
--minAF <FLOAT> minimum alternative allele frequency to call variants [default: 0.75]
--verbose print details of consensus construction
-h, --help show help message
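To make the --maxOffset criterion concrete, here is a minimal Python sketch of one plausible reading of it. It is not the tool's actual code: the function name `same_location` and the tuple layout are hypothetical, and it assumes that both the start offset and the end offset must stay within the limit, on the same chromosome and strand.

```python
from typing import Tuple

# Hypothetical representation of a sub-read alignment: (chrom, start, end, strand).
Alignment = Tuple[str, int, int, str]


def same_location(a: Alignment, b: Alignment, max_offset: int = 20) -> bool:
    """Return True if two sub-read alignments plausibly map to the same location.

    Assumption: chromosome and strand must match, and both the start and the
    end positions must differ by at most max_offset.
    """
    return (
        a[0] == b[0]
        and a[3] == b[3]
        and abs(a[1] - b[1]) <= max_offset
        and abs(a[2] - b[2]) <= max_offset
    )


# Two sub-read alignments whose end positions differ by 9 bp.
first = ("chr5", 56004933, 56006381, "-")
second = ("chr5", 56004933, 56006372, "-")
print(same_location(first, second))
```

Raising --maxOffset makes this kind of comparison more tolerant of ragged alignment ends; lowering it makes it stricter.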
The info file contains all meta information for each eccDNA identified, with 6 columns:
field | description |
---|---|
readname | The name (id) of each read generated by Nanopore |
Nfullpass | Number of full passes of this eccDNA covered by this read |
Nfragment | Number of fragment(s) (genomic location) that form this eccDNA |
refLength | The length of the reference genome region(s) to which this eccDNA was mapped |
seqLength | Actual sequence length of this eccDNA |
fragments | The genomic origin (location) of each fragment composing this eccDNA |
When Nfullpass is 0, no eccDNA was identified for this Nanopore read.
The coordinates in fragments are 1-based and inclusive. Multiple fragments are separated by '|'.
Example info file:
readname | Nfullpass | Nfragment | refLength | seqLength | fragments |
---|---|---|---|---|---|
3561e493-0b99-4a11-a517-de5681276d82 | 0 | 0 | 0 | 0 | |
8bac2bc4-9e1c-4804-97c7-1dd88184b2b8 | 1 | 1 | 1047 | 1047 | chr5:144628101-144629147(+) |
15ce164b-ef1f-42b2-af1f-9a10c3abf23b | 10 | 1 | 1024 | 1024 | chrX:145145309-145146332(-) |
665d4815-998b-42be-8af7-9a2dc31157b3 | 5 | 2 | 628 | 627 | chr10:91847836-91848275(+)|chr19:58942249-58942436(+) |
02714a9a-3753-47e9-b770-8aa606856ecc | 4 | 2 | 507 | 505 | chr12:53934104-53934326(+)|chr12:86923760-86924043(-) |
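The fragments column can be parsed with a few lines of Python. The sketch below is illustrative only: the helper names `parse_fragments` and `total_length` are hypothetical, and the regular expression assumes chromosome names contain no ':' character. Because the coordinates are 1-based and inclusive, each fragment spans `end - start + 1` bases; in the example rows above, the sum of these spans equals refLength.

```python
import re
from typing import List, Tuple

# Matches one fragment such as "chr5:144628101-144629147(+)".
FRAGMENT_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)$"
)

Fragment = Tuple[str, int, int, str]


def parse_fragments(field: str) -> List[Fragment]:
    """Split the fragments column on '|' and parse each fragment."""
    fragments = []
    for item in field.split("|"):
        item = item.strip()
        if not item:
            continue  # empty field: no eccDNA identified (Nfullpass is 0)
        match = FRAGMENT_RE.match(item)
        if match is None:
            raise ValueError(f"unrecognized fragment: {item!r}")
        fragments.append(
            (match["chrom"], int(match["start"]), int(match["end"]), match["strand"])
        )
    return fragments


def total_length(fragments: List[Fragment]) -> int:
    """Sum fragment lengths; coordinates are 1-based and inclusive."""
    return sum(end - start + 1 for _, start, end, _ in fragments)


frags = parse_fragments("chr10:91847836-91848275(+)|chr19:58942249-58942436(+)")
print(len(frags), total_length(frags))  # 2 628
```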
The seq file contains the reconstructed full-length sequence of each eccDNA in fasta format. The id of each sequence is the readname.
The var file contains the variants inferred from the Nanopore reads compared to the reference genome sequence, with 6 columns:
field | description |
---|---|
1 | chromosome |
2 | position in the reference genome (1-based) |
3 | reference nucleotide(s) |
4 | alternate nucleotide(s); '-' means deletion |
5 | supportive coverage depth |
6 | total coverage depth |
Example var file:
col1 | col2 | col3 | col4 | col5 | col6 |
---|---|---|---|---|---|
chr5 | 125935640 | G | A | 6 | 8 |
chr11 | 93766201 | G | - | 4 | 4 |
chr17 | 17326883 | A | ATCT | 5 | 5 |
chr17 | 45437665 | GG | - | 3 | 4 |
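The relationship between columns 5 and 6 and the --minDP and --minAF options can be sketched in Python as below. This is a hedged illustration, not the tool's implementation: the function names are hypothetical, and it assumes that --minDP is compared against the total coverage depth (column 6) and that the allele frequency is column 5 divided by column 6. That reading is consistent with the last example row, which has 3 supporting reads out of 4.

```python
def allele_frequency(support: int, total: int) -> float:
    """Alternative allele frequency: supporting depth over total depth."""
    return support / total if total > 0 else 0.0


def passes_thresholds(
    support: int, total: int, min_dp: int = 4, min_af: float = 0.75
) -> bool:
    # Assumption: --minDP applies to the total depth, not the supporting depth.
    return total >= min_dp and allele_frequency(support, total) >= min_af


# Rows copied from the example var file above.
rows = [
    ("chr5", 125935640, "G", "A", 6, 8),
    ("chr11", 93766201, "G", "-", 4, 4),
    ("chr17", 17326883, "A", "ATCT", 5, 5),
    ("chr17", 45437665, "GG", "-", 3, 4),
]

for chrom, pos, ref, alt, support, total in rows:
    af = allele_frequency(support, total)
    print(chrom, pos, ref, alt, f"AF={af:.2f}", passes_thresholds(support, total))
```

Under these assumptions all four example rows meet the default thresholds.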
Using --verbose is recommended. It prints the mapping structure of each Nanopore read, and the output can be piped to a log file with | tee out.log.
Users can then check the details of each eccDNA constructed from the rolling-circle amplification Nanopore read. Example output:
8f1bb745-b054-4c99-9609-489cf234ea90
#Fragment: 1 Full Pass: 3 Read Length: 6283
24 - chr5:56004938-56005172 (-) - 253
254 - chr5:56004933-56006381 (-) - 1721
1722 - chr5:56004933-56006372 (-) - 3169
3170 - chr5:56004937-56006372 (-) - 4598
Location:
chr5:56004933-56006372 (-)
665d4815-998b-42be-8af7-9a2dc31157b3
#Fragment: 2 Full Pass: 5 Read Length: 3434
1 - chr19:58942250-58942436 (+) - 185
186 - chr10:91847836-91848275 (+) - 626
627 - chr19:58942249-58942436 (+) - 813
814 - chr10:91847836-91848275 (+) - 1235
1236 - chr19:58942249-58942435 (+) - 1429
1430 - chr10:91847836-91848279 (+) - 1867
1868 - chr19:58942252-58942436 (+) - 2053
2054 - chr10:91847836-91848275 (+) - 2481
2482 - chr19:58942249-58942433 (+) - 2671
2672 - chr10:91847838-91848275 (+) - 3098
3099 - chr19:58942249-58942436 (+) - 3278
3279 - chr10:91847836-91847993 (+) - 3434
Location:
chr10:91847836-91848275 (+)
chr19:58942249-58942436 (+)
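For anyone post-processing the verbose log, the segment lines can be extracted with a regular expression. The Python sketch below is an assumption-laden illustration: `parse_segments` is a hypothetical helper, and it reads the first and last numbers on each segment line as start and end positions on the read, which is an interpretation of the example output rather than a documented format.

```python
import re
from typing import List, Tuple

# One segment: "<read_start> - <chrom>:<ref_start>-<ref_end> (<strand>) - <read_end>"
SEGMENT_RE = re.compile(
    r"(\d+)\s+-\s+(\S+):(\d+)-(\d+)\s+\(([+-])\)\s+-\s+(\d+)"
)

Segment = Tuple[int, str, int, int, str, int]


def parse_segments(text: str) -> List[Segment]:
    """Extract (read_start, chrom, ref_start, ref_end, strand, read_end) tuples."""
    return [
        (int(read_start), chrom, int(ref_start), int(ref_end), strand, int(read_end))
        for read_start, chrom, ref_start, ref_end, strand, read_end
        in SEGMENT_RE.findall(text)
    ]


# Segment lines copied from the first example read above.
log = """24 - chr5:56004938-56005172 (-) - 253
254 - chr5:56004933-56006381 (-) - 1721
1722 - chr5:56004933-56006372 (-) - 3169
3170 - chr5:56004937-56006372 (-) - 4598"""

segments = parse_segments(log)
for seg in segments:
    print(seg)
```

Because the pattern is searched across the whole text, it also copes with several segments appearing on one line.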
The example Nanopore reads of rolling-circle amplified eccDNA are available at
Wang, Y., Wang, M., Djekidel, M.N. et al. eccDNAs are apoptotic products with high innate immunostimulatory activity. Nature 599, 308–314 (2021).
Wang, Y., Wang, M. & Zhang, Y. Purification, full-length sequencing and genomic origin mapping of eccDNA. Nat Protoc (2022)