Commit

Update base modification test files to use the final MM/ML tags (PR #776)

Also add HTSlib's MM-explicit.sam, which exercises the skipped bases
modification state indicator character, and add an example MN:i field
to MM-multi.sam, to keep it in sync with HTSlib's copy.
jmarshall committed Jul 17, 2024
1 parent be74ef7 commit f907ead
Showing 8 changed files with 118 additions and 13 deletions.
2 changes: 1 addition & 1 deletion test/SAMtags/MM-chebi.sam
@@ -1,2 +1,2 @@
@CO Separate m, h and N modifications
* 0 * 0 0 * * 0 0 AGCTCTCCAGAGTCGNACGCCATYCGCGCGCCACCA * Mm:Z:C+m,2,2,1,4,1;C+76792,6,7;N+n,15; Ml:B:C,102,128,153,179,204,161,33,212
* 0 * 0 0 * * 0 0 AGCTCTCCAGAGTCGNACGCCATYCGCGCGCCACCA * MM:Z:C+m,2,2,1,4,1;C+76792,6,7;N+n,15; ML:B:C,102,128,153,179,204,161,33,212
2 changes: 1 addition & 1 deletion test/SAMtags/MM-double.sam
@@ -1,3 +1,3 @@
@CO Modifications called on both strands of the same record,
@CO including potentially at the same location simultaneously.
* 0 * 0 0 * * 0 0 AGGATCTCTAGCGGATCGGCGGGGGATATGCCATAT * Mm:Z:C+m,1,3,0;G-m,0,2,0,4;G+o,4; Ml:B:C,128,153,179,115,141,166,192,102
* 0 * 0 0 * * 0 0 AGGATCTCTAGCGGATCGGCGGGGGATATGCCATAT * MM:Z:C+m,1,3,0;G-m,0,2,0,4;G+o,4; ML:B:C,128,153,179,115,141,166,192,102
27 changes: 27 additions & 0 deletions test/SAMtags/MM-explicit.sam
@@ -0,0 +1,27 @@
@CO Testing explicit vs implicit base modifications.
@CO This covers the case where a lack of a signal could be either
@CO implicitly assumed to be no-mod (default) or assumed to be
@CO unchecked and require an explicit statement to indicate it was
@CO looked at and no base modification was observed.
@CO
@CO ATCATCATTCCTACCGCTATAGCCT r1; implicit
@CO - - .. -. - --
@CO Mm M
@CO - - .. -. - --
@CO hH h
@CO
@CO ATCATCATTCCTACCGCTATAGCCT r2; explicit to a small region
@CO - - ?? ?? ? --
@CO Mm mM m
@CO - - ?? ?? ? --
@CO hH hh h
@CO
@CO ATCATCATTCCTACCGCTATAGCCT r3; mixture
@CO - - . -. - --
@CO M M
@CO - - ?? ?? ? --
@CO hH hh h --
@CO
r1 0 * 0 0 * * 0 0 ATCATCATTCCTACCGCTATAGCCT * MM:Z:C+mh,2,0,1; ML:B:C,200,10,50,170,160,20
r2 0 * 0 0 * * 0 0 ATCATCATTCCTACCGCTATAGCCT * MM:Z:C+mh?,2,0,0,0,0; ML:B:C,200,10,50,170,10,5,160,20,10,5
r3 0 * 0 0 * * 0 0 ATCATCATTCCTACCGCTATAGCCT * MM:Z:C+m.,2,2;C+h?,2,0,0,0,0; ML:B:C,200,160,10,170,5,20,5
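A quick consistency check for records like these is that ML carries one value per listed position per modification code. The following is a minimal Python sketch of that check, not part of the commit; the function name `expected_ml_count` is hypothetical, and it assumes a numeric ChEBI identifier counts as a single code.

```python
def expected_ml_count(mm: str) -> int:
    """Number of ML values implied by an MM string (hypothetical helper)."""
    total = 0
    for entry in filter(None, mm.split(";")):
        head, _, deltas = entry.partition(",")
        # head is base + strand + codes + optional '.' or '?' flag
        codes = head[2:].rstrip(".?")
        # Assumption: a numeric ChEBI identifier is one modification code
        n_codes = 1 if codes.isdigit() else len(codes)
        n_positions = len(deltas.split(",")) if deltas else 0
        total += n_codes * n_positions
    return total


# MM and ML values copied from r1, r2 and r3 in MM-explicit.sam
records = {
    "r1": ("C+mh,2,0,1;", [200, 10, 50, 170, 160, 20]),
    "r2": ("C+mh?,2,0,0,0,0;", [200, 10, 50, 170, 10, 5, 160, 20, 10, 5]),
    "r3": ("C+m.,2,2;C+h?,2,0,0,0,0;", [200, 160, 10, 170, 5, 20, 5]),
}
for name, (mm, ml) in records.items():
    print(name, expected_ml_count(mm) == len(ml))
```

The skipped-base flag does not change the count; it only changes how unlisted bases are interpreted.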
77 changes: 77 additions & 0 deletions test/SAMtags/MM-explicit.txt
@@ -0,0 +1,77 @@
A T
T A
C G
A T
T A
C G
A T
T A
T A
Cm78h4 G
Cm19h66 G
T A
A T
C G
Cm62h8 G
G C
C G
T A
A T
T A
A T
G C
C G
C G
T A

A T
T A
C G
A T
T A
C G
A T
T A
T A
Cm78h4 G
Cm19h66 G
T A
A T
Cm4h2 G
Cm62h8 G
G C
Cm4h2 G
T A
A T
T A
A T
G C
C G
C G
T A

A T
T A
C G
A T
T A
C G
A T
T A
T A
Cm78h4 G
Ch66 G
T A
A T
Ch2 G
Cm62h8 G
G C
Ch2 G
T A
A T
T A
A T
G C
C G
C G
T A
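The expanded listing above can be reproduced from the SAM records with a short conversion. The Python sketch below is an illustration only, not the repository's parse_mm.pl. It assumes forward-strand records, `+` strand entries and single-letter modification codes, and it assumes each ML value x is reported as the truncated midpoint percentage of its range, int(100 * (x + 0.5) / 256), which matches the numbers in the listing (for example 50 becomes 19). The function name `expand` is hypothetical.

```python
def expand(seq, mm, ml):
    """Return one 'top<TAB>bottom' line per base (hypothetical helper)."""
    comp = str.maketrans("ACGT", "TGCA")
    labels = [""] * len(seq)
    probs = iter(ml)
    for entry in filter(None, mm.split(";")):
        head, _, deltas = entry.partition(",")
        base, strand, codes = head[0], head[1], head[2:].rstrip(".?")
        if strand != "+" or codes.isdigit():
            raise ValueError("sketch handles only '+' entries with letter codes")
        # Candidate positions: every occurrence of the base (N matches any base)
        candidates = [i for i, b in enumerate(seq) if base == "N" or b == base]
        idx = -1
        for delta in (int(d) for d in deltas.split(",") if d):
            idx += delta + 1  # skip 'delta' candidates, then take the next one
            pos = candidates[idx]
            for code in codes:  # combined codes interleave their ML values
                pct = int(100 * (next(probs) + 0.5) / 256)
                labels[pos] += f"{code}{pct}"
    return [f"{b}{labels[i]}\t{b.translate(comp)}" for i, b in enumerate(seq)]


seq = "ATCATCATTCCTACCGCTATAGCCT"
r1 = expand(seq, "C+mh,2,0,1;", [200, 10, 50, 170, 160, 20])
r3 = expand(seq, "C+m.,2,2;C+h?,2,0,0,0,0;", [200, 160, 10, 170, 5, 20, 5])
print("\n".join(r1))
```

Note that this form does not distinguish `.` from `?`: an explicitly checked base with a low probability is printed, while an unchecked base is simply left bare.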
4 changes: 2 additions & 2 deletions test/SAMtags/MM-multi.sam
@@ -3,5 +3,5 @@
@CO r2 has them combined together, for example as produced by
@CO a joint basecaller which assigns probabilities to all
@CO trained events simultaneously.
r1 0 * 0 0 * * 0 0 AGCTCTCCAGAGTCGNACGCCATYCGCGCGCCACCA * Mm:Z:C+m,2,2,1,4,1;C+h,6,7;N+n,15,2; Ml:B:C,128,153,179,204,230,159,6,215,240
r2 0 * 0 0 * * 0 0 AGCTCTCCAGAGTCGNACGCCATYCGCGCGCCACCA * Mm:Z:C+mh,2,2,0,0,4,1;N+n,15; Ml:B:C,77,159,103,133,128,108,154,82,179,57,204,31,240
r1 0 * 0 0 * * 0 0 AGCTCTCCAGAGTCGNACGCCATYCGCGCGCCACCA * MM:Z:C+m,2,2,1,4,1;C+h,6,7;N+n,15,2; ML:B:C,128,153,179,204,230,159,6,215,240 MN:i:36
r2 0 * 0 0 * * 0 0 AGCTCTCCAGAGTCGNACGCCATYCGCGCGCCACCA * MM:Z:C+mh,2,2,0,0,4,1;N+n,15; ML:B:C,77,159,103,133,128,108,154,82,179,57,204,31,240
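The MN:i field added to r1 records the sequence length at the time MM and ML were produced, so a reader can compare it with the current SEQ length before trusting the modification positions. A minimal Python sketch of that comparison follows; the helper name `mn_consistent` is hypothetical, and treating an absent MN as "nothing to check" is an assumption of this sketch.

```python
def mn_consistent(seq, aux_fields):
    """True if MN is absent or equals len(seq) (hypothetical helper)."""
    for field in aux_fields:
        tag, _type, value = field.split(":", 2)
        if tag == "MN":
            return int(value) == len(seq)
    return True  # no MN tag present: nothing to check


# SEQ and auxiliary fields copied from r1 in MM-multi.sam
seq = "AGCTCTCCAGAGTCGNACGCCATYCGCGCGCCACCA"
aux = [
    "MM:Z:C+m,2,2,1,4,1;C+h,6,7;N+n,15,2;",
    "ML:B:C,128,153,179,204,230,159,6,215,240",
    "MN:i:36",
]
print(mn_consistent(seq, aux))
```

A mismatch would suggest the sequence was altered (for example trimmed) after the tags were written.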
8 changes: 4 additions & 4 deletions test/SAMtags/MM-orient.sam
@@ -1,6 +1,6 @@
@CO Testing mods on top and bottom strand, but also in
@CO original vs reverse-complemented orientation
top-fwd 0 * 0 0 * * 0 0 AGGATCTCTAGCGGATCGGCGGGGGATATGCCATAT * Mm:Z:C+m,1,3,0; Ml:B:C,128,153,179
top-rev 16 * 0 0 * * 0 0 ATATGGCATATCCCCCGCCGATCCGCTAGAGATCCT * Mm:Z:C+m,1,3,0; Ml:B:C,128,153,179
bot-fwd 0 * 0 0 * * 0 0 AGGATCTCTAGCGGATCGGCGGGGGATATGCCATAT * Mm:Z:G-m,0,0,4,3; Ml:B:C,115,141,166,192
bot-rev 16 * 0 0 * * 0 0 ATATGGCATATCCCCCGCCGATCCGCTAGAGATCCT * Mm:Z:G-m,0,0,4,3; Ml:B:C,115,141,166,192
top-fwd 0 * 0 0 * * 0 0 AGGATCTCTAGCGGATCGGCGGGGGATATGCCATAT * MM:Z:C+m,1,3,0; ML:B:C,128,153,179
top-rev 16 * 0 0 * * 0 0 ATATGGCATATCCCCCGCCGATCCGCTAGAGATCCT * MM:Z:C+m,1,3,0; ML:B:C,128,153,179
bot-fwd 0 * 0 0 * * 0 0 AGGATCTCTAGCGGATCGGCGGGGGATATGCCATAT * MM:Z:G-m,0,0,4,3; ML:B:C,115,141,166,192
bot-rev 16 * 0 0 * * 0 0 ATATGGCATATCCCCCGCCGATCCGCTAGAGATCCT * MM:Z:G-m,0,0,4,3; ML:B:C,115,141,166,192
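In these records the MM string is identical for the forward and reverse-complemented forms, because the deltas count bases in the original read orientation. The Python sketch below illustrates one way to resolve the positions under that reading: reverse-complement SEQ when FLAG bit 0x10 is set, then walk the deltas. The function names are hypothetical.

```python
def revcomp(seq):
    """Reverse-complement a plain ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mod_positions(flag, seq, base, deltas):
    """0-based positions, in original read orientation, for one MM entry."""
    original = revcomp(seq) if flag & 0x10 else seq
    hits = [i for i, b in enumerate(original) if b == base]
    positions, idx = [], -1
    for delta in deltas:
        idx += delta + 1  # skip 'delta' matching bases, take the next
        positions.append(hits[idx])
    return positions


# SEQ values copied from top-fwd and top-rev in MM-orient.sam (MM:Z:C+m,1,3,0;)
top_fwd = "AGGATCTCTAGCGGATCGGCGGGGGATATGCCATAT"
top_rev = "ATATGGCATATCCCCCGCCGATCCGCTAGAGATCCT"
fwd_pos = mod_positions(0, top_fwd, "C", [1, 3, 0])
rev_pos = mod_positions(16, top_rev, "C", [1, 3, 0])
print(fwd_pos, rev_pos)
```

Both calls resolve to the same positions, which is what the paired test records are designed to exercise.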
8 changes: 4 additions & 4 deletions test/SAMtags/README.md
@@ -1,15 +1,15 @@
Mm and Ml auxiliary tags
MM and ML auxiliary tags
========================

The purpose of these test files is to test parsing of the Mm and Ml
tags. These succinct Mm and Ml tags are present in the .sam files,
The purpose of these test files is to test parsing of the MM and ML
tags. These succinct MM and ML tags are present in the .sam files,
with a more human readable expanded form in the .txt files.
Developers should check whether their implementation is able to
convert between the two forms.

The .sam files are SAM format, but the only fields used for this
test are the reverse-complementation flag (FLAG bit 0x10), the
sequence, and the Mm and Ml tags.
sequence, and the MM and ML tags.

The .txt files use one line per base, with a blank line separating
sequences. Each line consists of two tab-separated fields
3 changes: 2 additions & 1 deletion test/SAMtags/parse_mm.pl
@@ -69,7 +69,8 @@ sub rc {

print "\n" if $nseq++ > 0;
foreach (@mods) {
my ($base, $strand, $types, $pos) = $_ =~ m/([A-Z])([-+])([^,]+),(.*)/;
my ($base, $strand, $types, $skipped_prob, $pos) =
$_ =~ m/([A-Z])([-+])([a-z]+|[0-9]+)([.?]?),(.*)/;

my $i = 0; # I^{th} position in sequence
foreach my $delta (split(",", $pos)) {
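The effect of the new pattern is easiest to see on entries that carry a skipped-base flag: the old `([^,]+)` group would have captured `mh?` as the modification types, whereas the new pattern separates the codes from the optional `.` or `?`. The Python sketch below applies the same pattern for illustration; the names `MM_ENTRY` and `parse_entry` are hypothetical, and `re.match` anchors at the start of the string, which is slightly stricter than the unanchored Perl match.

```python
import re

# Same pattern as the updated Perl regex in the hunk above
MM_ENTRY = re.compile(r"([A-Z])([-+])([a-z]+|[0-9]+)([.?]?),(.*)")


def parse_entry(entry):
    """Split one MM entry into (base, strand, types, flag, deltas)."""
    m = MM_ENTRY.match(entry)
    if m is None:
        raise ValueError(f"unrecognised MM entry: {entry!r}")
    base, strand, types, skipped, pos = m.groups()
    return base, strand, types, skipped, [int(d) for d in pos.split(",") if d]


print(parse_entry("C+mh?,2,0,0,0,0"))
print(parse_entry("C+76792,6,7"))
```

An empty flag string corresponds to an entry with no indicator character, as in the older test files.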