[ancestral] Add VCF tests
The test added here fails under TreeTime 0.11.1; the underlying bug has
been fixed as part of <neherlab/treetime#263>.

Description of the bug (as it relates to the test added in `vcf.t`):

The SNPs at nt 33 are encoded in the VCF as:
    #CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  sample_A  sample_B  sample_C
    1       33   .   A    C,G  .     .       .     GT      1         2         0
where ALT 1 ("C") is on sample_A and ALT 2 ("G") is on sample_B.
ALT 2 is not parsed by `read_vcf`, which changes the inferred
mutations at position 33:
.       **FASTA input**               **VCF input**
.         |---G33C-- sample_A           |---A33C-- sample_A
.  --A33G-|                      -------|
.         |--------- sample_B           |--------- sample_B
.
Because of this bug, the following test fails.

The `read_vcf` function is used by the augur commands `ancestral`, `refine`,
`sequence-traits`, `translate` and `tree`.
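
To illustrate the encoding described above, here is a minimal Python sketch of how a GT index selects an allele from REF plus the comma-separated ALT list. It is not TreeTime's `read_vcf` implementation: `decode_genotypes` is a hypothetical helper, and it assumes whitespace-separated columns, haploid calls and no missing genotypes.

```python
def decode_genotypes(header_line, record_line):
    """Return (position, {sample name: base}) for one VCF record.

    Hypothetical helper for illustration only; assumes haploid GT values
    and no missing calls (".").
    """
    samples = header_line.split()[9:]      # sample names follow the FORMAT column
    fields = record_line.split()
    pos, ref, alt = int(fields[1]), fields[3], fields[4]
    alleles = [ref] + alt.split(",")       # index 0 = REF, 1..n = ALT alleles
    calls = {}
    for name, sample_field in zip(samples, fields[9:]):
        gt = sample_field.split(":")[0]    # GT is the first FORMAT subfield
        calls[name] = alleles[int(gt)]
    return pos, calls


header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample_A sample_B sample_C"
record = "1 33 . A C,G . . . GT 1 2 0"
pos, calls = decode_genotypes(header, record)
print(pos, calls)  # 33 {'sample_A': 'C', 'sample_B': 'G', 'sample_C': 'A'}
```

A parser that only looks at the first ALT allele cannot resolve GT index 2 to "G", which is the kind of discrepancy the new test is meant to catch.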
jameshadfield committed Jan 22, 2024
1 parent 9cd915a commit db16e83
Showing 2 changed files with 41 additions and 0 deletions.
26 changes: 26 additions & 0 deletions tests/functional/ancestral/cram/vcf.t
@@ -0,0 +1,26 @@
Setup

  $ source "$TESTDIR"/_setup.sh

  $ export DATA="$TESTDIR/../data/simple-genome"

This command mirrors the first test in general.t, however
with VCF input instead of a FASTA MSA.
The output will not have the full sequence attached to every node,
but it will have the reference sequence attached.

  $ ${AUGUR} ancestral \
  > --tree $DATA/tree.nwk \
  > --alignment $DATA/snps.vcf \
  > --vcf-reference $DATA/reference.fasta \
  > --output-node-data "nt_muts.vcf-input.ref-seq.json" \
  > --output-vcf "nt_muts.vcf-input.ref-seq.vcf" \
  > --inference marginal > /dev/null


  $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
  > "$DATA/nt_muts.ref-seq.json" \
  > "nt_muts.vcf-input.ref-seq.json" \
  > --exclude-regex-paths "root\['nodes'\]\['.+'\]\['sequence'\]"
  {}

15 changes: 15 additions & 0 deletions tests/functional/ancestral/data/simple-genome/snps.vcf
@@ -0,0 +1,15 @@
##fileformat=VCFv4.3
##contig=<ID=1,length=50>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample_A sample_B sample_C
1 5 . A C . . . GT 1 1 1
1 7 . A G . . . GT 1 1 0
1 14 . C T . . . GT 1 1 0
1 18 . C T . . . GT 0 0 1
1 28 . A N . . . GT 0 0 1
1 29 . A N . . . GT 0 0 1
1 30 . A N . . . GT 0 0 1
1 33 . A C,G . . . GT 1 2 0
1 39 . C T . . . GT 1 0 0
1 42 . G A . . . GT 0 1 0
1 43 . A T . . . GT 1 1 0
