Include vaccine strains #23

trvrb · 2024-04-12T18:50:40Z

Full genomes for Edmonston-related vaccine strains were present in the ingest dataset, but they weren't making it into the final genome or N450 results because they were filtered out for lacking date metadata. This PR surfaces these vaccine strains by:

Including annotations for wild-type Edmonston as well as 5 derived vaccine strains.
Swapping to a "country year" group-by so that samples with just year metadata make it into the final build.
Including strain coloring to provide proper descriptions of these 6 samples.

I've just run the entire pipeline locally and everything works as expected.

Results from running this branch are viewable at:

This commit adds strain and date annotations for 5 vaccine strains that all descend from the Edmonston isolate collected in 1954. The Parks et al. paper describes these well. I purposely chose not to include location for these, as I wanted the gray dot in the Auspice tree to make them look a bit different from wild-type isolates. This also includes strain, date and location for the Edmonston WT strain.

There's not enough genome data to warrant including month in the subsampling grouping. Also, with month included, the subsampling was dropping a number of older samples that were only annotated by year. I noticed this when trying to include the 1954 Edmonston-related vaccine strains, which were getting filtered out by the previous "country year month" group-by.
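To illustrate the effect described above, here is a minimal Python sketch of grouping by "country year month" versus "country year". It is not augur's implementation; the record names, the "XX" placeholder for unknown date parts, and the helper functions are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records; strain names, countries and dates are illustrative only.
RECORDS = [
    {"strain": "year_only_sample", "country": "USA", "date": "1954-XX-XX"},
    {"strain": "full_date_sample_1", "country": "USA", "date": "2019-03-14"},
    {"strain": "full_date_sample_2", "country": "USA", "date": "2019-03-20"},
]


def date_part(date, index):
    """Return the date component at index (0 = year, 1 = month), or None if unknown."""
    parts = date.split("-")
    if index >= len(parts) or not parts[index].isdigit():
        return None
    return parts[index]


def group_key(record, group_by):
    """Build the grouping key, or return None if any required value is missing."""
    key = []
    for column in group_by:
        if column == "year":
            value = date_part(record["date"], 0)
        elif column == "month":
            value = date_part(record["date"], 1)
        else:
            value = record.get(column)
        if value is None:
            return None
        key.append(value)
    return tuple(key)


def group_records(records, group_by):
    """Group strains by key; strains that cannot be assigned a key are dropped."""
    groups = defaultdict(list)
    dropped = []
    for record in records:
        key = group_key(record, group_by)
        if key is None:
            dropped.append(record["strain"])
        else:
            groups[key].append(record["strain"])
    return dict(groups), dropped
```

In this sketch, grouping by country, year and month drops the year-only record, while grouping by country and year keeps it in its own group.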

Strain name is often not included in GenBank, or is not very helpful when it is. But it is still good to surface as metadata for the modal. I particularly wanted this for the 1954 Edmonston-related vaccine strains. People know these by their strain names, certainly not by their GenBank accessions.

phylogenetic/defaults/auspice_config.json

This swaps to using --metadata-columns in augur export to surface strain, division and location.
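As a rough sketch of how a space-separated config value such as "strain division location" could be turned into --metadata-columns arguments: the helper name and the config dictionary below are hypothetical, the workflow's actual rule is not shown in this PR page, and the other arguments augur export v2 needs are omitted.

```python
import shlex


def export_metadata_args(metadata_columns):
    """Split a space-separated column string into --metadata-columns arguments.

    Hypothetical helper for illustration; returns an empty list when no
    columns are configured so the flag is left off entirely.
    """
    columns = metadata_columns.split()
    return ["--metadata-columns", *columns] if columns else []


# Assumed config shape, mirroring the export.metadata_columns key in config.yaml.
config = {"export": {"metadata_columns": "strain division location"}}

args = export_metadata_args(config["export"]["metadata_columns"])

# Only the fragment of the command relevant to this change is printed here.
print(shlex.join(["augur", "export", "v2", *args]))
```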

kimandrews · 2024-04-12T23:40:30Z

phylogenetic/defaults/config.yaml

+export:
+    metadata_columns: "strain division location"


Surfacing the strain names now sounds good. Eventually we should be able to pull more strain names from GenBank, once NCBI Datasets starts pulling the "strain" field, which is where most measles strain names are reported on GenBank (currently we are getting strain names from GenBank's "isolate" field, which NCBI Datasets does pull). NCBI says this is planned for sometime this year. This would also enable us to recover dates for some samples that have empty dates, since dates are part of the strain name.

Excellent! Thanks for the context.

Explicitly add vaccine strains to genome tree and N450 tree, following up on #23. These strains currently end up in the trees due to our subsampling parameters and lack of other sequences from 1954, but this commit explicitly adds them.

trvrb added 3 commits April 12, 2024 11:23

trvrb requested a review from kimandrews April 12, 2024 18:50

trvrb mentioned this pull request Apr 12, 2024

Primary key shouldn't include version number in GenBank accession #24

Closed

joverlee521 reviewed Apr 12, 2024

phylogenetic/defaults/auspice_config.json (outdated)

Export strain, division and location as additional metadata

b51ea60

This swaps to using --metadata-columns in augur export to surface strain, division and location.

kimandrews approved these changes Apr 12, 2024


trvrb merged commit 6bed278 into main Apr 16, 2024
32 checks passed

trvrb deleted the vaccine-strains branch April 16, 2024 23:12

kimandrews mentioned this pull request Apr 19, 2024

Include WHO reference strains, vaccine strains, NCBI genotypes in trees #26

Merged

1 task
