Include vaccine strains #23

Merged 4 commits on Apr 16, 2024
20 changes: 20 additions & 0 deletions ingest/defaults/annotations.tsv
@@ -4,3 +4,23 @@
 # If there are multiple annotations for the same id and field, then the last value is used
 # Lines starting with '#' are treated as comments
 # Any '#' after the field value are treated as comments.
+#
+# Vaccine strain information from Parks et al. Comparison of predicted amino acid
+# sequences of measles virus strains in the Edmonston vaccine lineage
+# https://doi.org/10.1128/jvi.75.2.910-920.2001
+AF266288.2	strain	Measles strain Edmonston WT
+AF266288.2	date	1954
+AF266288.2	region	North America
+AF266288.2	country	USA
+AF266288.2	division	Massachusetts
+AF266288.2	location	Boston
+AF266287.1	strain	Measles vaccine strain Moraten
+AF266287.1	date	1954
+AF266290.1	strain	Measles vaccine strain Zagreb
+AF266290.1	date	1954
+AF266289.1	strain	Measles vaccine strain Rubeovax
+AF266289.1	date	1954
+AF266291.1	strain	Measles vaccine strain Schwarz
+AF266291.1	date	1954
+AF266286.1	strain	Measles vaccine strain AIK-C
+AF266286.1	date	1954
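
The header comments of `annotations.tsv` describe three rules: lines starting with `#` are comments, a `#` after the field value starts a trailing comment, and the last value wins when an id and field repeat. The following is a minimal sketch of a reader that follows those rules, not the code the ingest workflow actually runs. The function name `parse_annotations`, the `SAMPLE` text, and the placeholder date `1900` are hypothetical and exist only to show the override behavior.

```python
def parse_annotations(text):
    """Return {(id, field): value}; later lines override earlier ones.

    Assumes tab-separated columns in the order id, field, value,
    as described in the file's header comments.
    """
    annotations = {}
    for raw in text.splitlines():
        # Skip blank lines and full-line comments.
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        parts = raw.split("\t")
        if len(parts) < 3:
            # Hypothetical choice: ignore malformed rows instead of failing.
            continue
        record_id, field, value = parts[0].strip(), parts[1].strip(), parts[2]
        # Anything after '#' in the value column is a trailing comment.
        value = value.split("#", 1)[0].strip()
        # Plain dict assignment gives "last value wins" for repeated keys.
        annotations[(record_id, field)] = value
    return annotations


# Illustrative input; the 1900 row is made up to demonstrate the override.
SAMPLE = (
    "# full-line comment\n"
    "AF266288.2\tstrain\tMeasles strain Edmonston WT\n"
    "AF266288.2\tdate\t1900 # placeholder, superseded below\n"
    "AF266288.2\tdate\t1954\n"
)

parsed = parse_annotations(SAMPLE)
```

With this input, `parsed` holds one `strain` entry and one `date` entry for `AF266288.2`, and the `date` entry comes from the later row.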
8 changes: 5 additions & 3 deletions phylogenetic/defaults/config.yaml
@@ -7,8 +7,8 @@ files:
   colors: "defaults/colors.tsv"
   auspice_config: "defaults/auspice_config.json"
   auspice_config_N450: "defaults/auspice_config_N450.json"
-filter:
-  group_by: "country year month"
+filter:
+  group_by: "country year"
   sequences_per_group: 20
   min_date: 1950
   min_length: 5000
@@ -20,6 +20,8 @@ filter_N450:
 refine:
   coalescent: "opt"
   date_inference: "marginal"
-  clock_filter_iqd: 4
+  clock_filter_iqd: 4
 ancestral:
   inference: "joint"
+export:
+  metadata_columns: "strain division location"
Comment on lines +26 to +27
kimandrews (Contributor), Apr 12, 2024

Surfacing the strain names now sounds good. Eventually we should be able to pull more strain names from GenBank once NCBI Datasets starts pulling the "strain" field, which is where most measles strain names are reported on GenBank. (Currently we get strain names from GenBank's "isolate" field, which NCBI Datasets does pull.) NCBI says this is planned for sometime this year. It would also let us recover dates for some samples that have empty dates, since dates are part of the strain name.

PR author (Member)

Excellent! Thanks for the context.
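
The comment above notes that dates are embedded in measles strain names. As a rough sketch of what recovering them could look like, the snippet below assumes names follow a WHO-style pattern such as `MVs/City.ISO3/week.year/...`, where the third slash-separated part is an epidemiological week and a year. The pattern, the two-digit-year pivot, the function name `date_from_strain_name`, and the example names are all assumptions made for illustration; nothing here is taken from this repository. It returns only a year-level ambiguous date in the `YYYY-XX-XX` style, because mapping an epidemiological week to a calendar day needs rules this sketch does not attempt.

```python
import re

# Assumed pattern: "MVs" or "MVi", then a place, then "<week>.<year>".
NAME_PATTERN = re.compile(
    r"^MV[is]/[^/]+/(?P<week>\d{1,2})\.(?P<year>\d{4}|\d{2})(?:/|$)"
)


def date_from_strain_name(name, pivot=50):
    """Return a year-level ambiguous date, or None if no date is recognisable.

    `pivot` is a hypothetical cut-off for two-digit years:
    values >= pivot are read as 19xx, values below it as 20xx.
    """
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        return None
    week = int(match.group("week"))
    if not 1 <= week <= 53:
        return None  # not a plausible week number
    year_text = match.group("year")
    year = int(year_text)
    if len(year_text) == 2:
        year += 1900 if year >= pivot else 2000
    return f"{year}-XX-XX"


# Illustrative names only.
examples = [
    "MVs/Manchester.GBR/7.12/B3",
    "MVi/NewYork.USA/03.98/2",
    "Measles vaccine strain Schwarz",
]
recovered = {name: date_from_strain_name(name) for name in examples}
```

Names that do not match the assumed pattern, such as the vaccine strain labels added in this PR, yield `None`, so they would still need dates from `annotations.tsv`.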

6 changes: 3 additions & 3 deletions phylogenetic/rules/export.smk
@@ -15,12 +15,12 @@ rule export:
         aa_muts = "results/{gene}/aa_muts.json",
         colors = config["files"]["colors"],
         auspice_config = lambda wildcard: "defaults/auspice_config.json" if wildcard.gene in ["genome"] else "defaults/auspice_config_N450.json"

     output:
         auspice_json = "auspice/measles_{gene}.json",
         root_sequence = "auspice/measles_{gene}_root-sequence.json"
     params:
-        strain_id = config["strain_id_field"]
+        strain_id = config["strain_id_field"],
+        metadata_columns = config["export"]["metadata_columns"]
     shell:
         """
         augur export v2 \
@@ -29,8 +29,8 @@ rule export:
             --metadata-id-columns {params.strain_id} \
             --node-data {input.branch_lengths} {input.nt_muts} {input.aa_muts} \
             --colors {input.colors} \
+            --metadata-columns {params.metadata_columns} \
             --auspice-config {input.auspice_config} \
             --include-root-sequence \
             --output {output.auspice_json}
         """
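
Because `metadata_columns` is stored in the config as one space-separated string, it may help to see how that string becomes several command-line arguments once it is substituted into the shell command. The sketch below imitates that substitution with plain Python string formatting and `shlex.split`. It does not use Snakemake itself, the helper `build_export_args` is hypothetical, and the `strain_id_field` value `"accession"` is a placeholder, since the real value is not shown in this diff.

```python
import shlex

# Placeholder config: only "metadata_columns" matches the diff above;
# the "strain_id_field" value is an assumption for illustration.
config = {
    "strain_id_field": "accession",
    "export": {"metadata_columns": "strain division location"},
}


def build_export_args(cfg):
    """Render the two param-driven flags and split them as a shell would."""
    template = (
        "--metadata-id-columns {strain_id} "
        "--metadata-columns {metadata_columns}"
    )
    rendered = template.format(
        strain_id=cfg["strain_id_field"],
        metadata_columns=cfg["export"]["metadata_columns"],
    )
    # shlex.split applies POSIX shell word splitting to the rendered string.
    return shlex.split(rendered)


args = build_export_args(config)
```

The unquoted placeholder is what lets the single config string reach `augur export v2` as three separate column names; quoting `{params.metadata_columns}` in the shell block would instead pass it as one argument.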