Add files to Nextclade Dataset
kimandrews committed May 22, 2024
1 parent 751cb5c commit bba4969
Showing 4 changed files with 1,332 additions and 13 deletions.
3 changes: 3 additions & 0 deletions nextclade_dataset/CHANGELOG.md
@@ -0,0 +1,3 @@
## Unreleased

Initial release.
29 changes: 29 additions & 0 deletions nextclade_dataset/README.md
@@ -0,0 +1,29 @@
# Measles dataset

| Key | Value |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| authors | [Nextstrain](https://nextstrain.org) |
| reference | NC_001498.1 |
| workflow | https://github.com/nextstrain/measles/tree/main/nextclade |
| path | `nextstrain/measles` |


## Scope of this dataset

This dataset assigns genotypes to measles samples based on [criteria outlined by the WHO](https://www.who.int/publications/i/item/WER8709).

The WHO has defined 24 measles genotypes based on N gene and H gene sequences from 28 reference strains. New measles samples can be assigned a genotype based on their genetic similarity to the reference strains in the "N450" region (the 450 nucleotides encoding the C-terminal 150 amino acids of the N protein).

The tree used in this dataset includes N450 sequences for the 28 reference strains, along with other representative strains for each genotype.
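
As a rough illustration of similarity-based assignment, the sketch below picks the reference with the fewest mismatches to a query. It is a toy example under stated assumptions: the function names are hypothetical, the 12-base sequences are made-up stand-ins for aligned N450 sequences, and Nextclade itself places samples on the reference tree rather than comparing raw distances like this.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatches between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip positions where either sequence has an ambiguous base (N) or a gap.
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in "N-" and y not in "N-"
    )


def closest_genotype(query: str, references: dict[str, str]) -> tuple[str, int]:
    """Return (genotype, distance) for the reference closest to the query."""
    return min(
        ((genotype, hamming(query, seq)) for genotype, seq in references.items()),
        key=lambda pair: pair[1],
    )


# Made-up 12-base stand-ins; real N450 sequences are 450 bases long.
toy_references = {
    "A": "ACGTACGTACGT",
    "B3": "ACGTTCGTACGA",
    "D8": "TCGTACGAACCT",
}

print(closest_genotype("ACGTTCGTACNA", toy_references))  # → ('B3', 0)
```

The `N` in the query is ignored when counting mismatches, so missing data does not push a sample away from its closest reference.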

## Features

This dataset supports:

- Assignment of genotypes
- Minimal sequence QC
- Phylogenetic placement

## What are Nextclade datasets

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
63 changes: 50 additions & 13 deletions nextclade_dataset/pathogen.json
@@ -1,15 +1,52 @@
 {
-  "files": {
-    "reference": "measles_reference_N450.fasta",
-    "pathogenJson": "pathogen.json",
-    "genomeAnnotation": "measles_reference_N450.gff3",
-    "treeJson": "measles_nextclade.json"
-  },
-  "attributes": {
-    "name": "Measles"
-  },
-  "schemaVersion": "3.0.0",
-  "version": {
-    "tag": "unreleased"
-  }
+  "files": {
+    "reference": "measles_reference_N450.fasta",
+    "pathogenJson": "pathogen.json",
+    "genomeAnnotation": "measles_reference_N450.gff3",
+    "treeJson": "measles_nextclade.json",
+    "examples": "sequences.fasta",
+    "readme": "README.md",
+    "changelog": "CHANGELOG.md"
+  },
+  "attributes": {
+    "name": "Measles (N450)",
+    "reference name": "Ichinose-B95a",
+    "reference accession": "NC_001498.1"
+  },
+  "schemaVersion": "1.0.0",
+  "alignmentParams": {
+    "minSeedCover": 0.01,
+    "minLength": 400
+  },
+  "qc": {
+    "missingData": {
+      "enabled": true,
+      "missingDataThreshold": 20,
+      "scoreBias": 4
+    },
+    "mixedSites": {
+      "enabled": true,
+      "mixedSitesThreshold": 4
+    },
+    "frameShifts": {
+      "enabled": true
+    },
+    "stopCodons": {
+      "enabled": true
+    },
+    "privateMutations": {
+      "enabled": true,
+      "cutoff": 8,
+      "typical": 2,
+      "weightLabeledSubstitutions": 2,
+      "weightReversionSubstitutions": 1,
+      "weightUnlabeledSubstitutions": 1
+    },
+    "snpClusters": {
+      "enabled": true,
+      "clusterCutOff": 3,
+      "scoreWeight": 50,
+      "windowSize": 50
+    }
+  }
 }
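
To get a feel for what the `missingData` thresholds in this config imply for a 450-base target, here is a minimal sketch. It assumes the linear scoring described in the Nextclade QC documentation (no penalty up to `scoreBias`, reaching a score of 100 at `scoreBias + missingDataThreshold`). The function name `missing_data_score` is hypothetical and this is not Nextclade's own code.

```python
import json

# missingData rule copied from the pathogen.json in this commit.
PATHOGEN_JSON = """
{
  "qc": {
    "missingData": {"enabled": true, "missingDataThreshold": 20, "scoreBias": 4}
  }
}
"""


def missing_data_score(total_missing: int, rule: dict) -> float:
    """Assumed scoring: 0 up to scoreBias, then linear in the excess."""
    if not rule.get("enabled", False):
        return 0.0
    excess = total_missing - rule["scoreBias"]
    return max(0.0, excess * 100.0 / rule["missingDataThreshold"])


rule = json.loads(PATHOGEN_JSON)["qc"]["missingData"]
for n_missing in (0, 4, 14, 24):
    print(n_missing, missing_data_score(n_missing, rule))
# 0 0.0
# 4 0.0
# 14 50.0
# 24 100.0
```

Under this assumption, a sequence with 24 or more missing bases would reach the score of 100 that Nextclade treats as bad, which is strict but in proportion to the short N450 region.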