Skip to content

giffordlabcvr/Dengue-GLUE

Repository files navigation

Dengue-GLUE: Phylogenomic Analysis of Dengue Virus

Welcome to the GitHub repository for Dengue-GLUE!

Dengue-GLUE is a sequence-oriented resource for comparative genomic analysis of dengue virus (DENV), developed using the GLUE software framework.

GLUE is an open, integrated software toolkit designed for storing and interpreting sequence data. It supports the creation of bespoke projects, incorporating essential data items for comparative genomic analysis, such as sequences, multiple sequence alignments, genome feature annotations, and other associated data.

Projects are loaded into the GLUE "engine," forming a relational database that represents the semantic relationships between data items. This foundation supports systematic comparative analyses and the development of sequence-based resources.

Dengue-GLUE contains DENV feature definitions, alignments, and reference sequences for all DENV serotypes and genotypes, as well as all major and minor lineages defined here.

This Dengue-GLUE base project can be extended with additional layers, openly available via GitHub. These include NCBI-Dengue-GLUE: which extends Dengue-GLUE through the incorporation of all DENV sequence data published in NCBI GenBank.

Table of Contents

Key Features

  • GLUE Framework Integration: Built on the GLUE software framework, Dengue-GLUE offers an extensible platform for efficient, standardized, and reproducible computational genomic analysis of dengue virus.

  • Phylogenetic Structure: Sequence data in Dengue-GLUE is organized in a phylogenetically-structured manner, allowing users to explore evolutionary relationships easily.

  • Rich Annotations: Annotated reference sequences enable rigorous comparative genomic analysis related to conservation, adaptation, structural context, and genotype-to-phenotype associations.

  • Automated Genotyping: Dengue-GLUE can perform automated genotyping of DENV sequences (including subgenomic sequences) following recently proposed lineage nomenclature, via GLUE's maximum likelihood clade assignment (MLCA) algorithm.

  • Exploratory and operational: The GLUE framework allows sequence-based resources to be used for exploratory work in a research setting, as well as operational work in a public or animal health setting.

Installation

If you have not done so already, install the GLUE software framework by following the installation instructions on the GLUE web site:

Download the Dengue-GLUE repository, navigate into the top-level directory, and start the GLUE command line interpreter.

At the GLUE command prompt, run the 'buildDengueProject.glue' file as follows:

GLUE Version 1.1.107
Copyright (C) 2015-2020 The University of Glasgow
This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. For details see
GNU Affero General Public License v3: http://www.gnu.org/licenses/

Mode path: /
GLUE> run file buildDengueProject.glue

This will build the base project, which contains a minimal set of feature definitions, clade categories, reference sequences, and alignments.

Usage

GLUE contains an interactive command line environment focused on the development and use of GLUE projects by bioinformaticians. This provides a range of productivity-oriented features such as automatic command completion, command history and interactive paging through tabular data.

For detailed instructions on how to use Dengue-GLUE for your comparative genomic analysis, refer to the GLUE's reference documentation.

Genotyping

To classify DENV1, DENV2, DENV3 or DENV4 sequences via maximum likelihood clade assignment (MLCA)-based genotyping, call the appropriate genotyping module from the GLUE console, for example:

Mode path: /
GLUE> project dengue
OK
Mode path: /project/dengue
GLUE> module denv2MaxLikelihoodGenotyper genotype file -f path/to/sequences/denv2.fasta 

Please note the above command sequence is equivalent to the following:

Mode path: /
GLUE> project dengue module denv2MaxLikelihoodGenotyper genotype file -f path/to/sequences/denv2.fasta 

If you need to first determine the serotypes of your input sequences, use Dengue-GLUE's BLAST-based serotype recogniser module as shown:

Mode path: /project/dengue
GLUE> module dengueSerotypeRecogniser recognise file -i path/to/sequences/denv.fasta 

Mutation frequencies

Dengue-GLUE can provide a detailed frequency distribution of amino acids at a specific position within a DENV coding feature, based on the alignments contained with the project. This can enable insights into the variability and conservation of DENV proteins.

The following command in the GLUE console calculates the amino acid frequencies at a specific position within a feature of a dengue virus type 2 (DENV2) alignment:

Mode path: /project/dengue
GLUE> alignment AL_DENV_2 amino-acid frequency -c -r REF_MASTER_DENV2 -f envelope -l 52 52

This command produces results in a tabular output format like this:

+==========+=======+===========+============+====================+
| feature  | codon | aminoAcid | numMembers |     pctMembers     |
+==========+=======+===========+============+====================+
| envelope | 52    | E         | 4          | 1.3840830449826989 |
| envelope | 52    | H         | 89         | 30.79584775086505  |
| envelope | 52    | K         | 1          | 0.3460207612456747 |
| envelope | 52    | Q         | 190        | 65.7439446366782   |
| envelope | 52    | Y         | 5          | 1.7301038062283738 |
+==========+=======+===========+============+====================+

Command Breakdown

alignment AL_DENV_2 Selects the alignment for dengue virus type 2 (DENV2).

amino-acid frequency Instructs GLUE to calculate amino acid frequencies within the selected alignment.

-c Calculates the frequencies recursively, including all sub-alignments of DENV2. In Dengue-GLUE, alignments are arranged hierarchically to reflect evolutionary relationships.

-r REF_MASTER_DENV2 Specifies the constraining reference sequence, REF_MASTER_DENV2, which defines the coordinate space and the wild type amino acid for comparison.

-f envelope Indicates the coding feature within the reference sequence (envelope) where amino acid frequencies will be calculated.

-l 52 52 Specifies the location within the feature to be analyzed. Here, it focuses on a single amino acid position (52).

Interpretation of the Output

  • feature: The coding feature analyzed (in this case, envelope).
  • codon: The codon position within the feature (position 52).
  • aminoAcid: The amino acid found at the specified codon position.
  • numMembers: The number of sequences in the alignment containing the specified amino acid at the given position.
  • pctMembers: The percentage of sequences in the alignment containing the specified amino acid at the given position.

Data Sources

Dengue-GLUE relies on the following data sources:

Contributing

We welcome contributions from the community! If you're interested in contributing to Dengue-GLUE, please review our Contribution Guidelines.

Contributor Covenant

License

The project is licensed under the GNU Affero General Public License v. 3.0

Contact

For questions, issues, or feedback, please open an issue on the GitHub repository.

About

A GLUE project for dengue virus (DENV)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published