Synteny-aware HMM searches made easy

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

1. 💡 What is Pynteny?

Pynteny is a Python tool for finding synteny blocks in (prokaryotic) sequence data, using HMMER and profile HMMs of the ORFs of interest. By exploiting genomic context, Pynteny can reduce the uncertainty in the functional annotation of unlabelled sequence data that arises from paralogs. Pynteny can be used (i) from the command line or (ii) as a Python module.

Get more info in the documentation pages!

Check out the Pynteny paper in the Journal of Open Source Software!

2. 🔧 Setup

Install with conda:

  1. Pynteny requires Python 3.10. The easiest way to handle dependencies is to create a dedicated conda environment:
conda create -n pynteny -c bioconda -c conda-forge python=3.10 pynteny
conda activate pynteny
  2. Check that the installation worked:
(pynteny) pynteny --help

2.1. Installing on Windows

Pynteny is designed to run on Linux. On Windows, it can be installed with conda inside the Windows Subsystem for Linux (WSL).

2.2. Installing on macOS with Apple silicon (ARM64)

Pynteny does not currently support the ARM64 architecture of Apple silicon processors (e.g., MacBook M1 and M2). In that case, you can install Pynteny in an x86-64 (osx-64) conda environment using the workaround below (based on this post):

CONDA_SUBDIR=osx-64 conda create -n pynteny_x86 python=3.10
conda activate pynteny_x86
conda config --env --set subdir osx-64
conda install -c bioconda pynteny

3. 🚀 Usage

Consider the following toy example of a syntenic block:

[Figure: synteny example]

Here, we are interested in four genes that colocate according to the pattern above: genes A to C occupy consecutive positions on the positive strand, followed by three (untargeted) genes and then gene D, which is located on the negative strand.
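The pattern above can be expressed as an ordered list of strand-tagged target genes plus the number of genes allowed between each consecutive pair. The Python sketch below checks whether the ordered ORFs of a single contig contain such a block. It is a hypothetical illustration, not Pynteny's implementation: the `Orf`/`Target` tuples, the `matches_block` function and the reading of each gap value as a *maximum* number of intervening genes are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical types for this sketch (not part of Pynteny's API).
# One ORF in contig order: (target gene it was annotated as, or None; strand)
Orf = Tuple[Optional[str], str]
# One target of the syntenic block: (gene label, required strand "+" or "-")
Target = Tuple[str, str]


def matches_block(orfs: Sequence[Orf], targets: Sequence[Target],
                  max_gaps: Sequence[int]) -> bool:
    """Return True if `orfs` contains `targets` in order, on the right strands.

    max_gaps[k] is the maximum number of ORFs allowed between targets[k]
    and targets[k + 1] (assumed semantics for this sketch).
    """
    if len(max_gaps) != len(targets) - 1:
        raise ValueError("need exactly one gap value per pair of targets")

    def extend(pos: int, k: int) -> bool:
        # targets[k] sits at orfs[pos]; try every valid spot for targets[k + 1]
        if k == len(targets) - 1:
            return True
        stop = min(pos + 2 + max_gaps[k], len(orfs))
        return any(orfs[nxt] == targets[k + 1] and extend(nxt, k + 1)
                   for nxt in range(pos + 1, stop))

    # Try every ORF that matches the first target as a possible block start
    return any(orf == targets[0] and extend(i, 0) for i, orf in enumerate(orfs))


# Toy block from the figure: A, B, C adjacent on "+", then 3 genes, then D on "-"
toy_targets = [("gene_A", "+"), ("gene_B", "+"), ("gene_C", "+"), ("gene_D", "-")]
toy_gaps = [0, 0, 3]
toy_contig = [("gene_A", "+"), ("gene_B", "+"), ("gene_C", "+"),
              (None, "+"), (None, "-"), (None, "+"), ("gene_D", "-")]

print(matches_block(toy_contig, toy_targets, toy_gaps))  # True
```

The search backtracks over every candidate position instead of greedily taking the first hit, so a repeated gene inside a window cannot hide a valid block.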

Pynteny can be run either as a command-line tool or as a Python module. To run Pynteny from the command line, execute:

conda activate pynteny
pynteny <subcommand> <options>

[Figure: pynteny-cli]

Several subcommands are available; see the documentation pages for a full description.

For instance, to first download the PGAP database, which contains a collection of profile HMMs and their metadata:

pynteny download --outdir data/hmms --unpack

Next, to build a labelled peptide database from DNA assembly data:

pynteny build \
    --data assembly.fa \
    --outfile labelled_peptides.faa

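Finally, to search the peptide database for the syntenic structure displayed above, >gene_A 0 >gene_B 0 >gene_C 3 <gene_D, using the downloaded PGAP database (here, > and < mark genes on the positive and negative strand, respectively, each integer sets the number of genes allowed between two consecutive targets, and the --gene_ids flag indicates that the structure uses gene symbols instead of HMM names):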

pynteny search \
    --synteny_struc ">gene_A 0 >gene_B 0 >gene_C 3 <gene_D" \
    --data labelled_peptides.faa \
    --outdir results/ \
    --gene_ids
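To make the structure notation concrete, the Python sketch below splits a string such as ">gene_A 0 >gene_B 0 >gene_C 3 <gene_D" into strand-tagged targets and gap sizes. The `parse_synteny_structure` function is hypothetical and is not Pynteny's own parser; it handles only the form used in the command above (strand prefix on every gene, integer gaps in between).

```python
import re
from typing import List, Tuple

# Hypothetical helper for illustration; not part of Pynteny's API.
# A strand-prefixed gene token such as ">gene_A" or "<gene_D"
GENE_TOKEN = re.compile(r"([<>])(\S+)")


def parse_synteny_structure(structure: str) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Split a structure string into (gene, strand) targets and gap sizes."""
    tokens = structure.split()
    # Genes and gap sizes must alternate, starting and ending with a gene
    if len(tokens) % 2 == 0:
        raise ValueError("expected alternating gene and gap tokens")
    targets: List[Tuple[str, str]] = []
    gaps: List[int] = []
    for i, token in enumerate(tokens):
        if i % 2 == 0:
            # Even positions: ">" means positive strand, "<" negative strand
            match = GENE_TOKEN.fullmatch(token)
            if match is None:
                raise ValueError(f"invalid gene token: {token!r}")
            strand = "+" if match.group(1) == ">" else "-"
            targets.append((match.group(2), strand))
        else:
            # Odd positions: number of genes allowed between the neighbours
            if not token.isdecimal():
                raise ValueError(f"invalid gap token: {token!r}")
            gaps.append(int(token))
    return targets, gaps


targets, gaps = parse_synteny_structure(">gene_A 0 >gene_B 0 >gene_C 3 <gene_D")
print(targets)  # [('gene_A', '+'), ('gene_B', '+'), ('gene_C', '+'), ('gene_D', '-')]
print(gaps)     # [0, 0, 3]
```

The two lists line up positionally: gaps[k] describes the stretch between targets[k] and targets[k + 1].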

4. 📔 Examples

Jupyter Notebooks with examples showing how Pynteny works are available in the examples directory. Find more info in the documentation.

5. 🔄 Dependencies

Pynteny would not work without these awesome projects:

Thanks!

6. Contributing

Contributions are always welcome! If you don't know where to start, you may find an interesting issue to work on here. Please read our contribution guidelines first.

7. ✒️ Citation

If you use this software, please cite it as below:

Semidán Robaina Estévez. (2023). Pynteny: synteny-aware hmm searches made easy (Version 1.0.0). Zenodo. https://zenodo.org/record/7696204