This project, carried out as part of the Bio-Algorithmics module (Master 1 Bioinformatics, USTHB), explores the implementation and performance analysis of different methods for multiple sequence alignment (MSA). The main objective is to compare two fundamental approaches: the progressive approach (heuristic) and the iterative approach based on a genetic algorithm (SAGA).
The Comp.py
script analyzes the execution time data of the two main methods and generates the following comparative graph, demonstrating the performance difference in terms of speed between the progressive approach and SAGA.
This repository contains a collection of Python scripts, each demonstrating a key algorithm or concept:
Needelman.py
: Pairwise Alignment (Needleman-Wunsch)- Implements the classic dynamic programming algorithm for global alignment of two sequences.
- Uses the BLOSUM62 similarity matrix and a linear gap penalty.
- Generates random protein sequences for testing.
ProgressiveMultiple.py
: Progressive Multiple Alignment- An implementation of a fast heuristic method for multiple alignment.
- Progressively aligns sequences based on a profile that is updated at each step.
- Very efficient in computational time, this is an approach similar to that used by tools like ClustalW.
SAGA.py
: Genetic Algorithm Alignment (SAGA)- Implements an iterative method that uses evolutionary principles (selection, crossover, mutation) to find a high-quality alignment.
- Aims to optimize an alignment score over multiple generations.
- Although slower, this algorithm can potentially find better alignments for complex cases.
Comp.py
: Analysis and Visualization Script- Does not perform alignment.
- Takes performance data (execution time) as input and uses
matplotlib
to visualize the comparison between methods.
Each script is standalone and can be executed individually.
- Clone the repository:
git clone https://github.com/YOUR_USERNAME/MSA-Bioinformatics-Project.git cd MSA-Bioinformatics-Project
- Install dependencies:
pip install -r requirements.txt
- Run the script of your choice:
- For pairwise alignment:
python Needelman.py
- For progressive alignment:
python ProgressiveMultiple.py
- For alignment with SAGA:
python SAGA.py
- To generate the comparison graph:
python Comp.py
- For pairwise alignment:
Each script will guide you by asking for the necessary parameters (sequence size, number of sequences, etc.).
If you use this project, please cite it as:
Ayoub Laib (2025), Multiple Sequence Alignment Algorithms Implementation and Comparison, GitHub repository: https://github.com/aylaib/MSA-Bioinformatics-Project
- Complete Project Report : This document contains the detailed theoretical analysis, methodology, test results and project conclusion.
- Project Statement : The original specifications of the mini-project.