Python code for machine learning, statistics, bioinformatics, and epidemiology
This repository contains Python scripts and notebooks for analyzing large-scale datasets, with a focus on applications in:
- Machine learning
- Statistics
- Data cleaning and structuring
- Visualization
- Bioinformatics
- Infectious diseases and epidemiology
The goal is to provide a resource for researchers, data scientists, and students working in computational biology, genomics, molecular diagnostics, and related fields.
- Machine Learning & Modeling: Scripts for supervised and unsupervised learning, feature engineering, model evaluation, and explainability (including SHAP, feature importance, and more).
- Bioinformatics: Tools for processing sequencing data, variant analysis, resistance gene identification, and genomics analytics.
- Statistics & Data Cleaning: Utilities for data wrangling, normalization, deduplication, and advanced statistical analysis.
- Epidemiological Analysis: Code for cohort analysis, prevalence and incidence calculations, interaction and association testing, and public health surveillance analytics.
- Visualization: Publication-ready plots and multi-panel figures using Matplotlib, Seaborn, and other libraries.
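As a rough illustration of the prevalence and incidence calculations mentioned above, the sketch below uses the standard textbook definitions (existing cases divided by population, and new cases divided by person-time at risk). The function names and the numbers are hypothetical and are not taken from the repository's scripts.

```python
def point_prevalence(existing_cases, population):
    """Proportion of a population that has the condition at one point in time."""
    if population <= 0:
        raise ValueError("population must be positive")
    return existing_cases / population


def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time at risk (for example, person-years)."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time


# Made-up numbers, for illustration only.
prevalence = point_prevalence(existing_cases=50, population=2000)
rate = incidence_rate(new_cases=12, person_time=4800)

print(f"Point prevalence: {prevalence:.1%}")
print(f"Incidence rate: {rate * 1000:.1f} per 1,000 person-years")
```

Real surveillance data would usually be held in a pandas DataFrame and need case definitions, deduplication, and confidence intervals, none of which this sketch covers.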
- Infectious disease diagnostics and surveillance
- Antimicrobial resistance analytics
- Genomic and molecular biology data analysis
- Epidemiological modeling
- General-purpose data analytics in biomedical research
- /scripts/: Core Python scripts and utility functions
- /notebooks/: Example Jupyter notebooks and analyses
- /data/: Sample or test datasets (de-identified or simulated)
- /figures/: Example output plots and visualization templates
Folders will be updated as the repository grows.
- Clone this repository and move into it:
  git clone https://github.com/joseiky/Data-analytics.git
  cd Data-analytics
- Install required dependencies: Most scripts require pandas, numpy, scipy, scikit-learn, matplotlib, seaborn, and jupyter. You can install them using:
  pip install -r requirements.txt
  (A sample requirements.txt will be provided soon.)
- Run scripts or notebooks:
  - Navigate to the relevant folder
  - Open Jupyter notebooks or run .py files as needed
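Before running anything, it can help to confirm that the dependencies listed above are importable. The following is a minimal sketch using only the standard library. The helper name `missing_packages` and the mapping are assumptions made for illustration and are not part of this repository. Note that scikit-learn is imported as `sklearn`, and that `jupyter_core` is used here as a stand-in for detecting a Jupyter installation.

```python
from importlib.util import find_spec

# Maps each pip package name to the module name used in `import` statements.
# The two differ for scikit-learn; jupyter_core is assumed to be present
# whenever Jupyter is installed.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "jupyter": "jupyter_core",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
        print("Install them with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

The check only tests whether each module can be located. It does not verify versions or that the package imports without errors.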
This repository is licensed under the MIT License. See LICENSE for details.
Created and maintained by Dr. John Osei, Extraordinary Professor, Medical Microbiology & Bioinformatics
Contact: jod14139@yahoo.com
machine-learning • bioinformatics • genomics • statistics • epidemiology • infectious-diseases • antibiotic-resistance • visualization • data-cleaning
Feel free to contribute, open issues, or fork the repository!