Data-analytics

Python code for machine learning, statistics, bioinformatics, and epidemiology

Overview

This repository contains Python scripts and notebooks for analyzing large-scale datasets, with a focus on applications in:

- Machine learning
- Statistics
- Data cleaning and structuring
- Visualization
- Bioinformatics
- Infectious diseases and epidemiology

The goal is to provide a resource for researchers, data scientists, and students working in computational biology, genomics, molecular diagnostics, and related fields.

Features

- Machine Learning & Modeling: Scripts for supervised and unsupervised learning, feature engineering, model evaluation, and explainability (including SHAP, feature importance, and more).
- Bioinformatics: Tools for processing sequencing data, variant analysis, resistance gene identification, and genomics analytics.
- Statistics & Data Cleaning: Utilities for data wrangling, normalization, deduplication, and advanced statistical analysis.
- Epidemiological Analysis: Code for cohort analysis, prevalence and incidence calculations, interaction and association testing, and public health surveillance analytics.
- Visualization: Publication-ready plots and multi-panel figures using Matplotlib, Seaborn, and other libraries.
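
To make the prevalence and incidence calculations mentioned above concrete, here is a minimal sketch using only the standard library. It is not taken from the repository's scripts; the function names `prevalence` and `incidence_rate` and the numbers in the usage section are hypothetical and chosen for illustration.

```python
def prevalence(cases: int, population: int) -> float:
    """Point prevalence: existing cases divided by the population examined."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


def incidence_rate(new_cases: int, person_time: float, per: float = 1000.0) -> float:
    """Incidence rate: new cases per `per` units of person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time * per


if __name__ == "__main__":
    # Hypothetical figures, not real surveillance data.
    print(f"Prevalence: {prevalence(50, 1000):.1%}")
    print(f"Incidence rate: {incidence_rate(12, 4800.0):.2f} per 1,000 person-years")
```

Prevalence is a proportion at one point in time, whereas the incidence rate divides new cases by accumulated person-time, which is why the two functions take different denominators.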

Use Cases

- Infectious disease diagnostics and surveillance
- Antimicrobial resistance analytics
- Genomic and molecular biology data analysis
- Epidemiological modeling
- General-purpose data analytics in biomedical research

Repository Structure

- /scripts/ – Core Python scripts and utility functions
- /notebooks/ – Example Jupyter notebooks and analyses
- /data/ – Sample or test datasets (de-identified or simulated)
- /figures/ – Example output plots and visualization templates

Folders will be updated as the repository grows.
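
Since the sample datasets are described as de-identified, the following sketch shows one common way to pseudonymize an identifier column before sharing data. It is an illustration under stated assumptions, not the repository's own procedure: the helper names `pseudonymize` and `deidentify_rows`, the field names, and the key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 16) -> str:
    """Map an identifier to a stable token using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def deidentify_rows(rows, id_field, secret_key, drop_fields=()):
    """Replace the identifier with a token and drop direct-identifier fields."""
    cleaned = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in drop_fields}
        new_row[id_field] = pseudonymize(str(row[id_field]), secret_key)
        cleaned.append(new_row)
    return cleaned


if __name__ == "__main__":
    # Hypothetical records; keep the real key out of version control.
    records = [
        {"patient_id": "P001", "name": "Jane Doe", "result": "positive"},
        {"patient_id": "P002", "name": "John Roe", "result": "negative"},
    ]
    for row in deidentify_rows(records, "patient_id", b"example-key", drop_fields=("name",)):
        print(row)
```

The same identifier always maps to the same token under a given key, so records can still be linked across files. Indirect identifiers such as dates or locations would need separate handling.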

Getting Started

Clone this repository:

git clone https://github.com/joseiky/Data-analytics.git
cd Data-analytics

Install required dependencies:

Most scripts require pandas, numpy, scipy, scikit-learn, matplotlib, seaborn, and jupyter. You can install them using:
```
pip install -r requirements.txt
```
(A sample requirements.txt will be provided soon; until then, install the packages listed above directly with pip.)

Run scripts or notebooks:
- Navigate to the relevant folder
- Open Jupyter notebooks or run .py files as needed
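
Before running a script or notebook, it can help to confirm that the packages listed above are importable in the current environment. The sketch below is one way to do that with the standard library; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names, and checking `jupyter_core` as the import name for jupyter is an assumption.

```python
import importlib.util

# Install name -> import name. scikit-learn is imported as "sklearn".
# "jupyter_core" is assumed here as an importable module installed with jupyter.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "jupyter": "jupyter_core",
}


def missing_packages(required=REQUIRED):
    """Return the install names whose top-level module cannot be found."""
    return sorted(
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```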

License

This repository is licensed under the MIT License. See LICENSE for details.

About

Created and maintained by Dr. John Osei, Extraordinary Professor, Medical Microbiology & Bioinformatics

Contact: jod14139@yahoo.com

Keywords

machine-learning • bioinformatics • genomics • statistics • epidemiology • infectious-diseases • antibiotic-resistance • visualization • data-cleaning

Feel free to contribute, open issues, or fork the repository!
