An advanced AI-powered system for automated segmentation and volume estimation of fragments in images, featuring dual processing modes (RGB and RGB-D) with user authentication and a modern web interface.
You can also check the slides for more technical details.
Some thoughts:
- Thanks GDGoC for the opportunity to participate in this competition. I learned a lot from this contest!
- I'm grateful for the support from the GDGoC team, especially Mentor Duc for his guidance and support.
- Thanks to Bui Huy Giap, Duy Hung and Nam Anh for three months of collaboration!
This repository consists of two main components that work together to provide a complete fragment segmentation solution:
- Segmentation Server (/server): A production-ready FastAPI backend with JWT authentication, OAuth support, and dual segmentation modes
- Model Training (/model): Comprehensive training pipeline for YOLOv8 segmentation models and depth estimation integration
- JWT-based Authentication with secure token management
- OAuth Integration (Google Sign-In support)
- User Registration & Login with email validation
- Password Security using bcrypt hashing
- Session Management with token expiration
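To make the token expiration idea above concrete, here is a minimal sketch of how an HS256-signed JWT with an `exp` claim can be issued and checked using only the Python standard library. It is an illustration, not the server's actual code: the function names `create_token` and `verify_token`, the 30-minute default lifetime, and the choice of HS256 are all assumptions, and a real deployment would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without trailing '=' padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped during encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(subject: str, secret: str, ttl_seconds: int = 1800,
                 now: Optional[float] = None) -> str:
    """Issue a signed token for `subject` that expires after `ttl_seconds` (hypothetical helper)."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The `now` parameter exists only so that expiry can be exercised deterministically; in normal use both functions fall back to the system clock.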
- Dual Segmentation Modes:
  - Fast Mode (RGB): Lightning-fast processing using optimized YOLO ONNX models
  - Precise Mode (RGB-D): Enhanced accuracy with depth information via Depth Anything V2
- Volume Estimation: Advanced 3D volume calculation using point cloud analysis
- Real-time Processing: Efficient ONNX runtime inference
- Multiple Format Support: JPEG, PNG image processing
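The README does not describe how the volume estimation listed above is computed, so the following is only a rough sketch of one common approach, not the project's method. It assumes a metric depth map (in meters), a binary fragment mask of the same size, pinhole camera focal lengths `fx` and `fy` in pixels, and a flat supporting surface at a known depth `plane_depth`. All of these names and inputs are hypothetical. Each masked pixel is treated as a small column whose base area is the pixel's footprint at its depth and whose height is its distance above the plane.

```python
from typing import Sequence


def estimate_volume(depth: Sequence[Sequence[float]],
                    mask: Sequence[Sequence[bool]],
                    fx: float, fy: float,
                    plane_depth: float) -> float:
    """Approximate the volume (cubic meters) of a masked fragment resting on a flat plane.

    Assumptions (not taken from the project): `depth` is metric, the camera
    looks straight down at the plane, and `fx`/`fy` are focal lengths in pixels.
    """
    volume = 0.0
    for depth_row, mask_row in zip(depth, mask):
        for z, inside in zip(depth_row, mask_row):
            if not inside or z <= 0:
                continue  # skip background pixels and invalid depth readings
            height = plane_depth - z  # how far the surface rises above the plane
            if height <= 0:
                continue  # at or below the plane: contributes no volume
            # A pixel at depth z covers roughly (z / fx) by (z / fy) meters.
            pixel_area = (z / fx) * (z / fy)
            volume += height * pixel_area
    return volume
```

This column approximation ignores perspective inside each column and anything hidden from the camera, so it is best read as a starting point for reasoning about the RGB-D mode rather than a reference implementation.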
- Responsive Design: Beautiful, mobile-friendly user interface
- Real-time Feedback: Live processing status and results display
- Drag & Drop Upload: Intuitive file upload experience
- Result Visualization: Interactive display of segmented images and volume data
Backend Technologies:
- FastAPI (Modern Python web framework)
- SQLAlchemy (Database ORM)
- JWT & OAuth (Authentication)
- ONNX Runtime (Model inference)
- OpenCV (Image processing)
- PyTorch (Deep learning)
Frontend Technologies:
- HTML5/CSS3/JavaScript
- Modern UI/UX design patterns
- Responsive layout system
AI/ML Technologies:
- YOLOv8 Segmentation (RGB & RGB-D variants)
- Depth Anything V2 (Monocular depth estimation)
- Point Cloud Processing (3D volume calculation)
- ONNX Model Optimization
├── server/ # Production FastAPI server
│ ├── app.py # Main application entry point
│ ├── model.py # Core segmentation logic
│ ├── config.py # Configuration management
│ ├── auth/ # Authentication system
│ ├── frontend/ # Web interface files
│ ├── utils/ # Utility modules
│ ├── docker/ # Docker deployment files
│ └── weights/ # Pre-trained model weights
├── model/ # Training pipeline
│ ├── *.ipynb # Jupyter training notebooks
│ ├── script/ # Training scripts
│ └── depth_estimation/ # Depth model utilities
└── README.md # This file
- Python 3.9+
- Docker & Docker Compose (recommended)
- Model weights (see setup instructions)
1. Clone the repository:
   git clone <repository-url>
   cd <this repository>
2. Download model weights:
   - Download from: EnEoPi Weights
   - Create the weights directory:
     mkdir -p server/weights
   - Place the downloaded files in server/weights/:
     yolo_rgb_nano.onnx
     yolo_rgbd_nano.onnx
     depth_anything_v2_vits.pth
3. Configure environment:
   cd server
   cp .env_example .env
   # Edit .env file with your configuration
4. Deploy with Docker:
   cd server/docker
   docker-compose up -d --build
5. Access the application:
   - Open http://localhost:3000
   - Register a new account or sign in
   - Start segmenting images!
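Since the server depends on the three weight files named in the setup steps above, a small pre-flight check before deploying can catch a missing download early. The sketch below is not part of the repository; the helper name `missing_weights` is hypothetical, and it assumes the script is run from the repository root so that the default `server/weights` path resolves.

```python
from pathlib import Path
from typing import List

# File names as listed in the setup steps.
REQUIRED_WEIGHTS = (
    "yolo_rgb_nano.onnx",
    "yolo_rgbd_nano.onnx",
    "depth_anything_v2_vits.pth",
)


def missing_weights(weights_dir: str = "server/weights") -> List[str]:
    """Return the required weight files that are not present in `weights_dir`."""
    root = Path(weights_dir)
    return [name for name in REQUIRED_WEIGHTS if not (root / name).is_file()]
```

Calling `missing_weights()` returns an empty list when all three files are in place; otherwise it lists the files that still need to be downloaded.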
cd server
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py
For detailed information on each component:
- Server Documentation: Complete setup, API reference, authentication, and deployment guide
- Model Training Guide: Training procedures, model architectures, and reproduction instructions
- POST /auth/register: User registration
- POST /auth/login: User login
- GET /auth/me: Get current user info
- POST /auth/oauth/google: Google OAuth login

- GET /: Web interface (requires authentication)
- POST /predict: Image segmentation endpoint
  - Parameters: file (image), use_depth ("fast"/"precise")
  - Returns: Segmented image, volumes, processing time
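As an illustration of calling the segmentation endpoint from a script, the sketch below builds a `POST /predict` request with the two documented parameters, `file` and `use_depth`, using only the standard library. Several details are assumptions rather than documented behavior: that the parameters are sent as multipart form fields, that the JWT is passed as an `Authorization: Bearer` header, and that the API is reachable at the same `http://localhost:3000` address as the web interface. The helper name `build_predict_request` is hypothetical.

```python
import urllib.request
import uuid


def build_predict_request(image_bytes: bytes, filename: str, mode: str, token: str,
                          base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build (but do not send) a multipart POST /predict request."""
    if mode not in ("fast", "precise"):
        raise ValueError('use_depth must be "fast" or "precise"')
    boundary = uuid.uuid4().hex  # random separator between form parts
    image_type = "image/png" if filename.lower().endswith(".png") else "image/jpeg"
    body = b"".join([
        # Form field: use_depth
        f"--{boundary}\r\n".encode("ascii"),
        b'Content-Disposition: form-data; name="use_depth"\r\n\r\n',
        mode.encode("ascii"), b"\r\n",
        # Form field: file (the image itself)
        f"--{boundary}\r\n".encode("ascii"),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode("utf-8"),
        f"Content-Type: {image_type}\r\n\r\n".encode("ascii"),
        image_bytes, b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode("ascii"),
    ])
    request = urllib.request.Request(f"{base_url}/predict", data=body, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    request.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    return request
```

The returned object can be passed to `urllib.request.urlopen` once the server is running. The exact response layout is not specified here, so inspect the response body before relying on particular field names.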
We welcome contributions! Please see our contributing guidelines and feel free to submit issues or pull requests.