Multi-camera computer vision system for 3D object detection, triangulation and tracking.

Multi-camera object detection with 3D reconstruction, 3D tracking, and visualization
This project implements a multi-camera tracking system that combines YOLO object detection, epipolar geometry-based matching, triangulation and 3D tracking to provide 3D object localization and trajectory tracking. The system is designed for research and applications requiring spatial awareness across multiple synchronized camera views.
- Multi-Camera Synchronization: Processes 2 or more synchronized camera feeds simultaneously
- YOLO Integration: Object detection with YOLOv11 support
- Epipolar Geometry Matching: Cross-view correspondence using fundamental matrices
- 3D Triangulation: RANSAC-based triangulation with outlier rejection
- 3D Tracking: SORT-style tracking algorithm extended to 3D space for consistent object identities
- Visualization: Multi-view display with 3D plotting and detection graphs
- Multiple Reference Points: Multiple bounding box reference points for different tracking scenarios
- Output: JSON coordinate export, video recording, and figure export
graph TD
A[Camera 0-3<br/>Video Feeds] --> B[YOLO Detection<br/>& Tracking]
B --> C[Cross-view<br/>Matching]
C --> D[3D<br/>Triangulation]
D --> E{3D Tracking<br/>Enabled?}
E -->|Yes| F[3D<br/>Tracking]
E -->|No| G[Raw 3D<br/>Coordinates]
F --> H[Visualization<br/>& Export]
G --> H
H --> I[Multi-camera<br/>Video Mosaic]
H --> J[Detection<br/>Graph Display]
H --> K[3D Position<br/>Plot]
H --> L[JSON<br/>Output]
H --> M[Video<br/>Recording]
style A fill:#e1f5fe
style B fill:#f3e5f5
style C fill:#e8f5e8
style D fill:#fff3e0
style E fill:#fce4ec
style F fill:#e0f2f1
style G fill:#e0f2f1
style H fill:#f1f8e9
The system has been tested on various scenarios demonstrating its multi-object tracking and 3D reconstruction capabilities.
- Python 3.8 or higher
- CUDA-compatible GPU (recommended for YOLO inference)
- OpenCV 4.5+
- Camera calibration files in JSON format
Install the required packages:
pip install -r requirements.txt
- matplotlib==3.9.3: Plotting and visualization
- networkx==3.4.2: Graph-based detection management
- numpy==1.24.0: Numerical computing
- opencv-python-headless==4.10.0.84: Computer vision operations
- ultralytics==8.3.40: YOLO object detection
- lap: Linear assignment problem solver
Multi-Object-Triangulation_and_3D_Footprint_Tracking/
├── source/ # Core system modules
│ ├── main.py # Main execution script
│ ├── config.py # Configuration settings
│ ├── detection.py # Object detection data structures
│ ├── video_loader.py # Multi-camera video handling
│ ├── tracker.py # YOLO tracking wrapper
│ ├── matcher.py # Cross-view matching algorithm
│ ├── triangulation.py # 3D reconstruction methods
│ ├── three_dimentional_tracker.py # 3D tracking
│ ├── epipolar_utils.py # Epipolar geometry utilities
│ ├── load_fundamental_matrices.py # Camera calibration loading
│ ├── visualization_utils.py # Visualization helpers
│ ├── graph_visualization.py # Detection graph display
│ ├── io_utils.py # Input/output operations
│ └── ploting_utils.py # Plotting utilities
├── config_camera/ # Camera calibration files
│ ├── 0.json # Camera 0 parameters
│ ├── 1.json # Camera 1 parameters
│ ├── 2.json # Camera 2 parameters
│ └── 3.json # Camera 3 parameters
├── models/ # YOLO model weights
│ └── yolo11x.pt # General object detection
├── videos/ # Input video directory
├── tools/ # Analysis and utility tools
├── experiments/ # Experimental data and results
├── figures/ # Generated visualization outputs
└── docs/ # Documentation
For proper operation, ensure your video files follow this naming convention:
videos/
├── cam0.mp4 # Camera 0 video
├── cam1.mp4 # Camera 1 video
├── cam2.mp4 # Camera 2 video
└── cam3.mp4 # Camera 3 video
# Run with default settings (all visualizations enabled)
python source/main.py --video_path videos
# Run with 3D tracking
python source/main.py --video_path videos --use_3d_tracker
# Save 3D coordinates to JSON file
python source/main.py --video_path videos --save_coordinates --output_file tracking_results.json
# Custom YOLO model and confidence threshold
python source/main.py --video_path videos --yolo_model models/custom_model.pt --confidence 0.8
# Track multiple object classes (person=0, car=2, bicycle=1)
python source/main.py --video_path videos --class_list 0 1 2
# Use feet reference point for better ground plane tracking
python source/main.py --video_path videos --reference_point feet --use_3d_tracker
# 3D tracking with custom parameters
python source/main.py --video_path videos --use_3d_tracker --max_age 15 --min_hits 2 --dist_threshold 0.8
# Headless processing (no visualization)
python source/main.py --video_path videos --headless --save_coordinates
# Export figures
python source/main.py --video_path videos --export_figures --export_dpi 600
Argument | Type | Default | Description |
---|---|---|---|
--video_path | str | "videos" | Path to video files directory |
--output_file | str | "output.json" | Output JSON file for 3D coordinates |
--save_coordinates | flag | False | Save 3D coordinates to JSON file |
--use_3d_tracker | flag | False | Enable 3D tracking algorithm |
Argument | Type | Default | Description |
---|---|---|---|
--yolo_model | str | "models/yolo11x.pt" | Path to YOLO model file |
--confidence | float | 0.6 | Detection confidence threshold |
--class_list | int[] | [0] | Object classes to track (0=person) |
Argument | Type | Default | Description |
---|---|---|---|
--max_age | int | 10 | Maximum frames without detection |
--min_hits | int | 3 | Minimum detections before tracking |
--dist_threshold | float | 1.0 | Maximum association distance (m) |
Argument | Type | Default | Description |
---|---|---|---|
--distance_threshold | float | 0.4 | Epipolar distance threshold |
--drift_threshold | float | 0.4 | Ambiguous match threshold |
Option | Description | Use Case |
---|---|---|
bottom_center | Bottom center of bbox | General tracking |
center | Geometric center | Object center tracking |
top_center | Top center of bbox | Head tracking |
feet | 20% above bottom center | Human foot tracking |
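As an illustration of how these options map a bounding box to a single 2D point, here is a minimal sketch; the function name, the `(x1, y1, x2, y2)` bbox convention, and the exact "feet" offset formula are assumptions for illustration, not the project's actual API.

```python
def bbox_reference_point(x1, y1, x2, y2, mode="bottom_center"):
    """Map a bounding box to a single 2D reference point (image coordinates).

    Illustrative sketch only; the repository's actual helper may differ.
    """
    cx = (x1 + x2) / 2.0
    if mode == "bottom_center":
        return cx, y2                      # midpoint of the bottom edge
    if mode == "center":
        return cx, (y1 + y2) / 2.0         # geometric center of the box
    if mode == "top_center":
        return cx, y1                      # midpoint of the top edge
    if mode == "feet":
        return cx, y2 - 0.2 * (y2 - y1)    # 20% of the box height above the bottom edge
    raise ValueError(f"unknown reference point: {mode}")
```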
Argument | Description |
---|---|
--headless | Disable all visualization |
--no-graph | Disable detection graph |
--no-3d | Disable 3D plot |
--no-video | Disable video mosaic |
--save-video | Save output video |
--export_figures | Export final plots |
The system uses epipolar geometry to establish correspondences between object detections across different camera views:
- Fundamental Matrix Computation: Automatic calculation from camera calibration parameters
- Epipolar Line Generation: Computation of constraint lines for each detection
- Cross-Distance Measurement: Normalized distance metric between detections and epipolar lines
- Temporal Consistency: Historical match information for more robust tracking
- Ambiguity Resolution: Drift threshold handling for conflicting matches
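A minimal sketch of the core epipolar check described above, using OpenCV; the symmetric averaging and normalization here are assumptions, and the repository's actual distance metric may differ.

```python
import cv2
import numpy as np

def epipolar_distance(pt_a, pt_b, F):
    """Symmetric point-to-epipolar-line distance between a detection in
    camera A and one in camera B, given fundamental matrix F (A -> B).

    Sketch only; the repository may normalize distances differently.
    """
    pa = np.array([[pt_a]], dtype=np.float32)   # shape (1, 1, 2)
    pb = np.array([[pt_b]], dtype=np.float32)

    # Epipolar line in image B induced by pt_a, and vice versa
    line_in_b = cv2.computeCorrespondEpilines(pa, 1, F).reshape(3)
    line_in_a = cv2.computeCorrespondEpilines(pb, 2, F).reshape(3)

    def point_line_dist(pt, line):
        a, b, c = line
        return abs(a * pt[0] + b * pt[1] + c) / np.hypot(a, b)

    return 0.5 * (point_line_dist(pt_b, line_in_b) + point_line_dist(pt_a, line_in_a))
```

A pair of detections whose distance stays below --distance_threshold is a candidate match; ambiguous candidates are resolved with --drift_threshold.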
The triangulation process employs these methods to ensure accurate 3D positioning:
- Linear Triangulation: Direct Linear Transform (DLT) algorithm
- RANSAC Implementation: Iterative outlier rejection for noisy data
- Reprojection Error Validation: Quality control through back-projection
- Reference Point Selection: bbox-to-3D mapping strategies
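To make the triangulation step concrete, the sketch below triangulates one two-view correspondence with the DLT (via cv2.triangulatePoints) and validates it by reprojection error. It is a simplified stand-in for the repository's multi-view, RANSAC-based routine; the threshold value is an assumption.

```python
import cv2
import numpy as np

def triangulate_pair(P0, P1, pt0, pt1, reproj_thresh=5.0):
    """Triangulate one correspondence and validate it by reprojection error.

    P0, P1: 3x4 projection matrices; pt0, pt1: 2D points in each view.
    Simplified two-view sketch of the DLT; the actual system also handles
    more views and RANSAC outlier rejection.
    """
    x0 = np.asarray(pt0, dtype=np.float64).reshape(2, 1)
    x1 = np.asarray(pt1, dtype=np.float64).reshape(2, 1)

    X_h = cv2.triangulatePoints(P0, P1, x0, x1)   # homogeneous 4x1
    X = (X_h[:3] / X_h[3]).reshape(3)

    def reproj_error(P, pt, X):
        proj = P @ np.append(X, 1.0)
        proj = proj[:2] / proj[2]
        return np.linalg.norm(proj - np.asarray(pt))

    err = max(reproj_error(P0, pt0, X), reproj_error(P1, pt1, X))
    return (X, err) if err < reproj_thresh else (None, err)
```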
A SORT-style tracking algorithm adapted to 3D space with the following features:
- Kalman Filter State Model: 6D state vector [x, y, z, dx, dy, dz]
- Hungarian Algorithm: Optimal detection-to-track assignment
- Track Lifecycle Management: Birth, maintenance, and death handling
- Trajectory Storage: Complete path history for visualization
- Class Consistency: Object type validation across frames
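The sketch below shows the two building blocks listed above: a constant-velocity model over the 6D state [x, y, z, dx, dy, dz] and Hungarian assignment of detections to predicted track positions. The matrices and the fixed time step are illustrative assumptions, not the repository's exact implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Constant-velocity motion model over [x, y, z, dx, dy, dz] (dt assumed constant)
dt = 1.0
transition = np.eye(6)
transition[:3, 3:] = dt * np.eye(3)                    # position += velocity * dt
observation = np.hstack([np.eye(3), np.zeros((3, 3))])  # only position is observed

def associate(tracks_xyz, detections_xyz, dist_threshold=1.0):
    """Hungarian assignment between predicted track positions and new 3D
    detections, gated by a maximum Euclidean distance in meters."""
    if len(tracks_xyz) == 0 or len(detections_xyz) == 0:
        return []
    cost = np.linalg.norm(
        np.asarray(tracks_xyz)[:, None, :] - np.asarray(detections_xyz)[None, :, :],
        axis=2,
    )
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= dist_threshold]
```

Unmatched detections spawn new tracks after --min_hits confirmations; unmatched tracks are dropped after --max_age frames.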
NetworkX-based graph representation for managing detection relationships:
- Node Representation: Individual detections with metadata
- Edge Creation: Matched detection pairs across views
- Connected Components: Groups of matched detections for triangulation
- Graph Visualization: Network display with color coding
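A small sketch of how matched detections can be grouped via connected components with NetworkX; the node keys and attributes shown here are assumptions, not the repository's exact schema.

```python
import networkx as nx

# Each node is one detection, keyed by (camera_id, detection_id); edges link
# detections that passed the cross-view epipolar match.
G = nx.Graph()
G.add_node(("cam0", 3), point=(412, 377), cls=0)
G.add_node(("cam1", 7), point=(298, 401), cls=0)
G.add_node(("cam2", 1), point=(655, 362), cls=0)
G.add_edge(("cam0", 3), ("cam1", 7))
G.add_edge(("cam1", 7), ("cam2", 1))

# Each connected component corresponds to one physical object seen from
# several cameras; its member detections are triangulated together.
for component in nx.connected_components(G):
    views = sorted(component)
    if len(views) >= 2:
        print("triangulate from views:", views)
```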
{
"timestamp": "2025-01-01 12:00:00",
"frame": 42,
"points": [
{
"position": [1.23, 2.45, 0.89],
"id": 1,
"class": 0
}
]
}
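For downstream analysis, records in this shape can be read back with a few lines of Python. This assumes the output file holds a JSON list of such per-frame records; the actual file layout may differ.

```python
import json

# Assumption: the output file stores a JSON list of per-frame records as shown above.
with open("tracking_results.json") as f:
    frames = json.load(f)

for frame in frames:
    for point in frame["points"]:
        x, y, z = point["position"]
        print(f"frame {frame['frame']}: id={point['id']} at ({x:.2f}, {y:.2f}, {z:.2f})")
```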
The tools/ directory contains utilities for:
- Robot trajectory comparison and validation
- Odometry vs. camera tracking analysis
- Circular reference trajectory validation
- 3D tracking accuracy analysis
- Grid-based ground truth comparison
- ArUco marker reconstruction validation
- Multi-camera detection performance
- Grid accuracy validation
- Camera combination analysis
- Multi-camera video composition
- Reference grid visualization
- Trajectory plotting utilities
- Parallel video processing across cameras
- Epipolar geometry calculations
- Graph operations with NetworkX
- Memory-controlled trajectory storage
- Camera calibration quality is crucial
- Higher detection confidence reduces false positives
- Balanced matching thresholds for precision/recall
- RANSAC parameters affect computation vs. accuracy trade-off
- Use GPU for YOLO inference (device="cuda:0")
- Adjust visualization complexity based on hardware
- Consider headless mode for maximum throughput
- Optimize video resolution for your use case
The system requires camera calibration files in JSON format containing:
- Intrinsic Parameters: Camera matrix and distortion coefficients
- Extrinsic Parameters: Rotation and translation matrices
- Resolution Information: Image dimensions
- Calibration Metadata: Timestamp and error metrics
Example calibration file structure:
{
"calibratedAt": "2025-04-04T14:17:05.558577Z",
"error": 0.2824207223298144,
"resolution": {"height": 728, "width": 1288},
"intrinsic": {...},
"extrinsic": {...},
"distortion": {...}
}
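As a sketch of how such a file can be turned into the projection matrix used for triangulation, the snippet below builds P = K [R | t] from the calibration JSON. The nested key names inside "intrinsic" and "extrinsic" are assumptions, since they are elided in the example above.

```python
import json
import numpy as np

def load_projection_matrix(path):
    """Build P = K [R | t] from a calibration JSON file.

    The nested keys ("camera_matrix", "rotation", "translation") are
    illustrative assumptions; the real files may use different names.
    """
    with open(path) as f:
        calib = json.load(f)

    K = np.array(calib["intrinsic"]["camera_matrix"]).reshape(3, 3)
    R = np.array(calib["extrinsic"]["rotation"]).reshape(3, 3)
    t = np.array(calib["extrinsic"]["translation"]).reshape(3, 1)
    return K @ np.hstack([R, t])
```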
No detections found:
- Check YOLO model compatibility
- Adjust confidence threshold (--confidence)
- Verify video file formats and paths
Poor 3D reconstruction:
- Validate camera calibration files
- Check camera synchronization
- Adjust matching thresholds (--distance_threshold, --drift_threshold)
Performance issues:
- Enable GPU for YOLO inference
- Reduce visualization complexity
- Use headless mode for processing
- Check available system memory
Tracking inconsistencies:
- Tune 3D tracking parameters (--max_age, --min_hits, --dist_threshold)
- Verify temporal consistency in input videos
- Check object class configuration
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this work in your research, please cite:
- YOLO implementation by Ultralytics
- SORT algorithm by Alex Bewley et al.
- OpenCV and NetworkX communities
- Camera calibration tools and methodologies
- SORT: Simple, Online, and Realtime Tracking
- YOLOv11: Real-Time Object Detection
- Multiple View Geometry in Computer Vision