A state-of-the-art deepfake detection system built with PyTorch and EfficientNet-B0, featuring a user-friendly web interface for real-time image and video analysis.
- Deep Learning Model: EfficientNet-B0 architecture fine-tuned for deepfake detection
- Multi-format Support: Analyze both images (.jpg, .jpeg, .png) and videos (.mp4, .mov)
- Web Interface: Interactive Gradio-based web application for easy testing
- Real-time Analysis: Process the first frame of videos for quick deepfake detection
- Training Pipeline: Complete PyTorch Lightning training infrastructure
- Model Export: Support for PyTorch (.pt) and ONNX format exports
- Python 3.8 or higher
- CUDA-compatible GPU (optional, but recommended for training)
- Clone the repository:
  git clone https://github.com/Macherla-Mallikarjun/deepfake-detection.git
  cd deepfake-detection
- Install dependencies:
  pip install -r requirements.txt
- Download a pre-trained model (or train your own):
  - Place your model file as models/best_model-v3.pt
Launch the interactive web interface:
python web-app.py
The web app will open in your browser where you can:
- Drag and drop images or videos
- View real-time predictions with confidence scores
- See preview of analyzed content
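For reference, this kind of UI can be reproduced in a few lines of Gradio. The snippet below is a minimal sketch, not the actual contents of web-app.py, and the placeholder `predict_file` stub stands in for the real model call.

```python
# Minimal Gradio sketch of an interface like web-app.py's (assumed layout, not the actual code)
import gradio as gr

def predict_file(path):
    # placeholder: the real app runs the EfficientNet-B0 model here
    return "Real", 0.98, path

demo = gr.Interface(
    fn=predict_file,
    inputs=gr.File(label="Image or video", type="filepath"),
    outputs=[
        gr.Label(label="Prediction"),
        gr.Number(label="Confidence"),
        gr.Image(label="Preview"),
    ],
    title="Deepfake Detection",
)
demo.launch()
```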
Classify individual images:
python classify.py --image path/to/your/image.jpg
Process videos frame by frame:
python inference/video_inference.py --video path/to/your/video.mp4
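If you need frame-by-frame results from your own code rather than the CLI, a loop along these lines works with OpenCV; `predict_frame` below is a hypothetical stand-in for the per-frame classification that video_inference.py performs.

```python
# Hedged sketch of frame-by-frame video analysis with OpenCV;
# predict_frame() is a hypothetical stand-in for the model's per-frame inference.
import cv2

def analyze_video(video_path, sample_every=30):
    cap = cv2.VideoCapture(video_path)
    predictions = []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % sample_every == 0:                 # sample every Nth frame to save time
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # model expects RGB input
            predictions.append(predict_frame(rgb))        # hypothetical classifier call
        frame_idx += 1
    cap.release()
    return predictions
```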
This deepfake detection system supports various popular deepfake datasets. Below are the recommended datasets for training and evaluation:
- Dataset: FaceForensics++
- Description: One of the most comprehensive deepfake datasets, with 4 manipulation methods
- Size: ~1,000 original videos, ~4,000 manipulated videos
- Manipulations: Deepfakes, Face2Face, FaceSwap, NeuralTextures
- Quality: Raw, c23 (light compression), c40 (heavy compression)
- Download: GitHub Repository
- Usage: Excellent for training robust models across different manipulation types
- Dataset: Celeb-DF v2
- Description: High-quality celebrity deepfake dataset
- Size: 590 real videos, 5,639 deepfake videos
- Quality: High-resolution with improved visual quality
- Download: Official Website
- Usage: Great for testing model performance on high-quality deepfakes
- Dataset: DFDC (Deepfake Detection Challenge)
- Description: Facebook's large-scale deepfake detection dataset
- Size: ~100,000 videos (real and fake)
- Diversity: Multiple actors, ethnicities, and ages
- Download: Kaggle Competition
- Usage: Large-scale training and benchmarking
- Dataset: DeepFakeDetection (DFD)
- Description: Google/Jigsaw deepfake dataset
- Size: ~3,000 deepfake videos
- Quality: High-quality with various compression levels
- Download: FaceForensics++ repository
- Usage: Additional training data for model robustness
- Dataset: 140k Real and Fake Faces
- Description: Large collection of real and AI-generated face images
- Size: ~140,000 images
- Source: StyleGAN-generated faces vs real faces
- Download: Kaggle Dataset
- Usage: Perfect for image-based deepfake detection training
- Dataset: CelebA-HQ
- Description: High-quality celebrity face dataset
- Size: 30,000 high-resolution images
- Quality: 1024×1024 resolution
- Download: GitHub Repository
- Usage: Real face examples for training
- Download your chosen dataset from the links above
- Extract it to the data/ folder
- Organize it as shown in the training section below
Use our built-in tools to prepare datasets:
# Split video dataset into frames
python tools/split_video_dataset.py --input_dir raw_videos --output_dir data
# Split dataset into train/validation
python tools/split_train_val.py --input_dir data --train_ratio 0.8
# General dataset splitting
python tools/split_dataset.py --input_dir your_dataset --output_dir data
- For Beginners: Start with 140k Real and Fake Faces (image-based, easy to work with)
- For Research: Use FaceForensics++ (comprehensive, multiple manipulation types)
- For Production: Combine DFDC + Celeb-DF (large scale, diverse)
- For High-Quality Testing: Use Celeb-DF v2 (challenging, high-quality deepfakes)
- Ethical Use: These datasets are for research purposes only
- Legal Compliance: Ensure compliance with dataset licenses and terms of use
- Privacy: Respect privacy rights of individuals in the datasets
- Citation: Properly cite the original dataset papers when publishing research
Organize your training data in the data/ folder as follows:
data/
├── train/
│   ├── real/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── fake/
│       ├── fake1.jpg
│       └── fake2.jpg
└── validation/
    ├── real/
    └── fake/
Update config.yaml with your dataset paths:
train_paths:
- data/train
val_paths:
- data/validation
lr: 0.0001
batch_size: 4
num_epochs: 10
Start training with either entry point:
python main_trainer.py
or
python model_trainer.py
The training will:
- Use PyTorch Lightning for efficient training
- Save best model based on validation loss
- Log metrics to TensorBoard
- Apply early stopping to prevent overfitting
View training progress with TensorBoard:
tensorboard --logdir lightning_logs
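As a rough sketch of what this training setup looks like in code (the DeepfakeDetector class name and its constructor arguments are assumptions based on the project layout, not the exact contents of main_trainer.py):

```python
# Rough sketch of the Lightning training setup described above (assumed names, not the project's exact code)
import yaml
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger

from lightning_modules.detector import DeepfakeDetector  # assumed class name

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

model = DeepfakeDetector(lr=cfg["lr"])  # assumed constructor signature

trainer = pl.Trainer(
    max_epochs=cfg["num_epochs"],
    logger=TensorBoardLogger("lightning_logs"),
    callbacks=[
        ModelCheckpoint(monitor="val_loss", mode="min", dirpath="models/"),  # keep the best checkpoint
        EarlyStopping(monitor="val_loss", patience=3),                       # stop when val loss stalls
    ],
)
trainer.fit(model)  # dataloaders come from the module / hybrid_loader in the real script
```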
├── web-app.py              # Main web application
├── main_trainer.py         # Primary training script
├── classify.py             # Image classification utility
├── realeval.py             # Real-world evaluation script
├── config.yaml             # Training configuration
├── requirements.txt        # Python dependencies
├── README.md               # Project documentation
├── LICENSE                 # MIT License
├── .gitignore              # Git ignore rules
├── data/                   # Dataset storage (not tracked by git)
│   ├── train/              # Training data
│   └── validation/         # Validation data
├── datasets/
│   └── hybrid_loader.py    # Custom dataset loader
├── lightning_modules/
│   └── detector.py         # PyTorch Lightning module
├── models/
│   └── best_model-v3.pt    # Trained model weights
├── tools/                  # Dataset preparation utilities
│   ├── split_dataset.py
│   ├── split_train_val.py
│   └── split_video_dataset.py
└── inference/
    ├── export_onnx.py      # ONNX export
    └── video_inference.py  # Video processing
- Backbone: EfficientNet-B0 (pre-trained on ImageNet)
- Classifier: Custom 2-class classifier with dropout (0.4)
- Input Size: 224x224 RGB images
- Output: Binary classification (Real/Fake) with confidence scores
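A minimal sketch of this architecture in torchvision (the project's detector.py may assemble it differently):

```python
# Sketch of the described architecture: EfficientNet-B0 backbone, dropout 0.4, 2-class head
import torch
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
in_features = model.classifier[1].in_features   # 1280 for EfficientNet-B0
model.classifier = nn.Sequential(
    nn.Dropout(p=0.4),                          # dropout as listed above
    nn.Linear(in_features, 2),                  # Real / Fake logits
)

dummy = torch.randn(1, 3, 224, 224)             # 224x224 RGB input
probs = torch.softmax(model(dummy), dim=1)      # per-class confidence scores
```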
The model achieves:
- High accuracy on diverse deepfake datasets
- Real-time inference capabilities
- Robust performance on compressed/low-quality media
Convert PyTorch model to ONNX format:
python inference/export_onnx.py
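For orientation, an export along these lines is what such a script typically does; the sketch assumes the .pt file stores the full model object, and inference/export_onnx.py may differ in its details.

```python
# Sketch of an ONNX export for a 224x224, 2-class model (assumes the .pt stores the full module)
import torch

model = torch.load("models/best_model-v3.pt", map_location="cpu")
model.eval()

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "models/best_model-v3.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},  # allow variable batch size
)
```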
Process multiple files programmatically:
import importlib.util

# web-app.py has a hyphen in its filename, so load it via importlib instead of a plain import
spec = importlib.util.spec_from_file_location("web_app", "web-app.py")
web_app = importlib.util.module_from_spec(spec)
spec.loader.exec_module(web_app)

image_paths = ["image1.jpg", "image2.jpg"]  # files to analyze

results = []
for file_path in image_paths:
    prediction, confidence, preview = web_app.predict_file(file_path)
    results.append({
        'file': file_path,
        'prediction': prediction,
        'confidence': confidence,
    })
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- EfficientNet architecture by Google Research
- PyTorch Lightning for training infrastructure
- Gradio for web interface framework
- The research community for deepfake detection advances
⭐ Star this repository if you found it helpful!