A powerful Retrieval-Augmented Generation (RAG) system for processing multiple PDFs and delivering accurate, context-aware responses to user queries. Built in Python, it features GPU-accelerated embeddings, a semantic search engine, and the Mistral model via Ollama, wrapped in an intuitive Streamlit interface.
- 📄 Multi-PDF Support – Upload and process several PDFs at once.
- ⚡ GPU-Accelerated Embeddings – Leverage NVIDIA GPUs for fast processing.
- 🔎 Semantic Search – Uses FAISS for efficient context retrieval with source tracking.
- 🌐 Streamlit Interface – Clean and interactive web UI for document upload, chat, and performance monitoring.
- 🧩 Highly Configurable – Adjust chunk size, overlap, batch size, and worker threads.
- 📊 Real-time Metrics – Monitor chunking speed, memory usage, and system stats.
- 💬 Chat History Export – Save and download all chat sessions in JSON format.
Field | Details |
---|---|
Repo Name | Advanced-RAG-Chatbot |
License | MIT |
Language | Python |
Model | Mistral via Ollama |
Embeddings | intfloat/e5-small (HuggingFace) |
Interface | Streamlit |
Vector Store | FAISS |
PDF Processor | PyMuPDF |
Status | 🚧 Actively maintained |
Type | Minimum | Recommended |
---|---|---|
CPU | 4-core | 8-core |
RAM | 8 GB | 16 GB |
Storage | 10 GB | 20 GB |
GPU (Optional) | ❌ | ✅ NVIDIA (GTX 1060 or higher) |
- Python 3.8+
- Git
- CUDA Toolkit 11.2+ (for GPU acceleration)
- Ollama (≥ 0.1.0)
```bash
# Download Python from https://python.org
python --version  # ✅ Ensure it's ≥ 3.8

# Download Git from https://git-scm.com
git --version
```
```bash
git clone https://github.com/milind899/Advanced-RAG-Chatbot.git
cd Advanced-RAG-Chatbot

python -m venv venv

# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

pip install -r requirements.txt
```
- `streamlit` – Frontend UI
- `langchain` – RAG orchestration
- `torch` – Model + CUDA acceleration
- `transformers` – Embedding generation
- `faiss-cpu` – Vector similarity search
- `PyMuPDF` – PDF parsing
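For a sense of how these libraries fit together, here is a minimal end-to-end sketch. It assumes the `langchain-community` and `langchain-text-splitters` packages are installed and that Ollama is already serving the `mistral` model; the repo's actual wiring lives in `rag_backend.py` and may differ.

```python
# Minimal RAG pipeline sketch: PyMuPDF -> splitter -> FAISS -> Ollama.
# A conceptual example, not the repo's exact implementation.
import fitz  # PyMuPDF
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama

# 1. Extract text from a PDF with PyMuPDF
doc = fitz.open("example.pdf")  # hypothetical input file
text = "".join(page.get_text() for page in doc)

# 2. Split into overlapping chunks (defaults mirror the sidebar settings)
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(text)

# 3. Embed the chunks and index them in FAISS
embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-small")
store = FAISS.from_texts(chunks, embeddings)

# 4. Retrieve context and ask Mistral via Ollama
query = "What does the document say about pricing?"
context = "\n\n".join(d.page_content for d in store.similarity_search(query, k=3))
llm = Ollama(model="mistral")
print(llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {query}"))
```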
```bash
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull mistral
ollama serve
curl http://localhost:11434/api/tags
```
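If you prefer to verify from Python rather than curl, Ollama's standard REST endpoints can be exercised with the standard library alone. This sketch assumes the default port 11434:

```python
# Quick sanity check of the Ollama server from Python (stdlib only).
import json
import urllib.request

# List pulled models (same as: curl http://localhost:11434/api/tags)
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)
print([m["name"] for m in tags["models"]])  # expect something like ['mistral:latest']

# One-shot, non-streaming generation against mistral
payload = json.dumps({"model": "mistral", "prompt": "Say hello", "stream": False}).encode()
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```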
```bash
# Verify CUDA
nvidia-smi

# Enable GPU
unset OLLAMA_NO_CUDA

# Test with PyTorch
python -c "import torch; print(torch.cuda.is_available())"
```
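Beyond the one-line check, a slightly more detailed probe confirms which device PyTorch actually sees. This uses only standard `torch.cuda` calls:

```python
# More detailed GPU check than the one-liner above.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
    print(f"CUDA version (PyTorch build): {torch.version.cuda}")
else:
    print("CUDA not available; embeddings will run on CPU.")
```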
```bash
streamlit run app.py
```
🔗 Open your browser at: http://localhost:8501
- Open the app in your browser.
- Drag-and-drop or browse to upload PDFs.
- Adjust settings from the sidebar (see the chunking sketch after this list):
  - Chunk Size (default: 500)
  - Overlap (default: 50)
  - Workers (default: CPU count)
  - Batch Size (default: 100)
- Click Process Documents.
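To see what the Chunk Size and Overlap settings control, here is an illustration using LangChain's `RecursiveCharacterTextSplitter`. The app's backend may split text differently, so treat this as a conceptual sketch:

```python
# Smaller chunks -> more, finer-grained pieces; overlap repeats the tail
# of one chunk at the head of the next, so answers that straddle a
# chunk boundary remain retrievable.
from langchain_text_splitters import RecursiveCharacterTextSplitter

text = "A long stretch of extracted PDF text. " * 200  # stand-in document

small = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
large = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

print(len(small.split_text(text)))  # more, finer-grained chunks
print(len(large.split_text(text)))  # fewer, coarser chunks
```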
- Go to the Chat with Your Documents section.
- Type a question and hit Send.
- View responses with source links.
- Review chat history at the bottom.
- Export your chats via the Export JSON button.
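The exported file can be re-loaded programmatically. The exact JSON schema isn't documented here, so the filename and field names below are hypothetical; inspect your own export to confirm them:

```python
# Re-load an exported chat session (hypothetical filename and schema).
import json

with open("chat_export.json", encoding="utf-8") as f:
    history = json.load(f)

for turn in history:
    print(turn)  # e.g. {"question": ..., "answer": ..., "sources": [...]}
```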
- View chunks/sec and total processing time under Performance Metrics.
- Sidebar shows GPU & system usage.
- Troubleshoot with real-time logs.
```
Advanced-RAG-Chatbot/
├── app.py               # Streamlit UI
├── rag_backend.py       # Core RAG logic
├── requirements.txt     # Dependencies
├── LICENSE              # MIT License
├── faiss_index/         # Generated vector store
└── embedding_cache/     # Cached embeddings
```
Parameter | Description | Default |
---|---|---|
`chunk_size` | Text chunk size | 500
`chunk_overlap` | Overlap between chunks | 50
`max_workers` | Concurrent threads | CPU count
`batch_size` | Documents per embedding batch | 100
`embedding_model` | HuggingFace model | intfloat/e5-small
`cache_embeddings` | Use embedding cache | True
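To make `embedding_model` and `batch_size` concrete, here is a sketch of batched embedding with `intfloat/e5-small` using the `transformers` library directly (the app likely wraps this via LangChain, so this is illustrative rather than the backend's exact code):

```python
# Batched e5-small embeddings with mean pooling over non-padding tokens.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-small")
model = AutoModel.from_pretrained("intfloat/e5-small")
model.eval()

# e5 models expect "passage: " / "query: " prefixes on their inputs
texts = ["passage: " + t for t in ["chunk one", "chunk two", "chunk three"]]
batch_size = 2  # the batch_size setting caps chunks per forward pass

embeddings = []
with torch.no_grad():
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        out = model(**batch).last_hidden_state  # (batch, seq, hidden)
        mask = batch["attention_mask"].unsqueeze(-1)
        pooled = (out * mask).sum(1) / mask.sum(1)  # mean over real tokens
        embeddings.append(torch.nn.functional.normalize(pooled, dim=1))

print(torch.cat(embeddings).shape)  # (3, 384) for e5-small
```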
Problem | Fix |
---|---|
❌ Ollama not responding | Ensure `ollama serve` is running and the model is pulled.
❌ CUDA not available | Install the CUDA Toolkit, check `nvidia-smi`, enable with `unset OLLAMA_NO_CUDA`.
💥 Memory crash | Reduce `batch_size` or `chunk_size`, and clear the cache folders.
🐢 Slow performance | Enable the GPU, increase `max_workers`, and tune the chunking strategy.
- 🔧 Use the Help & Tips section in the app.
- 🐛 File issues or suggestions on GitHub Issues.
- 📄 Refer to `rag_backend.py` for logic and model details.
This project is licensed under the MIT License. See LICENSE for more information.