Document Assistant

A powerful document analysis tool that uses Retrieval Augmented Generation (RAG) to provide intelligent answers to questions about your documents.

Author

Chaitanya Vankadaru
AI/ML Engineer | Python Developer | Data Scientist
LinkedIn Profile

Features

📄 PDF Document Processing: Advanced PDF parsing and text extraction
🔍 Smart Text Chunking: Intelligent document segmentation with customizable settings
🧠 Vector Embeddings: State-of-the-art embeddings using Sentence Transformers
💾 FAISS Vector Store: Fast and efficient similarity search
🤖 RAG Architecture: Enhanced question answering using document context
🎨 Modern UI: Clean, responsive interface with Streamlit
📊 System Statistics: Real-time performance metrics
🔄 Conversation History: Track and review Q&A interactions
⚙️ Customizable Settings: Adjust chunk size and overlap

Technology Stack

Python 3.12
Streamlit (>=1.37.0)
LangChain (>=0.2.5)
FAISS-CPU (>=1.7.4)
Sentence Transformers (>=2.2.2)
OpenAI GPT (>=1.6.1)
PyPDF (>=3.17.0)

Prerequisites

Python 3.12 or higher
OpenAI API key
Git (for version control)
Virtual environment (recommended)

Installation

Clone the repository:

git clone https://github.com/EarthlyAlien/Document-Assistant.git
cd Document-Assistant

Create and activate a virtual environment:

# On Windows
python -m venv venv
.\venv\Scripts\activate

# On macOS/Linux
python -m venv venv
source venv/bin/activate

Install dependencies:

# For production
pip install -r requirements.txt

# For development
pip install -r requirements-dev.txt

Set up environment variables: Create a .env file in the project root:

OPENAI_API_KEY=your_api_key_here

Run the application:

streamlit run app.py

Usage

Document Upload
- Use the sidebar to upload PDF documents
- View uploaded document list
- Clear documents when needed
Configuration
- Adjust chunk size (default: 1000)
- Set chunk overlap (default: 200)
- Configure these based on document length and complexity
Processing
- Click "Process Document" to extract text and generate embeddings
- Monitor processing status in real-time
Question Answering
- Enter questions about your documents
- View AI-generated responses with source context
- Track conversation history

Architecture

The Document Assistant uses a sophisticated RAG (Retrieval Augmented Generation) architecture:

Document Processing
- PDF parsing and text extraction
- Intelligent text chunking with overlap
- Clean text preprocessing
Vector Store
- Chunk embedding generation using Sentence Transformers
- FAISS vector index for efficient similarity search
- Persistent storage of embeddings
Question Answering
- Query embedding and semantic search
- Context retrieval from vector store
- LLM-powered answer generation with context

Development

For development work:

Install development dependencies:

pip install -r requirements-dev.txt

Development tools available:
- pytest (>=7.4.4): Testing framework
- pytest-cov (>=4.1.0): Code coverage
- flake8 (>=7.0.0): Code linting
- mypy (>=1.8.0): Static type checking
- black (>=24.2.0): Code formatting
Run tests:

# Run all tests
pytest

# Run with coverage report
pytest --cov=.

# Run with verbose output
pytest -v

Code formatting:

# Format code
black .

# Check code style
flake8 .

# Type checking
mypy .

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Security

Regular dependency updates
Security vulnerability monitoring
Safe API key handling
Input validation and sanitization

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Author: Chaitanya Vankadaru
LinkedIn: Profile
GitHub: EarthlyAlien

Acknowledgments

OpenAI for GPT API
Streamlit for the UI framework
FAISS for vector similarity search
Sentence Transformers for embeddings

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.github/workflows		.github/workflows
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
app.py		app.py
conftest.py		conftest.py
create_sample.py		create_sample.py
document_processor.py		document_processor.py
rag.py		rag.py
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
setup.py		setup.py
vector_store.py		vector_store.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Document Assistant

Author

Features

Technology Stack

Prerequisites

Installation

Usage

Architecture

Development

Contributing

Security

License

Contact

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

EarthlyAlien/Document-Assistant

Folders and files

Latest commit

History

Repository files navigation

Document Assistant

Author

Features

Technology Stack

Prerequisites

Installation

Usage

Architecture

Development

Contributing

Security

License

Contact

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages