ResearchWebGraph is an AI-powered research assistant for exploring and understanding academic papers. It combines knowledge graphs, vector search, and large language models (LLMs) to surface insights into research topics.
- Search for Papers: Retrieve research papers from arXiv based on keywords and filters.
- Upload PDFs: Upload your own research papers or documents for analysis.
- Build Knowledge Graphs: Extract entities and relationships from papers and visualize them as interactive graphs.
- AI-Powered Query Assistant: Ask questions about selected papers and get detailed answers with citations.
- FastAPI: High-performance API framework.
- Qdrant: Vector database for semantic search.
- SentenceTransformers: Library of embedding models for generating vector representations of text.
- PyPDF2 & PDFMiner: Tools for extracting text and metadata from PDFs.
- spaCy & NLTK: NLP libraries for entity extraction and text processing.
- Streamlit: Interactive web interface for users.
- streamlit-option-menu: Navigation menu for multi-page apps.
- Groq API: Access to high-performance large language models like llama-3.1-8b-instant and llama-3.3-70b-versatile.
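To make the vector search idea concrete, here is a minimal sketch of ranking documents by cosine similarity, one of the distance metrics a vector database such as Qdrant supports. The toy three-dimensional vectors and the `rank_by_similarity` helper are assumptions for illustration only; in the real pipeline the vectors would presumably come from an embedding model and the search would run inside Qdrant.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Rank documents by similarity to the query, highest first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (hypothetical values, not real model output)
docs = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.0, 1.0, 0.0],
    "paper_c": [0.7, 0.7, 0.0],
}
results = rank_by_similarity([1.0, 0.0, 0.0], docs, top_k=2)
```

Real embeddings have hundreds of dimensions, which is why a dedicated index is used instead of the linear scan shown here.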
Screenshots: main interface; paper search with date and category filters; interactive knowledge graph of research concepts; question answering over selected papers; PDF upload.
Demo video: ResearchWebGraph in action.
- Python 3.9 or higher
- Docker (for running Qdrant locally)
- Node.js (optional, if building additional frontend components)
- Clone the repository:
git clone https://github.com/tirth8205/ResearchWebGraph.git
cd ResearchWebGraph/backend
- Create a virtual environment:
python -m venv researchwebgraph_env
source researchwebgraph_env/bin/activate # On Windows, use researchwebgraph_env\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Start Qdrant (if not already running):
docker run -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant
- Set up environment variables in backend/.env:
GROQ_API_KEY=your-groq-api-key
QDRANT_URL=http://localhost:6333
DEFAULT_LLM_MODEL=llama-3.1-8b-instant
- Run the backend server:
uvicorn app.main:app --reload --port 8000
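How the backend reads the values in backend/.env is not shown here. As a minimal sketch, assuming they end up in the process environment (for example through a loader such as python-dotenv), a settings helper could validate them at startup. The `Settings` class and `load_settings` function are hypothetical names, not part of the project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    qdrant_url: str
    default_llm_model: str


def load_settings(env=None):
    """Build Settings from environment variables, failing fast on a missing key."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to backend/.env")
    return Settings(
        groq_api_key=api_key,
        # Fallbacks mirror the example values from the backend/.env step
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        default_llm_model=env.get("DEFAULT_LLM_MODEL", "llama-3.1-8b-instant"),
    )
```

Failing fast on a missing API key surfaces configuration mistakes when the server starts, not on the first query.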
- Navigate to the frontend directory:
cd ../frontend
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables in frontend/.env:
BACKEND_URL=http://localhost:8000
- Run the frontend application:
streamlit run app.py
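If the frontend cannot reach the backend, a quick check of BACKEND_URL can help narrow the problem down. The sketch below assumes the backend keeps FastAPI's default `/docs` route enabled; `docs_url` and `backend_is_up` are hypothetical helper names, not part of the project.

```python
import urllib.error
import urllib.request


def docs_url(backend_url):
    """Return the URL of FastAPI's default interactive docs page."""
    return backend_url.rstrip("/") + "/docs"


def backend_is_up(backend_url, timeout=3.0):
    """Return True if the backend answers an HTTP GET on its docs page."""
    try:
        with urllib.request.urlopen(docs_url(backend_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


# Example call (requires the backend to be running):
# backend_is_up("http://localhost:8000")
```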
- Go to the "Research Papers" tab.
- Search for papers on arXiv using keywords or upload your own PDF documents.
- Select papers from the search results or uploaded PDFs.
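For readers curious what a keyword search against arXiv looks like under the hood, here is a minimal sketch that builds a query URL for the public arXiv API. Whether ResearchWebGraph calls the API this way is an assumption; `build_arxiv_query_url` is a hypothetical helper, and date filtering is left out.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query_url(keywords, category=None, max_results=10):
    """Build an arXiv API query URL from keywords and an optional category."""
    terms = [f'all:"{keywords}"']  # Quoted phrase search across all fields
    if category:
        terms.append(f"cat:{category}")  # e.g. cs.CL
    params = {
        "search_query": " AND ".join(terms),
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query_url("knowledge graphs", category="cs.CL", max_results=5)
```

The API responds with an Atom XML feed, which would still need to be parsed into titles, authors, and abstracts.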
- Go to the "Knowledge Graph" tab and click "Build Knowledge Graph."
- Explore the interactive graph to discover entities and relationships.
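The exact method used to build the graph is not described here. One simple approach, shown as a sketch under that caveat, is to link entities that appear in the same sentence and use the co-occurrence count as the edge weight. The `build_cooccurrence_graph` function and the sample entities are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(sentences_entities):
    """Count how often each pair of entities appears in the same sentence.

    `sentences_entities` is a list of entity lists, one per sentence.
    Returns a Counter mapping sorted (entity, entity) pairs to edge weights.
    """
    edges = Counter()
    for entities in sentences_entities:
        unique = sorted(set(entities))  # Ignore repeats within a sentence
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return edges


# Hypothetical output of an entity-extraction step (e.g. spaCy NER)
extracted = [
    ["BERT", "transformer", "attention"],
    ["BERT", "transformer"],
    ["attention", "transformer"],
]
graph = build_cooccurrence_graph(extracted)
```

Heavier edges then indicate concepts that are frequently discussed together, which is one way to decide what to emphasize in a visualization.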
- Go to the "Query Assistant" tab.
- Ask specific questions about the selected papers (e.g., "What are the main findings?").
- View detailed AI-generated answers with citations to source papers.
- "What are the main findings across these papers?"
- "Explain the methodology used in paper X."
- "How do these papers relate to each other?"
- "What are the limitations mentioned in these studies?"
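How ResearchWebGraph turns such questions into LLM prompts is not documented here. As an illustrative sketch only, one common way to get answers with citations is to number the selected sources in the prompt and ask the model to cite them as [n]. The `build_cited_prompt` function and the `title`/`excerpt` keys are assumptions.

```python
def build_cited_prompt(question, papers):
    """Assemble a prompt that numbers each paper so answers can cite [n]."""
    lines = [
        "Answer the question using only the sources below.",
        "Cite sources inline as [n].",
        "",
    ]
    for index, paper in enumerate(papers, start=1):
        lines.append(f"[{index}] {paper['title']}: {paper['excerpt']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


prompt = build_cited_prompt(
    "What are the main findings?",
    [
        {"title": "Paper A", "excerpt": "First excerpt."},
        {"title": "Paper B", "excerpt": "Second excerpt."},
    ],
)
```

Numbering the sources lets the bracketed markers in the answer be mapped back to specific papers afterwards.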
ResearchWebGraph/
├── backend/
│   ├── app/
│   │   ├── routers/       # API routes (papers, knowledge graph, query)
│   │   ├── services/      # Core logic (fetching, processing, graph building)
│   │   ├── utils/         # Utility functions (PDF processing, NLP)
│   │   ├── models/        # Pydantic schemas for request/response validation
│   │   └── main.py        # FastAPI entry point
│   ├── requirements.txt   # Backend dependencies
│   └── .env               # Backend environment variables
├── frontend/
│   ├── app.py             # Streamlit entry point for frontend UI
│   ├── pages/             # Multi-page Streamlit app (Papers, Graphs, Assistant)
│   ├── components/        # Reusable UI components (API client, sidebar)
│   ├── requirements.txt   # Frontend dependencies
│   └── .env               # Frontend environment variables
└── README.md              # Project documentation
Create a docker-compose.yml file:
version: '3'
services:
  qdrant:
    image: qdrant/qdrant:v1.x.x  # Placeholder: replace v1.x.x with a real version tag
    ports:
      - "6333:6333"
    volumes:
      - ./qdrant_storage:/qdrant/storage
  backend:
    build:
      context: ./backend/
    ports:
      - "8000:8000"
    env_file:
      - ./backend/.env
Run the deployment:
docker-compose up --build -d
You can deploy the Streamlit frontend on platforms like Streamlit Cloud or Heroku by configuring your frontend/.env file with the production backend URL.
Contributions are welcome! Here's how you can contribute:
- Bug Reports: Open an issue describing the bug and steps to reproduce it
- Feature Requests: Open an issue describing the feature and its potential implementation
- Code Contributions: Fork the repository, make your changes, and submit a pull request
- Follow the existing code style and patterns
- Write tests for new features
- Update documentation for any changes
We value your feedback! Here's how to reach us:
- Issues: Use GitHub Issues for bug reports and feature requests
- Discussions: Join our GitHub Discussions for questions and ideas
- Contact:
  - Name: Tirth Kanani
  - Email: tirthkanani18@gmail.com
  - LinkedIn: https://www.linkedin.com/in/tirthkanani/
Your feedback helps make ResearchWebGraph better for everyone!
This project is licensed under the MIT License.