This project asynchronously scrapes web content, generates semantic text chunks using sentence embeddings, and stores them in a Milvus vector database for efficient similarity search. Built with Python, Langchain, SentenceTransformers, and Milvus for scalable vector-based retrieval.

itsSwapnil/Milvus-vector-database-project

Milvus-vector-database-project

Semantic Web Scraper with Milvus Vector Search

This project scrapes content from multiple websites asynchronously, tokenizes and embeds the content into semantic chunks using Sentence Transformers, and stores them in a Milvus vector database for efficient similarity search and retrieval.
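As a rough illustration of the asynchronous scraping step described above, here is a minimal sketch using `asyncio` and `aiohttp`. It does not reproduce the repository's actual script: the function name `scrape_all`, the `fetch` hook, and the concurrency limit are assumptions made for this example, and the LangChain loaders mentioned in this README are not used here.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

# Hypothetical type for a coroutine that downloads one URL and returns its body.
Fetcher = Callable[[str], Awaitable[str]]


async def _gather_pages(urls: List[str], fetch: Fetcher, max_concurrency: int) -> Dict[str, str]:
    """Run `fetch` over all URLs concurrently, skipping the ones that fail."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> Tuple[str, Optional[str]]:
        async with semaphore:  # cap the number of requests in flight
            try:
                return url, await fetch(url)
            except Exception as exc:  # one bad site should not stop the run
                print(f"skipping {url}: {exc}")
                return url, None

    results = await asyncio.gather(*(guarded(url) for url in urls))
    return {url: body for url, body in results if body is not None}


async def scrape_all(
    urls: List[str],
    fetch: Optional[Fetcher] = None,
    max_concurrency: int = 5,  # assumed limit, tune for the target sites
) -> Dict[str, str]:
    """Return a {url: page_text} mapping for every URL fetched successfully."""
    if fetch is not None:
        return await _gather_pages(urls, fetch, max_concurrency)

    import aiohttp  # third-party, listed under Dependencies; imported lazily

    # Share one session across all requests so connections are reused.
    async with aiohttp.ClientSession() as session:

        async def fetch_page(url: str) -> str:
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.text()

        return await _gather_pages(urls, fetch_page, max_concurrency)
```

Calling `asyncio.run(scrape_all(["https://example.com"]))` would return the downloaded page text keyed by URL. The optional `fetch` argument exists only so the concurrency logic can be exercised without network access.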

📦 Dependencies

  1. Python 3.9
  2. aiohttp
  3. nltk
  4. pandas
  5. sentence-transformers
  6. pymilvus
  7. langchain
  8. scikit-learn
  9. numpy

🚀 Features

  • Asynchronous web scraping with aiohttp and langchain
  • Semantic chunking using NLTK sentence tokenization
  • Embedding with sentence-transformers/all-MiniLM-L6-v2
  • Vector similarity search using Milvus
  • Dockerized setup with Milvus, MinIO, Etcd, and Python environment
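The semantic chunking feature listed above could work roughly as follows. This is a sketch under assumptions, not the project's confirmed algorithm: it starts a new chunk whenever the cosine similarity between neighbouring sentence embeddings drops below a threshold. The names `semantic_chunks` and `chunk_text`, the threshold of 0.5, and the per-chunk sentence cap are all hypothetical. The similarity is computed in plain Python to keep the example self-contained, although scikit-learn or numpy from the dependency list would serve equally well.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_chunks(
    sentences: List[str],
    embed: Callable[[List[str]], List[Vector]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the repository
    max_sentences: int = 8,  # assumed cap on chunk length
) -> List[str]:
    """Group consecutive sentences while neighbouring embeddings stay similar."""
    if not sentences:
        return []
    vectors = embed(sentences)
    chunks: List[str] = []
    current = [sentences[0]]
    for i in range(1, len(sentences)):
        similar = cosine_similarity(vectors[i - 1], vectors[i]) >= threshold
        if similar and len(current) < max_sentences:
            current.append(sentences[i])
        else:
            chunks.append(" ".join(current))
            current = [sentences[i]]
    chunks.append(" ".join(current))
    return chunks


def chunk_text(text: str, threshold: float = 0.5) -> List[str]:
    """Tokenize with NLTK, embed with all-MiniLM-L6-v2, then chunk."""
    # Third-party imports are lazy so the helpers above work without them.
    import nltk
    from sentence_transformers import SentenceTransformer

    # Newer NLTK releases look for "punkt_tab", older ones for "punkt".
    for resource in ("punkt", "punkt_tab"):
        nltk.download(resource, quiet=True)
    sentences = nltk.sent_tokenize(text)
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return semantic_chunks(sentences, lambda s: model.encode(s).tolist(), threshold)
```

Passing the embedder in as a function keeps the grouping logic independent of the model, so a different SentenceTransformers checkpoint can be swapped in without touching `semantic_chunks`.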

🐳 Docker Setup

1. Clone the repository:

    git clone https://github.com/yourusername/semantic-web-milvus.git
    cd semantic-web-milvus

2. Start the Docker services:

    docker-compose up --build -d

This will spin up:

  • Milvus vector database
  • Etcd (metadata service)
  • MinIO (object storage)
  • Python container (milvus-python) with all dependencies pre-installed

3. Access the Python container:

    docker exec -it milvus-python bash

4. Run your main script:

    python your_script.py
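Once the services are running, a script inside the container might store chunks in Milvus and query them along these lines. This is a hedged sketch using the `pymilvus` ORM-style API. The collection name `web_chunks`, the field names, the `MILVUS_HOST` and `MILVUS_PORT` environment variables, and the index parameters are assumptions for illustration, not values taken from this repository. The dimension of 384 matches the output size of all-MiniLM-L6-v2, and 19530 is Milvus's default port.

```python
import os
from typing import List, Sequence, Tuple

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2
COLLECTION_NAME = "web_chunks"  # hypothetical collection name


def build_columns(chunks: Sequence[str], embeddings: Sequence[Sequence[float]]) -> List[list]:
    """Validate inputs and return column-ordered data: [texts, vectors]."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    for vector in embeddings:
        if len(vector) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM}-dimensional vectors")
    return [list(chunks), [[float(x) for x in vector] for vector in embeddings]]


def store_and_search(
    chunks: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    query_vector: Sequence[float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Insert chunks into Milvus and return the top_k (text, distance) matches."""
    # Lazy import: pymilvus is only needed when talking to a live server.
    from pymilvus import (
        Collection, CollectionSchema, DataType, FieldSchema, connections, utility,
    )

    # Host name depends on the compose file; these env vars are assumptions.
    connections.connect(
        alias="default",
        host=os.getenv("MILVUS_HOST", "localhost"),
        port=os.getenv("MILVUS_PORT", "19530"),
    )

    if utility.has_collection(COLLECTION_NAME):
        collection = Collection(COLLECTION_NAME)
    else:
        schema = CollectionSchema([
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
            FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
            FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
        ])
        collection = Collection(COLLECTION_NAME, schema)
        collection.create_index(
            "embedding",
            {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
        )

    # With auto_id=True the id column is omitted from the inserted data.
    collection.insert(build_columns(chunks, embeddings))
    collection.flush()
    collection.load()

    hits = collection.search(
        data=[[float(x) for x in query_vector]],
        anns_field="embedding",
        param={"metric_type": "L2", "params": {"nprobe": 10}},
        limit=top_k,
        output_fields=["text"],
    )[0]
    return [(hit.entity.get("text"), hit.distance) for hit in hits]
```

With the L2 metric, smaller distances mean closer matches. The query vector should be produced by the same embedding model used for the stored chunks.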

📂 Project Structure

├── docker-compose.yml
├── Dockerfile.python
├── scripts/
│   └── your_script.py
├── volumes/
│   ├── etcd/
│   ├── milvus/
│   └── minio/

✨ Use Cases

  • Building search engines over scraped web content
  • Knowledge base construction with semantic search
  • Content recommendation systems


🙋 Author

LinkedIn: http://www.linkedin.com/in/SwapnilTaware

GitHub: https://github.com/itsSwapnil


📜 License

This project is licensed under the MIT License.
