This project is an automated system designed to extract essential information from websites, format the data, and query a large language model (LLM) to provide concise and fact-based responses. The main functionality includes:
- Scraping structured data from websites.
- Formatting the extracted data for LLM consumption.
- Sending queries to the LLM and handling responses.
- Web Scraping Module: Extracts metadata, contact information, and social media links.
- Data Formatting: Processes and formats extracted data into a structured string for LLM querying.
- LLM Integration: Sends queries to the LLM using a provided API key and retrieves responses.
- Error Handling and Logging: Includes comprehensive logging and retry logic for robustness.
- Python
- BeautifulSoup and requests for web scraping.
- Groq API for LLM interaction.
- Logging for debugging and tracking execution.
- Bing Search API for finding relevant URLs on the web.
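The extraction step described above (metadata, contact information, social media links) can be sketched as follows. This is a minimal illustration and not the project's actual scraper: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies, it works on an HTML string rather than fetching a URL, and the names `EssentialInfoParser`, `extract_essential_info`, and `SOCIAL_DOMAINS` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical list of social media domains; extend as needed.
SOCIAL_DOMAINS = ("twitter.com", "linkedin.com", "facebook.com", "instagram.com")
# Deliberately simple email pattern; it will not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class EssentialInfoParser(HTMLParser):
    """Collects the page title, the meta description, and all link targets."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; other text nodes are ignored.
        if self._in_title:
            self.title += data.strip()


def extract_essential_info(html):
    """Return a dict with title, description, emails, and social links."""
    parser = EssentialInfoParser()
    parser.feed(html)
    emails = sorted(set(EMAIL_RE.findall(html)))
    social = [link for link in parser.links
              if any(domain in link for domain in SOCIAL_DOMAINS)]
    return {
        "title": parser.title,
        "description": parser.description,
        "emails": emails,
        "social_links": social,
    }
```

In a real run, the HTML string would come from an HTTP request (for example via `requests`), and BeautifulSoup selectors could replace the hand-written parser.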
- Clone the repository:
git clone https://github.com/arpitkumar2004/AI_Information_Retrieval_Agent.git
- Set up a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Run the main script:
python main.py
- Provide the required input (e.g., URL and query). For testing purposes, sample values are already set in the file.
- View the output, which displays the formatted data and LLM response.
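file_path = r"<your_file_path>"
url = "<your_url>" # Page to scrape
api_key = "<your-groq-api-key>" # Groq API key
query = "What is the contact email for this company?"
# Scrape the page, then ask the LLM the question about the scraped data
response = query_llm_from_text(scrape_essential_info(url), query, api_key)
print(response)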
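The call above passes scraped data to the LLM together with a question. A minimal sketch of how such data might be flattened into a structured string is shown below. The function names `format_for_llm` and `build_prompt`, the field names, and the prompt wording are all assumptions made for illustration; the project's own formatter may differ.

```python
def format_for_llm(info):
    """Flatten a scraped-data dict into a labelled, one-line-per-field string."""
    # Hypothetical (key, label) pairs; they control field order and wording.
    fields = [
        ("title", "Title"),
        ("description", "Description"),
        ("emails", "Contact emails"),
        ("social_links", "Social media links"),
    ]
    lines = []
    for key, label in fields:
        value = info.get(key)  # .get avoids a KeyError when a field is missing
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        # Empty or missing values are reported explicitly instead of skipped.
        lines.append(f"{label}: {value or 'Not available'}")
    return "\n".join(lines)


def build_prompt(info, query):
    """Combine the formatted data and the user's question into one prompt."""
    return f"Website data:\n{format_for_llm(info)}\n\nQuestion: {query}"
```

Reporting missing fields as "Not available" keeps the prompt layout stable, which may make it easier for the model to say that a fact is absent rather than guess.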
- API Key: Ensure your Groq API and Bing Search API keys are set in the environment or passed securely to the script (e.g., add a config.py file at backend/app/config.py).
- Retry Mechanism: Configurable number of retries and delay settings for API calls.
# config.py
groq_api_key = "<your-groq-api-key>" # Replace with your actual API key
Bing_api_key = "<your-bing-api-key>" # Replace with your Bing Search API key
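As an alternative to hard-coding keys in config.py, the keys can be read from the environment. The sketch below assumes the environment variable names `GROQ_API_KEY` and `BING_API_KEY` and a helper called `load_api_key`; none of these are defined by the project.

```python
import os


def load_api_key(env_name):
    """Return the key stored in the given environment variable, or raise."""
    key = os.environ.get(env_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Environment variable {env_name} is not set")
    return key


# Example use in config.py (variable names are assumptions):
# groq_api_key = load_api_key("GROQ_API_KEY")
# Bing_api_key = load_api_key("BING_API_KEY")
```

Reading keys this way keeps them out of version control, provided config.py itself contains no literal key values.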
- Handles common errors, including missing keys in the extracted data.
- Logs errors with detailed information for troubleshooting.
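The retry and logging behavior described above can be sketched as a small wrapper. This is an illustrative sketch, not the project's implementation: the name `call_with_retries` and its parameters `retries`, `delay`, and `exceptions` are assumptions.

```python
import logging
import time

logger = logging.getLogger(__name__)


def call_with_retries(func, retries=3, delay=1.0, exceptions=(Exception,)):
    """Call func(); on failure, log the error, wait, and try again.

    Raises the last exception if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            return func()
        except exceptions as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, retries, exc)
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)  # fixed delay between attempts
```

An API call can then be wrapped as `call_with_retries(lambda: query_llm_from_text(text, query, api_key), retries=3, delay=2.0)`. A fixed delay is the simplest choice; exponential backoff may be a better fit for rate-limited APIs.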
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Commit your changes (git commit -m 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Create a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.
For any questions, please reach out to [kumararpit17773@gmail.com].