Scrapi is a homemade web scraper for C&W Berry LTD - Builders' Merchant.
- Websites: Scrapes prices from eBay, Amazon, and ManoMano.
- Cheapest 3 Google Results: Scrapes the three cheapest results from Google Search.
- Determined: Scrapi tries hard to find a price on a page. It first looks for structured data, then tries common regular expressions, and only if both fail does it ask an OpenAI model (gpt-4o-mini) to find the price.
- Database: A fast SQLite database stores products and prices, and keeps old prices for comparison.
- Shell: A web shell for running commands on the server.
- Browser: A page that finds the price and takes a screenshot of any URL it is given.
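Scrapi's own extraction code isn't shown in this README. The following is a minimal TypeScript sketch of the same "structured data, then regex, then model" order and of picking the cheapest three results. It assumes pages embed schema.org JSON-LD and show prices in pounds. The function names, the `PricedResult` type and the `askModel` callback (standing in for the gpt-4o-mini request) are all hypothetical.

```typescript
// Hypothetical result type for illustration; not Scrapi's real schema.
interface PricedResult {
  url: string;
  price: number;
}

// Stage 1: schema.org JSON-LD, e.g.
// <script type="application/ld+json">{"@type":"Product","offers":{"price":"12.99"}}</script>
function priceFromJsonLd(html: string): number | null {
  const blocks = html.matchAll(
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi,
  );
  for (const block of blocks) {
    try {
      const price = findPriceField(JSON.parse(block[1] ?? ""));
      if (price !== null) return price;
    } catch {
      // Malformed JSON-LD: skip this block and try the next one.
    }
  }
  return null;
}

// Depth-first search for the first "price" property that parses as a number.
function findPriceField(node: unknown): number | null {
  if (node === null || typeof node !== "object") return null;
  if (!Array.isArray(node) && "price" in node) {
    const price = Number.parseFloat(String((node as { price: unknown }).price));
    if (Number.isFinite(price)) return price;
  }
  const children = Array.isArray(node) ? node : Object.values(node);
  for (const child of children) {
    const price = findPriceField(child);
    if (price !== null) return price;
  }
  return null;
}

// Stage 2: a pound-sterling regex, e.g. "£1,234.50" -> 1234.5.
function priceFromRegex(text: string): number | null {
  const match = /£\s*(\d[\d,]*(?:\.\d{1,2})?)/.exec(text);
  if (!match?.[1]) return null;
  const price = Number.parseFloat(match[1].replace(/,/g, ""));
  return Number.isFinite(price) ? price : null;
}

// Stage 3: only if both cheap stages fail, call the (hypothetical) model callback.
async function findPrice(
  html: string,
  askModel: (html: string) => Promise<number | null>,
): Promise<number | null> {
  return priceFromJsonLd(html) ?? priceFromRegex(html) ?? (await askModel(html));
}

// "Cheapest 3 Google Results": sort a copy by price and keep the first three.
function cheapestThree(results: PricedResult[]): PricedResult[] {
  return [...results].sort((a, b) => a.price - b.price).slice(0, 3);
}
```

Ordering the stages from cheapest to most expensive means the paid model call only happens when the free methods find nothing.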
- SvelteKit - Web Framework
- TypeScript - Programming Language
- Puppeteer - Web Scraping Library
- Vite - Build Tool and Dev Server
- Tailwind CSS - CSS Framework
- Bun - JavaScript Runtime and Package Manager
- bun:sqlite - SQLite Database Library
- GitHub Actions & Dockerfile - Automatically build and deploy on every push to the main branch
- OpenAI API - gpt-4o-mini as a fallback if standard methods cannot find a price
- ebay-api - Node eBay API wrapper
- Install `git`, `unzip`, and `bun`:
```bash
sudo apt update
sudo apt install -y git unzip
curl -fsSL https://bun.sh/install | bash # for macOS, Linux, and WSL
source ~/.bashrc
```
- Download ExpressVPN and Google Chrome from their websites (pick the first option for each, the Ubuntu `.deb` package), then install them with the following commands:
```bash
cd ~/Downloads
sudo dpkg -i expressvpn*.deb
sudo dpkg -i google-chrome*.deb
```
- Install the GitHub Actions Runner script (follow the link). Then choose `New Runner` > `Self-Hosted` > `Linux x64` and follow those instructions. I've copied them below, but you will need to get the private token from the GitHub page.
GitHub Action Download
```bash
# Create a folder
cd ~ && mkdir actions-runner && cd actions-runner
# Download the runner package (v2.319.1 at the time of writing; the GitHub page lists the current version)
curl -o actions-runner-linux-x64-2.319.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.319.1/actions-runner-linux-x64-2.319.1.tar.gz
# Optional: Validate the hash
echo "3f6efb7488a183e291fc2c62876e14c9ee732864173734facc85a1bfb1744464 actions-runner-linux-x64-2.319.1.tar.gz" | shasum -a 256 -c
# Extract the installer
tar xzf ./actions-runner-linux-x64-2.319.1.tar.gz
```
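The optional hash step compares the tarball's SHA-256 digest with the one GitHub publishes, so a corrupted or tampered download is caught before it runs. For reference, this is a minimal TypeScript sketch of that check using Node's built-in `node:crypto` (Bun supports it too); the function names are hypothetical and not part of Scrapi.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// SHA-256 of some bytes as lowercase hex, the same format `shasum -a 256` prints.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare two hex digests, ignoring case and surrounding whitespace.
function digestMatches(actualHex: string, expectedHex: string): boolean {
  return actualHex.trim().toLowerCase() === expectedHex.trim().toLowerCase();
}

// Roughly what `shasum -a 256 -c` does for a single file. The whole file is
// read into memory, which is acceptable for the runner tarball.
function verifyFileSha256(path: string, expectedHex: string): boolean {
  return digestMatches(sha256Hex(readFileSync(path)), expectedHex);
}
```

Called as `verifyFileSha256("actions-runner-linux-x64-2.319.1.tar.gz", "3f6efb74...")` with the full digest from the `echo` line, it should return `true` for an intact download.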
GitHub Action Configuration
```bash
# Create the runner and start the configuration experience
./config.sh --url https://github.com/candwberry --token {FIND THE PRIVATE TOKEN ON THAT PAGE}
# Last step, run it!
./run.sh
```
- Open a new terminal window (the runner keeps the first one busy), then clone the repository into your home directory, change into it, and create the empty database files:
```bash
cd ~
git clone https://github.com/candwberry/scrapi.git
cd scrapi
touch mydb.sqlite
touch mydb.sqlite-wal
touch mydb.sqlite-shm
```
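The `-wal` and `-shm` files are the sidecar files SQLite keeps next to a database in write-ahead-log (WAL) mode, which presumably is why all three are created. The README does not say why they must exist up front; one plausible reason (an assumption) is that Docker turns a missing bind-mount source path into a directory rather than a file. A minimal TypeScript equivalent of the `touch` commands, for use in a setup script, might look like this; `ensureDbFiles` and `DB_FILES` are hypothetical names.

```typescript
import { closeSync, existsSync, openSync } from "node:fs";
import { join } from "node:path";

// The database plus the two sidecar files SQLite keeps in WAL mode.
const DB_FILES = ["mydb.sqlite", "mydb.sqlite-wal", "mydb.sqlite-shm"] as const;

// Like `touch`, but only for missing files: existing files (and their
// timestamps) are left alone. Returns the names that were created.
function ensureDbFiles(dir: string): string[] {
  const created: string[] = [];
  for (const name of DB_FILES) {
    const path = join(dir, name);
    if (!existsSync(path)) {
      // Opening in append mode ("a") creates the file without truncating it.
      closeSync(openSync(path, "a"));
      created.push(name);
    }
  }
  return created;
}
```

Running `ensureDbFiles(process.cwd())` from the repository root would create whichever of the three files are missing and leave an existing database untouched.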
- Finally, run the auto-updater:
```bash
bun runner.ts
```
That's it!
- Install Ubuntu from the Microsoft Store.
- Launch it and name the user `scrapi`. (You can call it anything, but I will have to change some settings if you do.)
- Create the database files:
```bash
touch mydb.sqlite
touch mydb.sqlite-wal
touch mydb.sqlite-shm
```
- Install Docker from the Ubuntu terminal using these commands:
Docker Download
```bash
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Verify Docker works (you should see 'Hello from Docker!')
sudo docker run hello-world
```
Docker Permissions
```bash
# Let your user run docker without sudo (newgrp applies the new group in this shell)
sudo usermod -aG docker $USER
newgrp docker
```
- Now install the GitHub Actions Runner script (follow the link). Then choose `New Runner` > `Self-Hosted` > `Linux x64` and follow those instructions. I've copied them below, but you will need to get the private token from the GitHub page.
GitHub Action Download
```bash
# Create a folder
mkdir actions-runner && cd actions-runner
# Download the runner package (v2.319.1 at the time of writing; the GitHub page lists the current version)
curl -o actions-runner-linux-x64-2.319.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.319.1/actions-runner-linux-x64-2.319.1.tar.gz
# Optional: Validate the hash
echo "3f6efb7488a183e291fc2c62876e14c9ee732864173734facc85a1bfb1744464 actions-runner-linux-x64-2.319.1.tar.gz" | shasum -a 256 -c
# Extract the installer
tar xzf ./actions-runner-linux-x64-2.319.1.tar.gz
```
GitHub Action Configuration
```bash
# Create the runner and start the configuration experience
./config.sh --url https://github.com/candwberry --token {FIND THE PRIVATE TOKEN ON THAT PAGE}
# Last step, run it!
./run.sh
```
That completes the setup. From now on, whenever someone pushes to the main branch, the application is built and deployed automatically. To trigger the first build now that the runner is installed, go to the Scrapi repository, click the tick next to the latest commit, click `See Details`, then `Re-run all jobs`. Follow the same steps if the server ever goes down or the application crashes.
- Oh, and don't forget to install and run ExpressVPN on Windows itself; that way you can use its GUI, which makes things easier. :)
- Ensure you have `bun` and `git` installed.
- Clone the repository:
```bash
git clone https://github.com/candwberry/scrapi
```
- Change directory:
```bash
cd scrapi
```
- Install dependencies:
```bash
bun install
```
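- Create the `mydb.sqlite` file (e.g. `touch mydb.sqlite`), otherwise the database will not persist between runs.
- Run the development server:
```bash
bun --bun run dev
```
The `--bun` flag makes sure the Bun runtime is used rather than Node.js, so the `bun:sqlite` module exists.
- Open the local URL printed by the dev server (Vite's default is http://localhost:5173) to view the project.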
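Why the flag matters, assuming the `dev` script runs Vite as in a standard SvelteKit project: Vite's CLI starts with a `#!/usr/bin/env node` shebang, so plain `bun run dev` hands it to Node.js, where `bun:sqlite` cannot be imported; `--bun` overrides the shebang. The TypeScript sketch below shows one hypothetical way to detect the runtime and fail with a clearer message. Bun reports its version in `process.versions.bun`; the helper names are made up for illustration.

```typescript
type Versions = Record<string, string | undefined>;

// Bun sets process.versions.bun; Node.js has no such key.
function bunVersion(versions: Versions = process.versions): string | null {
  return versions.bun ?? null;
}

// Throw a readable error when not running on Bun. To be useful, this must run
// before any module that imports "bun:sqlite" is loaded, because static
// imports are resolved before the importing module's code executes.
function assertBunRuntime(versions: Versions = process.versions): void {
  if (bunVersion(versions) === null) {
    throw new Error(
      'bun:sqlite is only available on Bun; start the dev server with "bun --bun run dev".',
    );
  }
}
```

Checking `process.versions` rather than a global like `Bun` keeps the helper easy to test by passing in a fake versions object.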