scrapi

Scrapi is a homemade web scraper with a dashboard and database for tracking competitors' prices, built for C&W Berry Ltd, a builders' merchant.

Features

  • Websites: Scrapes prices from eBay, Amazon, and ManoMano.
  • Cheapest 3 Google Results: Scrapes the three cheapest results from Google Search.
  • Determined: Scrapi tries hard to find a price on a page: it checks structured data first, then common regexes, and only if both fail does it ask OpenAI's gpt-4o-mini model for a price (a sketch of this fallback chain follows the list).
  • Database: A fast SQLite database for storing prices and products, keeping old prices for comparison (see the bun:sqlite sketch below).
  • Shell: A web shell for running commands on the server.
  • Browser: A page that finds the price of, and takes a screenshot of, any URL it is given (see the Puppeteer sketch after the Built With list).
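
The Determined feature's fallback chain runs in a fixed order: structured data first, then regexes, then an OpenAI call. Below is a minimal TypeScript sketch of that order of attempts; the function name, regexes, and prompt are illustrative assumptions, not the repository's actual code.

// Hypothetical sketch of the price-finding fallback chain; not taken from the repository.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function extractPrice(html: string): Promise<number | undefined> {
  // 1. Structured data: schema.org Product/Offer JSON-LD blocks embedded in the page.
  const blocks = html.match(/<script[^>]*application\/ld\+json[^>]*>[\s\S]*?<\/script>/gi) ?? [];
  for (const block of blocks) {
    try {
      const data = JSON.parse(block.replace(/<\/?script[^>]*>/gi, ""));
      const offer = Array.isArray(data.offers) ? data.offers[0] : data.offers;
      const price = Number(offer?.price ?? data.price);
      if (Number.isFinite(price)) return price;
    } catch {
      // malformed JSON-LD, try the next block
    }
  }

  // 2. Common regexes, e.g. "£12.34".
  const match = html.match(/£\s*(\d+(?:\.\d{1,2})?)/);
  if (match) return Number(match[1]);

  // 3. Last resort: ask gpt-4o-mini to read the price out of the page text.
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{
      role: "user",
      content: `Reply with only the product price as a plain number:\n${html.slice(0, 8000)}`,
    }],
  });
  const answer = Number.parseFloat(completion.choices[0]?.message?.content ?? "");
  return Number.isFinite(answer) ? answer : undefined;
}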

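The Database feature keeps every observed price rather than only the latest value, so old prices remain available for comparison. Here is a minimal bun:sqlite sketch of how such a price-history table could look; the table and column names are assumptions, not the project's actual schema.

// Hypothetical bun:sqlite sketch; the schema here is an assumption, not the project's.
import { Database } from "bun:sqlite";

const db = new Database("mydb.sqlite");
db.exec("PRAGMA journal_mode = WAL;"); // matches the mydb.sqlite-wal/-shm files created during install

db.run(`
  CREATE TABLE IF NOT EXISTS prices (
    sku TEXT NOT NULL,
    shop TEXT NOT NULL,
    price REAL NOT NULL,
    scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
  );
`);

// Insert every observation so the history builds up over time.
const insert = db.prepare("INSERT INTO prices (sku, shop, price) VALUES (?, ?, ?)");
insert.run("BER-001", "ebay", 12.99);

// Most recent observation for a product.
const latest = db
  .query("SELECT shop, price, scraped_at FROM prices WHERE sku = ? ORDER BY scraped_at DESC LIMIT 1")
  .get("BER-001");
console.log(latest);
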
Technical Details

Built With

  • SvelteKit - Web Framework
  • TypeScript - Programming Language
  • Puppeteer - Web Scraping Library
  • Vite - Build Tool and Dev Server
  • Tailwind CSS - CSS Framework
  • Bun - JavaScript Runtime and Package Manager
  • bun:sqlite - SQLite Database Library
  • GitHub Actions & Dockerfile - Auto build and deploy upon push to main branch
  • OpenAI API - gpt-4o-mini as a fallback if standard methods cannot find a price
  • ebay-api - Node eBay API wrapper
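
The Browser feature relies on Puppeteer, listed above, to visit a page. Here is a minimal sketch of that round trip, assuming a headless launch and a fixed screenshot path; neither the function name nor the options are taken from the repository.

// Hypothetical Puppeteer sketch of the Browser feature: load a URL, save a screenshot,
// and return the HTML so it can be handed to the price-finding chain sketched earlier.
import puppeteer from "puppeteer";

async function browse(url: string): Promise<string> {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2", timeout: 30_000 });
    await page.screenshot({ path: "screenshot.png", fullPage: true });
    return await page.content();
  } finally {
    await browser.close();
  }
}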

Installation

For Production on Ubuntu (Linux)

  1. Install git, unzip, bun.
sudo apt update
sudo apt install -y git unzip
curl -fsSL https://bun.sh/install | bash # for macOS, Linux, and WSL
source ~/.bashrc
  2. Download ExpressVPN and Google Chrome from the web (for each, select the first options: Ubuntu and .deb). Once downloaded, install them using the following commands.
cd Downloads
sudo dpkg -i expressvpn*.deb
sudo dpkg -i google-chrome*.deb
  3. Install the GitHub Actions Runner script. <- Click this link. Then choose New Runner > Self-Hosted > Linux x64 and follow those instructions. They are reproduced here, but you will need to get the private token from the GitHub page.

GitHub Action Download

# Create a folder
cd ~ && mkdir actions-runner && cd actions-runner

# Download the latest runner package
curl -o actions-runner-linux-x64-2.319.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.319.1/actions-runner-linux-x64-2.319.1.tar.gz

# Optional: Validate the hash
echo "3f6efb7488a183e291fc2c62876e14c9ee732864173734facc85a1bfb1744464  actions-runner-linux-x64-2.319.1.tar.gz" | shasum -a 256 -c

# Extract the installer
tar xzf ./actions-runner-linux-x64-2.319.1.tar.gz

GitHub Action Configuration

# Create the runner and start the configuration experience
./config.sh --url https://github.com/candwberry --token {FIND THE PRIVATE TOKEN ON THAT PAGE}

# Last step, run it!
./run.sh
  4. Open a new terminal window at ~, clone the git repository, change directory into it, and create the database files.
cd ~
git clone https://github.com/candwberry/scrapi.git
cd scrapi
touch mydb.sqlite
touch mydb.sqlite-wal
touch mydb.sqlite-shm
  5. Finally, run the auto-updater.
bun runner.ts

And that's it!




For Production (for C&W Berry Ltd IT staff on Windows)

  1. Install Ubuntu from the Windows Store
  2. Launch it, and call the user scrapi. (You can call it anything, but I will have to change settings if you do.)
  3. Create the database files.
touch mydb.sqlite
touch mydb.sqlite-wal
touch mydb.sqlite-shm
  4. Install Docker in the Ubuntu terminal using the following commands:

Docker Download

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Verify Docker works (You should see 'Hello from Docker!')
sudo docker run hello-world

Docker Permissions

sudo usermod -aG docker $USER
newgrp docker
  5. Now install the GitHub Actions Runner script. <- Click this link. Then choose New Runner > Self-Hosted > Linux x64 and follow those instructions. They are reproduced here, but you will need to get the private token from the GitHub page.

GitHub Action Download

# Create a folder
mkdir actions-runner && cd actions-runner

# Download the latest runner package
curl -o actions-runner-linux-x64-2.319.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.319.1/actions-runner-linux-x64-2.319.1.tar.gz

# Optional: Validate the hash
echo "3f6efb7488a183e291fc2c62876e14c9ee732864173734facc85a1bfb1744464  actions-runner-linux-x64-2.319.1.tar.gz" | shasum -a 256 -c

# Extract the installer
tar xzf ./actions-runner-linux-x64-2.319.1.tar.gz

GitHub Action Configuration

# Create the runner and start the configuration experience
./config.sh --url https://github.com/candwberry --token {FIND THE PRIVATE TOKEN ON THAT PAGE}

# Last step, run it!
./run.sh

That is the setup completed. Now, whenever someone pushes to the main branch, the application will automatically be built and deployed. To set it off for the first time, now that the action runner is installed, go to the Scrapi repository; you should see the page shown in the Scrapi Repo Page screenshot.

Click that tick, click See Details, then Re-run all jobs to set off the initial build. Follow these same steps if the server ever goes down or the application crashes.

  6. Oh, and don't forget to install and run ExpressVPN on Windows; that way you can use the GUI and things will be easier. :)

For Development

  1. Ensure you have bun and git installed
  2. Clone the repository: git clone https://github.com/candwberry/scrapi
  3. Change directory: cd scrapi
  4. Install dependencies: bun install
  5. Create the mydb.sqlite file, otherwise the database will not persist between runs
  6. Run the development server: bun --bun run dev. The --bun flag ensures the Bun runtime is used, so that bun:sqlite is available
  7. Go to localhost to view the project
