This is a simple script that scrapes the website using `wget` in order to maintain a static fallback. The script requires pretty permalinks unless it is used on a single-page site.
The script runs three download steps (sketched in the shell snippet after this list):

- Download all files on the website using `wget`, waiting 1 second between requests. The script ignores any file with a query parameter unless that parameter is `?ver`.
- Download the website's 404 page. This fails if the website already has a page at `404.html` for some reason.
- Download any extra URLs specified in `extra-urls.txt`.
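
A minimal shell sketch of these download steps is below. The site URL, the output directory, and the query-string filter are assumptions for illustration; the real script's flags may differ.

```bash
#!/usr/bin/env bash
# Sketch of the three download steps. SITE and OUT are placeholders;
# the real script's flags and filtering may differ.
set -euo pipefail

SITE="https://example.org"   # hypothetical source site
OUT="static"                 # hypothetical output directory

# 1. Mirror the site, waiting 1 second between requests. The reject
#    pattern drops URLs whose query string is anything other than ?ver
#    (requires wget built with PCRE support; shown as an assumption).
wget --mirror --page-requisites --adjust-extension --convert-links \
     --no-parent --no-host-directories --wait=1 \
     --regex-type=pcre --reject-regex '\?(?!ver)' \
     --directory-prefix="$OUT" "$SITE/"

# 2. Fetch the 404 page; --content-on-error keeps the body even though
#    the server answers with a 404 status. This overwrites any real page
#    that happens to live at /404.html.
wget --content-on-error -O "$OUT/404.html" "$SITE/404.html" || true

# 3. Fetch any extra URLs listed one per line in extra-urls.txt.
wget --wait=1 --directory-prefix="$OUT" --input-file=extra-urls.txt
```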
The download is followed by three post-processing steps (also sketched below):

- Remove the remaining query parameters (only `?ver` at this point) from the downloaded files using `.github/bin/cleanup-querystrings.py`.
- Use `sed` to replace the website's URL with the GitHub Pages URL in all files.
- Minify all HTML files using `minify`.
After these steps, the files are deployed to GitHub Pages at ftcunion.github.io.
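
For completeness, here is one hedged way the deploy step could look if it pushes the mirror to a `gh-pages` branch; the actual repository may use a GitHub Actions Pages workflow or serve from a different branch, so treat this purely as an illustrative sketch.

```bash
#!/usr/bin/env bash
# Purely illustrative deploy step: push the mirror to a gh-pages branch.
# The real deployment for ftcunion.github.io may work differently, and
# push credentials are assumed to be configured already.
set -euo pipefail

cd static   # hypothetical output directory from the previous steps
git init -q
git add -A
git -c user.name="deploy-bot" -c user.email="deploy-bot@example.invalid" \
    commit -q -m "Update static fallback"
git push -f https://github.com/ftcunion/ftcunion.github.io.git HEAD:gh-pages
```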