This is a simple script that scrapes the website using `wget` in order to maintain a static fallback. The script requires pretty permalinks unless it is used on a single-page site.
The script runs three download steps (sketched in the shell snippet after this list):

- Download all files on the website using `wget`, waiting 1 second between requests. The script ignores any file with a query parameter unless that parameter is `?ver`.
- Download the website's 404 page. This fails if the website already has a page at `404.html` for some reason.
- Download any extra URLs specified in `extra-urls.txt`.
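
A minimal shell sketch of these download steps is below. The site URL, the output directory, and the query-string filter are assumptions for illustration; the real script's flags may differ.

```bash
#!/usr/bin/env bash
# Sketch of the three download steps. SITE and OUT are placeholders;
# the real script's flags and filtering may differ.
set -euo pipefail

SITE="https://example.org"   # hypothetical source site
OUT="static"                 # hypothetical output directory

# 1. Mirror the site, waiting 1 second between requests. The reject
#    pattern drops URLs whose query string is anything other than ?ver
#    (requires wget built with PCRE support; shown as an assumption).
wget --mirror --page-requisites --adjust-extension --convert-links \
     --no-parent --no-host-directories --wait=1 \
     --regex-type=pcre --reject-regex '\?(?!ver)' \
     --directory-prefix="$OUT" "$SITE/"

# 2. Fetch the 404 page; --content-on-error keeps the body even though
#    the server answers with a 404 status. This overwrites any real page
#    that happens to live at /404.html.
wget --content-on-error -O "$OUT/404.html" "$SITE/404.html" || true

# 3. Fetch any extra URLs listed one per line in extra-urls.txt.
wget --wait=1 --directory-prefix="$OUT" --input-file=extra-urls.txt
```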
The download is followed by three post-processing steps (also sketched below):

- Remove the remaining query parameters (only `?ver` at this point) from the downloaded files using `.github/bin/cleanup-querystrings.py`.
- Use `sed` to replace the website's URL with the GitHub Pages URL in all files.
- Minify all HTML files using `minify`.
After these steps, the files are deployed to GitHub Pages at ftcunion.github.io.
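
For completeness, here is one hedged way the deploy step could look if it pushes the mirror to a `gh-pages` branch; the actual repository may use a GitHub Actions Pages workflow or serve from a different branch, so treat this purely as an illustrative sketch.

```bash
#!/usr/bin/env bash
# Purely illustrative deploy step: push the mirror to a gh-pages branch.
# The real deployment for ftcunion.github.io may work differently, and
# push credentials are assumed to be configured already.
set -euo pipefail

cd static   # hypothetical output directory from the previous steps
git init -q
git add -A
git -c user.name="deploy-bot" -c user.email="deploy-bot@example.invalid" \
    commit -q -m "Update static fallback"
git push -f https://github.com/ftcunion/ftcunion.github.io.git HEAD:gh-pages
```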