
07. Considerations

Eri Airlangga edited this page Mar 24, 2020 · 2 revisions

Considerations

Avoid Mass-Scraping

An archived website only grows over time, and it can eventually contain millions of files. A few aspects therefore deserve consideration.

It is always advisable to limit each session's downloads with filtering options, including, but not limited to:

  • Filter by timestamp with the -f or -t option
  • Filter by specific files with the -O option
  • Skip files you don't need with the -X option
  • Minimize the number of simultaneous downloads by passing a small number to the -c option
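To make the effect of combining these filters concrete, here is a minimal Python sketch. It is not the tool's implementation: the function `select_snapshots`, the `(timestamp, url)` input shape, the assumption that -O and -X behave like regular-expression include/exclude patterns, and the 14-digit `YYYYMMDDhhmmss` timestamp format are all hypothetical choices for illustration.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical input: (timestamp, url) pairs, with timestamps assumed to be
# 14-digit strings such as "20200324120000" (YYYYMMDDhhmmss).
Snapshot = Tuple[str, str]


def select_snapshots(
    snapshots: Iterable[Snapshot],
    from_ts: Optional[str] = None,  # hypothetical analogue of -f (full 14 digits)
    to_ts: Optional[str] = None,    # hypothetical analogue of -t (full 14 digits)
    only: Optional[str] = None,     # hypothetical analogue of -O (assumed regex)
    exclude: Optional[str] = None,  # hypothetical analogue of -X (assumed regex)
) -> List[Snapshot]:
    """Keep only the snapshots that pass every active filter."""
    only_re = re.compile(only) if only else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for ts, url in snapshots:
        # Equal-length digit strings sort the same way as the numbers they encode.
        if from_ts and ts < from_ts:
            continue
        if to_ts and ts > to_ts:
            continue
        if only_re and not only_re.search(url):
            continue
        if exclude_re and exclude_re.search(url):
            continue
        selected.append((ts, url))
    return selected


snaps = [
    ("20180101000000", "http://example.com/index.html"),
    ("20190601000000", "http://example.com/big/video.mp4"),
    ("20190701000000", "http://example.com/about.html"),
]
print(select_snapshots(snaps, from_ts="20190101000000", exclude=r"\.mp4$"))
# → [('20190701000000', 'http://example.com/about.html')]
```

Each filter that is set shrinks the download list before any request is made, which is exactly why filtering keeps a session small.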

It is good etiquette to crawl politely.
Avoid mass-scraping, that is, overloading the server with too many requests for too many large files, as this puts a heavy strain on it. If this happens too often, the operators may take measures to block downloader tools such as this one and, in the long run, may take legal action against scraping.

In short, download wisely.
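The idea behind a small -c value can be sketched in Python as a bounded worker pool with a pause between requests. This is an illustrative assumption, not how this tool works internally: `polite_download`, `fetch_url`, and the `delay_seconds` pause are hypothetical names and choices.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_url(url: str) -> bytes:
    """Download one URL with the standard library (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def polite_download(
    urls: List[str],
    concurrency: int = 2,        # plays the role of a small -c value
    delay_seconds: float = 1.0,  # hypothetical pause after each request
    fetch: Callable[[str], bytes] = fetch_url,
) -> Dict[str, bytes]:
    """Fetch URLs with at most `concurrency` requests in flight at once."""
    results: Dict[str, bytes] = {}
    lock = threading.Lock()

    def worker(url: str) -> None:
        data = fetch(url)
        with lock:
            results[url] = data
        # Pause before this worker thread picks up its next URL.
        time.sleep(delay_seconds)

    # The pool size caps how many requests reach the server simultaneously;
    # list() drains the iterator so any download error is re-raised here.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(worker, urls))
    return results
```

Lowering `concurrency` or raising `delay_seconds` trades total download time for a lighter load on the server.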

Windows Long Filename Limitation

Windows limits a directory path to a maximum of 248 characters, while a URL has no such limit. A file whose local directory path would exceed this limit cannot be saved, so it is not downloaded. In this case, examine the log file and download the file manually from the source URL it provides.
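Affected URLs can be spotted ahead of time by estimating the local directory path each one maps to. The Python sketch below assumes a hypothetical `base_dir\host\path\file` layout (query strings ignored, `index.html` for directory URLs); the tool's real naming scheme may differ, so treat `local_path_for` and `too_long_for_windows` as illustrations only. It uses `ntpath` so that Windows-style paths are built the same way on any operating system.

```python
import ntpath
from urllib.parse import urlsplit

MAX_WINDOWS_DIR = 248  # directory-path limit described above


def local_path_for(url: str, base_dir: str) -> str:
    """Hypothetical mapping of a URL to base_dir\\host\\path\\file."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: URLs ending in "/" (or with no path) are saved as index.html.
    if not segments or parts.path.endswith("/"):
        segments.append("index.html")
    return ntpath.join(base_dir, parts.netloc, *segments)


def too_long_for_windows(url: str, base_dir: str) -> bool:
    """True if the directory part of the local path exceeds the limit."""
    directory = ntpath.dirname(local_path_for(url, base_dir))
    return len(directory) > MAX_WINDOWS_DIR
```

Running such a check over the URLs listed in the log gives a shortlist of files to fetch by hand.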
