PyHttrack is a lightweight and powerful Python tool that lets you download entire websites to your local computer for offline access, archiving, or content analysis. Inspired by the legendary HTTrack, PyHttrack takes a modern approach: it is easily customizable and can be integrated into various automated workflows.
- 🌐 Download Full Websites - HTML, CSS, JS, images, and other media saved directly to a local directory.
- ⚙️ Flexible Configuration - Specify crawl depth, file extensions, domain limits, and more.
- 🖥️ Simple CLI Interface - Run and monitor processes with easy-to-understand commands.
- 📁 Organized Directory Structure - Keeps the original structure of the site for an identical offline experience.
- 🧩 Easy to Customize - Suitable for developers, researchers, and digital archivists.
- Save important site documentation before going offline
- Perform local SEO crawling & analysis
- Learn to build a site from real examples
- Backup personal content or public blogs
```bash
pip install -r requirements.txt
```
Edit the `web.json` file and add the URL of the website you want to download, for example:
```json
["https://example.com/xxx/xxx"]
```
or, to download multiple websites:
```json
[
    "https://example.com/xxx/xxx",
    "https://example.com/xxx/xxx",
    "https://example.com/xxx/xxx"
]
```
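Since `web.json` must be a JSON array of URL strings, it can help to validate it before running a crawl. The sketch below is illustrative only and is not part of PyHttrack itself; the function name `validate_targets` is an assumption for this example.

```python
# Illustrative helper (NOT part of PyHttrack): check that web.json
# contains a JSON array of http(s) URL strings.
import json

def validate_targets(text):
    """Return the URL list if `text` is a JSON array of http(s) URLs, else raise."""
    urls = json.loads(text)
    if not isinstance(urls, list):
        raise ValueError("web.json must contain a JSON array")
    for url in urls:
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            raise ValueError(f"not a valid URL: {url!r}")
    return urls
```

A quick check like this catches a common mistake: wrapping a single URL in an object (`{...}`) instead of an array (`[...]`).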
Run the following command to start the download:

```bash
python pyhttrack.py
```
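The "organized directory structure" feature means each downloaded file is saved at a local path mirroring its URL. A minimal sketch of that mapping, under the assumption that output is rooted in an `output/` directory (the function names here are hypothetical, not PyHttrack's actual API):

```python
# Illustrative sketch (NOT the actual PyHttrack implementation): map each
# URL to a local file path that mirrors the original site structure.
import os
from urllib.parse import urlparse

def local_path_for(url, root="output"):
    """Return a local path under `root` mirroring the URL's host and path."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/") or "index.html"  # bare domain -> index.html
    if path.endswith("/"):
        path += "index.html"  # directory URLs get an index.html
    return os.path.join(root, parsed.netloc, path)
```

For example, `https://example.com/docs/page.html` would land in `output/example.com/docs/page.html`, which is what lets the mirrored copy's relative links keep working offline.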
Click here to get the latest version of PyHttrack.
Contributions are very welcome! Feel free to fork this repo, open an issue, or submit a pull request for new features or performance improvements 🚀