hmurij/Web-Crawler

Web Crawler REST API

A web crawler that fetches data from HTML pages, up to a given depth and up to a maximum number of pages. The crawling service is exposed as REST endpoints. A web UI is also available for filling in the crawl request form, starting the crawler, and downloading the statistics as a CSV file.
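The two limits mentioned above (depth and maximum pages) can be illustrated with a breadth-first traversal. The following is a minimal sketch, not the code used in this repository: the class name `BoundedCrawlSketch`, the method `crawl`, and the `linkExtractor` parameter are hypothetical names chosen for illustration, and the function that fetches a page and returns its outgoing links is assumed to be supplied by the caller.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical sketch of a depth- and page-limited crawl (not the project's actual implementation).
class BoundedCrawlSketch {

    /**
     * Visits pages breadth-first starting at seed.
     * Stops expanding links once maxDepth is reached and stops entirely after maxPages pages.
     * linkExtractor is an assumed callback that returns the links found on a given URL.
     */
    static List<String> crawl(String seed, int maxDepth, int maxPages,
                              Function<String, List<String>> linkExtractor) {
        List<String> visited = new ArrayList<>();
        // Maps every discovered URL to its distance from the seed; also prevents revisits.
        Map<String, Integer> depthOf = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();

        depthOf.put(seed, 0);
        queue.add(seed);

        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.remove();
            visited.add(url);

            int depth = depthOf.get(url);
            if (depth >= maxDepth) {
                continue; // depth limit reached: record the page but do not follow its links
            }
            for (String link : linkExtractor.apply(url)) {
                if (!depthOf.containsKey(link)) {
                    depthOf.put(link, depth + 1);
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

In this sketch the seed page counts as depth 0, so a depth limit of 1 visits the seed and the pages it links to directly. Whether the project counts depth the same way is not stated here, so treat that convention as an assumption.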

Once the application is started, the web UI is available at http://localhost:8080/main or http://localhost:8080/

The API supports the following operations:

  • Start the web crawler - POST - /api/webcrawler
  • Get all records in JSON format - GET - /api/webcrawler
  • Get n records in JSON format - GET - /api/webcrawler/{count}
  • Get all records in CSV format - GET - /api/webcrawler/csv
  • Get n sorted records in CSV format - GET - /api/webcrawler/csv/{count}
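The endpoints listed above can be called from any HTTP client. The following is a rough sketch using the JDK's built-in `java.net.http` API (Java 11+), assuming the application runs on http://localhost:8080 as stated earlier. The class `WebCrawlerRequests` and its method names are hypothetical, and the JSON body for the POST request is deliberately left as a parameter because its fields are not documented here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds requests for the endpoints listed above.
class WebCrawlerRequests {

    // Assumed base URL, taken from the web UI address in this README.
    static final String BASE = "http://localhost:8080/api/webcrawler";

    // GET /api/webcrawler : all records as JSON
    static HttpRequest allRecordsJson() {
        return HttpRequest.newBuilder(URI.create(BASE)).GET().build();
    }

    // GET /api/webcrawler/{count} : first n records as JSON
    static HttpRequest recordsJson(int count) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + count)).GET().build();
    }

    // GET /api/webcrawler/csv : all records as CSV
    static HttpRequest allRecordsCsv() {
        return HttpRequest.newBuilder(URI.create(BASE + "/csv")).GET().build();
    }

    // GET /api/webcrawler/csv/{count} : n sorted records as CSV
    static HttpRequest recordsCsv(int count) {
        return HttpRequest.newBuilder(URI.create(BASE + "/csv/" + count)).GET().build();
    }

    // POST /api/webcrawler : start the crawler.
    // jsonBody is supplied by the caller; its schema is not specified in this README.
    static HttpRequest startCrawler(String jsonBody) {
        return HttpRequest.newBuilder(URI.create(BASE))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

A request built this way can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the application is running. The Postman samples linked below are the better source for the actual request body.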

Link to Postman test data samples

Installation

Download the code as a ZIP archive, or clone the repository with git clone https://github.com/hmurij/Web-Crawler.git

Then either import it as an existing Maven project and run com.webcrawler.WebCrawlerApplication.java, or start the application by executing startCrawler.bat. Note that webCrawler.jar must be in the same directory as the bat file.

About

Web Crawler REST API - Thymeleaf, Java Spring Boot Maven project
