The Job Scraper Project is a system for scraping job postings from various websites and serving the collected data through an API built with Spring Boot. Multiple contributors work on the project, and each is responsible for implementing a script that scrapes a different job posting website.

Adding Websites:
- Users can suggest new job posting websites to be added to the scraping pipeline.
- These suggestions are reviewed and added to the list of target websites.

Contributor Workflow:
- Each contributor is assigned a website from the list.
- They inspect the website to understand its structure and identify the necessary elements for scraping.
- They then implement the scraping logic using appropriate tools and libraries.
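
The workflow above can be sketched in plain Java. This is only an illustration under stated assumptions: the names `JobPosting`, `JobScraper`, and `ExampleBoardScraper`, the record fields, and the HTML layout of the imaginary site are all hypothetical and not taken from the project's codebase. A regular expression is used only to keep the sketch free of dependencies; a real scraper would more likely use an HTML parser such as jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names come from the project itself.
class ScraperSketch {

    // Assumed minimal shape of one scraped posting.
    record JobPosting(String title, String company, String url) {}

    // One implementation per target website.
    interface JobScraper {
        List<JobPosting> parse(String html);
    }

    // Scraper for an imaginary site whose listings look like:
    // <a class="job" href="URL" data-company="COMPANY">TITLE</a>
    static class ExampleBoardScraper implements JobScraper {
        private static final Pattern JOB_LINK = Pattern.compile(
            "<a class=\"job\" href=\"([^\"]+)\" data-company=\"([^\"]+)\">([^<]+)</a>");

        @Override
        public List<JobPosting> parse(String html) {
            List<JobPosting> postings = new ArrayList<>();
            Matcher m = JOB_LINK.matcher(html);
            // Each match is one listing: group 1 = URL, 2 = company, 3 = title.
            while (m.find()) {
                postings.add(new JobPosting(m.group(3).trim(), m.group(2), m.group(1)));
            }
            return postings;
        }
    }

    public static void main(String[] args) {
        String html = """
            <ul>
              <li><a class="job" href="https://example.com/jobs/1" data-company="Acme">Backend Engineer</a></li>
              <li><a class="job" href="https://example.com/jobs/2" data-company="Globex">Data Analyst</a></li>
            </ul>
            """;
        for (JobPosting p : new ExampleBoardScraper().parse(html)) {
            System.out.println(p.title() + " @ " + p.company() + " -> " + p.url());
        }
    }
}
```

Keeping the parsing step separate from the network fetch means each site's scraper can be tested against a saved HTML snippet, without hitting the live website.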

Scraping and Data Storage:
- The scraping service runs at scheduled intervals to collect job postings from the target websites.
- The collected data is stored in a database for easy access and management.
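
As a rough illustration of the schedule-then-store cycle, the sketch below uses the JDK's `ScheduledExecutorService` and an in-memory map in place of the database. Everything here is an assumption made for the example: the class name `ScheduledScrapeSketch`, the `JobPosting` fields, the one-second interval, and the choice to deduplicate by URL. The project itself may well rely on Spring's `@Scheduled` annotation and a MySQL-backed repository instead.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: an in-memory stand-in for the scheduled scraping service.
class ScheduledScrapeSketch {

    // Assumed minimal shape of one scraped posting.
    record JobPosting(String title, String company, String url) {}

    // Stand-in for the database, keyed by URL so repeated scrapes do not duplicate entries.
    private final Map<String, JobPosting> store = new ConcurrentHashMap<>();
    private final Supplier<List<JobPosting>> source;

    ScheduledScrapeSketch(Supplier<List<JobPosting>> source) {
        this.source = source;
    }

    // One scrape cycle: fetch postings and store the unseen ones; returns how many were new.
    int runOnce() {
        int added = 0;
        for (JobPosting p : source.get()) {
            if (store.putIfAbsent(p.url(), p) == null) {
                added++;
            }
        }
        return added;
    }

    int storedCount() {
        return store.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledScrapeSketch job = new ScheduledScrapeSketch(() -> List.of(
            new JobPosting("Backend Engineer", "Acme", "https://example.com/jobs/1")));

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // The interval is an assumption, shortened to one second for the demo.
        scheduler.scheduleAtFixedRate(
            () -> System.out.println("new postings: " + job.runOnce()), 0, 1, TimeUnit.SECONDS);

        Thread.sleep(2500);
        scheduler.shutdownNow();
        System.out.println("stored: " + job.storedCount());
    }
}
```

Keying the store by URL is one simple way to make repeated runs idempotent; a real schema might use a different uniqueness rule.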

API Access:
- The stored job posting data is made available through a RESTful API built with Spring Boot.
- Users can query the API to retrieve job postings based on various criteria.
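
Querying by criteria could look roughly like the filter below. It is a sketch only: the class `JobQuerySketch`, the `search` method, and the two criteria (title keyword and location) are hypothetical, and the project's actual endpoints and parameters are not specified here. In a Spring Boot application, a controller would typically pass request parameters to logic of this kind or to a database query.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: in-memory filtering standing in for an API query.
class JobQuerySketch {

    // Assumed minimal shape of one stored posting.
    record JobPosting(String title, String company, String location) {}

    // Returns postings whose title contains `keyword` and whose location equals `location`,
    // both case-insensitive. A null or blank criterion is ignored.
    static List<JobPosting> search(List<JobPosting> all, String keyword, String location) {
        return all.stream()
            .filter(p -> isBlank(keyword)
                || p.title().toLowerCase(Locale.ROOT).contains(keyword.toLowerCase(Locale.ROOT)))
            .filter(p -> isBlank(location) || p.location().equalsIgnoreCase(location))
            .toList();
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }

    public static void main(String[] args) {
        List<JobPosting> all = List.of(
            new JobPosting("Backend Engineer", "Acme", "Tempe"),
            new JobPosting("Frontend Engineer", "Globex", "Phoenix"),
            new JobPosting("Data Analyst", "Initech", "Tempe"));
        System.out.println(search(all, "engineer", "Tempe"));
    }
}
```

Treating missing criteria as "match everything" lets one method serve both broad and narrow queries.
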
To get started with the CD Job Scraper Project, follow these steps:

Clone the Repository:
    git clone https://github.com/ASU-CodeDevils/scraper.codedevils.org.git
    # Or, if you have SSH set up:
    git clone git@github.com:ASU-CodeDevils/scraper.codedevils.org.git
    # Navigate to the cloned directory:
    cd scraper.codedevils.org

Set Up the Environment:
- Make sure you have Docker Desktop installed and running.
- Ensure you have Java (version 21) and Gradle (version 8.5) installed.
  - For Unix-like systems (macOS, Linux, or Windows with WSL), use SDKMAN to install the correct versions of Java and Gradle.
  - Not recommended: on Windows without WSL, install Java 21 and Gradle 8.5 manually.
- Set up your local environment variables: create a file called .env in the root directory and add the following lines:
    MYSQL_ROOT_PASSWORD=<create your own password>
    MYSQL_DATABASE=<name your database>
    MYSQL_USER=<create your own username>
    MYSQL_PASSWORD=<create a different password than MYSQL_ROOT_PASSWORD>
    MYSQL_URL=jdbc:mysql://localhost:3306/<MYSQL_DATABASE>
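
A quick sanity check of the .env contents can catch a missing key before the application starts. The helper below is a hypothetical sketch and not part of the project: the class `EnvCheckSketch` and its methods are invented for illustration, and the only rules it enforces are the ones listed above (all five keys present, and MYSQL_PASSWORD different from MYSQL_ROOT_PASSWORD). The sample values in `main` are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: validates the text of a .env file against the keys listed above.
class EnvCheckSketch {

    static final List<String> REQUIRED = List.of(
        "MYSQL_ROOT_PASSWORD", "MYSQL_DATABASE", "MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_URL");

    // Parses KEY=VALUE lines, skipping blank lines, # comments, and lines without a key.
    static Map<String, String> parse(String content) {
        Map<String, String> env = new LinkedHashMap<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            if (trimmed.isEmpty() || trimmed.startsWith("#") || eq <= 0) {
                continue;
            }
            env.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return env;
    }

    // Returns human-readable problems; an empty list means the file passes these checks.
    static List<String> validate(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            if (env.getOrDefault(key, "").isEmpty()) {
                problems.add("missing " + key);
            }
        }
        String root = env.getOrDefault("MYSQL_ROOT_PASSWORD", "");
        if (!root.isEmpty() && root.equals(env.get("MYSQL_PASSWORD"))) {
            problems.add("MYSQL_PASSWORD must differ from MYSQL_ROOT_PASSWORD");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Placeholder values for the demo; a real check would read the .env file instead.
        String sample = """
            MYSQL_ROOT_PASSWORD=example-root-password
            MYSQL_DATABASE=jobs
            MYSQL_USER=scraper
            MYSQL_PASSWORD=example-user-password
            MYSQL_URL=jdbc:mysql://localhost:3306/jobs
            """;
        List<String> problems = validate(parse(sample));
        System.out.println(problems.isEmpty() ? "ok" : String.join("\n", problems));
    }
}
```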

Run the Application:
Using the command line:
    ./gradlew bootRun
If you use IntelliJ, use the Run menu to run the application. If you use Visual Studio Code, install the Gradle for Java and Spring Boot Extension Pack extensions.

Contribute:
- Check the issues or the GitHub projects for available tasks.
- Find a task that interests you and assign yourself to it. (You should have only one task assigned to you at a time.)
- Update the Status column to In Progress.
- Fill in the Start Date column with the date you assigned yourself to the task. (Note: you have one week to finish the task before it becomes available to anyone else.)
- Follow the guidelines to implement the scraping logic and submit your code via a pull request.

This project is licensed under the MIT License. See the LICENSE file for details.