This project analyzes 98,000+ Udemy courses using the ELK stack (Elasticsearch, Logstash, Kibana), orchestrated with Docker. The goal is to gain insights into online courses through data wrangling, querying, and visualization.
- Elasticsearch → Data indexing & queries
- Logstash → Data extraction & transformation
- Kibana → Interactive dashboards & visualizations
- Docker → Containerized setup for easy deployment
📁 Other files/
├── docker-compose.yaml       # ELK Stack container setup
├── index.json                # Elasticsearch index mapping
├── logstash/                 # Logstash configuration files
│   └── logstash.conf         # Logstash configuration file
└── kibanaDashboard.ndjson    # Kibana dashboard import file
cd "Other files"
docker-compose up -d es01 kibana
curl -u elastic:admin -X PUT "localhost:9200/udemy_courses"
curl -u elastic:admin -X PUT "localhost:9200/udemy_courses/_mapping" \
-H "Content-Type: application/json" -d @index.json
docker-compose up -d logstash
cp /path/to/dataset.csv logstash/csvData/
docker-compose restart logstash
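Elasticsearch can take a while to accept requests after its container starts, so the index-creation calls may fail if they run too early. A minimal sketch of a readiness check follows, assuming the same `elastic:admin` credentials and `localhost:9200` address used in the commands above. The `wait_for_es` helper, the `ES_URL` and `ES_AUTH` variables, and the default retry count and delay are illustrative choices, not part of the project.

```shell
#!/bin/sh
# Hypothetical helper: poll Elasticsearch until it answers, or give up.
# Credentials and address are assumed to match the commands above.
ES_URL="${ES_URL:-localhost:9200}"
ES_AUTH="${ES_AUTH:-elastic:admin}"

wait_for_es() {
  retries="${1:-30}"   # maximum number of attempts
  delay="${2:-5}"      # seconds to wait between attempts
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # -s hides progress output, -f makes curl exit non-zero on HTTP errors
    if curl -sf -u "$ES_AUTH" "$ES_URL/_cluster/health" >/dev/null; then
      echo "Elasticsearch is up (attempt $attempt)"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "Elasticsearch did not respond after $retries attempts" >&2
  return 1
}
```

One possible use is to run `wait_for_es` before the first `curl -X PUT` call. After Logstash has restarted, `curl -u elastic:admin "localhost:9200/udemy_courses/_count"` reports how many documents have been indexed so far.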
- Top 5 free Python courses based on ratings and reviews
- Course distribution by category and level
- Comparison of ratings for free vs. paid courses
- Most popular business courses
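To illustrate how the first insight could be retrieved outside Kibana, here is a hedged sketch of an Elasticsearch search request. The field names (`title`, `is_paid`, `rating`, `num_reviews`) are assumptions; the real names are whatever `index.json` defines. The helper name `top_free_python_query` is likewise hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a Query DSL body for the five best-rated free
# Python courses. Field names are assumed, not taken from index.json.
top_free_python_query() {
  cat <<'EOF'
{
  "size": 5,
  "query": {
    "bool": {
      "must":   [ { "match": { "title": "python" } } ],
      "filter": [ { "term":  { "is_paid": false } } ]
    }
  },
  "sort": [
    { "rating":      { "order": "desc" } },
    { "num_reviews": { "order": "desc" } }
  ]
}
EOF
}

# Send the query to the index created earlier (requires the stack to be running):
# top_free_python_query | curl -s -u elastic:admin \
#   -H "Content-Type: application/json" \
#   "localhost:9200/udemy_courses/_search?pretty" -d @-
```

Sorting by rating first and review count second is only one way to read "based on ratings and reviews"; a weighted score would be an equally valid choice.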
- Author: Marco Minaudo
- Course: Systems and Methods for Big and Unstructured Data (SMBUD)
- Academic Year: 2024-2025